A direct approach to sparse discriminant analysis in ultra-high dimensions

Authors

  • QING MAI
  • HUI ZOU
  • MING YUAN
Abstract

Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals.
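The idea of building a sparse discriminant direction from lasso penalized least squares can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the numeric class coding (-n/n0 and n/n1), the midpoint cutoff between the projected class means, the toy data, and all function names. The lasso is solved with a plain coordinate descent loop in pure Python, so no external library is needed.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*sum((y - b0 - X b)^2) + lam*sum(|b_j|) by coordinate descent."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centring X and y absorbs the unpenalised intercept b0.
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the partial residual (feature j excluded).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta

def fit_sparse_direction(X, labels, lam):
    """Regress a numeric class coding on X with a lasso penalty (assumed coding)."""
    n, p = len(X), len(X[0])
    n0 = labels.count(0)
    n1 = n - n0
    y = [-n / n0 if g == 0 else n / n1 for g in labels]  # assumption: illustrative coding
    beta = lasso_cd(X, y, lam)
    mu0 = [sum(X[i][j] for i in range(n) if labels[i] == 0) / n0 for j in range(p)]
    mu1 = [sum(X[i][j] for i in range(n) if labels[i] == 1) / n1 for j in range(p)]
    # Assumption: cut at the midpoint of the projected class means.
    cutoff = sum(b * (a + c) / 2 for b, a, c in zip(beta, mu0, mu1))
    return beta, cutoff

def predict(x, beta, cutoff):
    """Assign class 1 when the projection onto beta exceeds the cutoff."""
    return 1 if sum(b * v for b, v in zip(beta, x)) > cutoff else 0

def make_toy_data(seed=0, n_per_class=30, p=8, shift=4.0):
    """Hypothetical data: only feature 0 separates the two classes."""
    rng = random.Random(seed)
    X, labels = [], []
    for g in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            row[0] += shift * g
            X.append(row)
            labels.append(g)
    return X, labels

X, labels = make_toy_data()
beta, cutoff = fit_sparse_direction(X, labels, lam=0.5)
accuracy = sum(predict(x, beta, cutoff) == g for x, g in zip(X, labels)) / len(labels)
```

In this toy setting the penalty is expected to shrink most noise coefficients to zero or near zero while keeping the coefficient of the separating feature. A larger `lam` gives a sparser direction, and in practice it would be chosen by cross-validation.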


Similar resources

Semiparametric Sparse Discriminant Analysis in Ultra-High Dimensions

In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten & Tibshirani 2011, Cai & Liu 2011, Mai et al. 2012, Fan et al. 2012). In this paper, we develop high-dimensional semiparametric sparse discriminant analysis (HD-SeSDA) that generalizes the normal-theory discriminant...


Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for le...


Sparse semiparametric discriminant analysis

In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Witten and Tibshirani, 2011; Cai and Liu, 2011; Mai et al., 2012; Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis (SSDA) that generalizes the normal-theory discr...


Pinpointing the classifiers of English language writing ability: A discriminant function analysis approach

The major aim of this paper was to investigate the validity of language and intelligence factors for classifying Iranian English learners' writing performance. Iranian participants of the study took three tests for grammar, breadth, and depth of vocabulary, and two tests for verbal and narrative intelligence. They also produced a corpus of argumentative writ...


A Note On the Connection and Equivalence of Three Sparse Linear Discriminant Analysis Methods

In this paper we reveal the connection and equivalence of three sparse linear discriminant analysis methods: the ℓ1-Fisher's discriminant analysis proposed in Wu et al. (2008), the sparse optimal scoring proposed in Clemmensen et al. (2011) and the direct sparse discriminant analysis proposed in Mai et al. (2012). It is shown that, for any sequence of penalization parameters, the normalized sol...




Publication date: 2012