Geometric representation of high dimension, low sample size data

Authors

  • Peter Hall
  • J. S. Marron
  • Amnon Neeman
Abstract

High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to infinity while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially all the randomness in the data appears only as a random rotation of this simplex. This geometric representation is used to obtain several new statistical insights.
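
The simplex structure described in the abstract can be checked numerically. The sketch below is only an illustration under an assumed model that the abstract does not specify: entries are i.i.d. standard Gaussian. Under that assumption, every pairwise distance divided by the square root of the dimension d should approach sqrt(2) as d grows with n fixed, so the n points become nearly equidistant, like the vertices of a regular simplex. The function name `scaled_pairwise_distances` and its parameters are hypothetical.

```python
import math
import random


def scaled_pairwise_distances(n, d, seed=0):
    """Draw n points in d dimensions and return all pairwise
    Euclidean distances divided by sqrt(d).

    Assumption (not from the abstract): coordinates are i.i.d. N(0, 1).
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    dists = []
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            dists.append(math.sqrt(sq / d))
    return dists


if __name__ == "__main__":
    # Sample size stays fixed at n = 4 while the dimension grows.
    for d in (10, 1000, 20000):
        dists = scaled_pairwise_distances(4, d)
        spread = max(dists) - min(dists)
        print(f"d={d:6d}  min={min(dists):.3f}  max={max(dists):.3f}  "
              f"spread={spread:.3f}  (sqrt(2)={math.sqrt(2):.3f})")
```

As d increases, the gap between the smallest and largest scaled distance should shrink, which is the equidistance property of a regular simplex; the random rotation mentioned in the abstract is not visible in distances alone.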

Similar resources

PCA Consistency in High Dimension, Low Sample Size Context

Principal Component Analysis (PCA) is an important tool of dimension reduction especially when the dimension (or the number of variables) is very high. Asymptotic studies where the sample size is fixed, and the dimension grows (i.e. High Dimension, Low Sample Size (HDLSS)) are becoming increasingly relevant. We investigate the asymptotic behavior of the Principal Component (PC) directions. HDLS...

Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA

In High Dimension, Low Sample Size (HDLSS) data situations, where the dimension d is much larger than the sample size n, principal component analysis (PCA) plays an important role in statistical analysis. Under which conditions does the sample PCA well reflect the population covariance structure? We answer this question in a relevant asymptotic context where d grows and n is fixed, under a gene...

Largest Eigenvalue Estimation for High-Dimension, Low-Sample-Size Data and its Application

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. We call such data HDLSS data. In this paper, we study HDLSS asymptotics when the data dimension is high while the sample size is fixed. We first introduce two eigenvalue estimation methods: the noise-reduction (NR) methodology and the cross-data-matrix (CDM) methodology. We show ...

Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds

SUNGKYU JUNG: Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds. (Under the direction of Dr. J. S. Marron.) The dissertation consists of two research topics regarding modern non-standard data analytic situations. In particular, data under the High Dimension, Low Sample Size (HDLSS) situation and data lying on manifolds are analyzed. These situations are rela...

Robust Centroid Quantile Based Classification for High Dimension Low Sample Size Data

A new method of statistical classification (discrimination) is proposed. The method is most effective for high dimension low sample size data. Its value is demonstrated through a new type of asymptotic analysis, and via a simulation study.

Publication date: 2004