Selection of variables and dimension reduction in high-dimensional non-parametric regression

Authors

  • Karine Bertin
  • Guillaume Lecué
Abstract

We consider an l1-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension d of the input variable X is very large (sometimes depending on the number of observations). Estimation of a β-regular regression function f cannot be faster than the slow rate n^(-2β/(2β+d)). Fortunately, in some situations, f depends on only a few of the coordinates of X. In this paper, we construct two procedures. The first one selects, with high probability, these coordinates. Then, using this subset selection method, we run a local polynomial estimator (on the set of relevant coordinates) to estimate the regression function at the rate n^(-2β/(2β+d*)), where d*, the "real" dimension of the problem (the exact number of variables on which f depends), has replaced the dimension d of the design. To achieve this result, we use an l1-penalization method in this non-parametric setup. AMS 2000 subject classifications: Primary 62G08.
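
To make the gap between the two rates in the abstract concrete, the short sketch below evaluates n^(-2β/(2β+d)) for the ambient dimension d and for a smaller intrinsic dimension d*. The function name `minimax_rate` and the numerical values of n, β, d and d* are illustrative assumptions and are not taken from the paper; the code only evaluates the formula and does not implement the authors' selection procedure or estimator.

```python
def minimax_rate(n: int, beta: float, dim: int) -> float:
    """Evaluate n ** (-2*beta / (2*beta + dim)), the rate quoted in the abstract.

    n    : number of observations
    beta : smoothness (regularity) of the regression function
    dim  : dimension entering the rate (d or d*)
    """
    if n <= 0 or beta <= 0 or dim <= 0:
        raise ValueError("n, beta and dim must all be positive")
    return n ** (-2.0 * beta / (2.0 * beta + dim))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    n, beta = 10_000, 2.0
    d, d_star = 50, 3

    slow = minimax_rate(n, beta, d)       # rate with the ambient dimension d
    fast = minimax_rate(n, beta, d_star)  # rate with the intrinsic dimension d*

    print(f"rate with d  = {d}: {slow:.4g}")
    print(f"rate with d* = {d_star}: {fast:.4g}")
```

With β and n fixed, the exponent 2β/(2β+dim) shrinks toward zero as the dimension grows, which is why replacing d by a much smaller d* matters so much when d is large.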

Similar resources

Non-parametric Bayesian simultaneous dimension reduction and regression on manifolds

We formulate a Bayesian non-parametric model for simultaneous dimension reduction and regression as well as inference of graphical models. The proposed model holds for both the classical setting of Euclidean subspaces and the Riemannian setting where the marginal distribution is concentrated on a manifold. The method is designed for the high-dimensional setting where the number of variables far...

Nonparametric variable selection and dimension reduction methods and their applications in pharmacogenomics

Zhu, Jingyi. Ph.D., Purdue University, December 2014. Nonparametric Variable Selection and Dimension Reduction Methods and Their Applications in Pharmacogenomics. Major Professor: Jun Xie. Nowadays it is common to collect large volumes of data in many fields with an extensive amount of variables, but often a small or moderate number of samples. For example, in the analysis of genomic data, the ...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Tight conditions for consistent variable selection in high dimensional nonparametric regression

We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditio...

Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data

Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...

Publication date: 2008