Selection of variables and dimension reduction in high-dimensional non-parametric regression
Authors
Abstract
We consider an $\ell_1$-penalization procedure in the non-parametric Gaussian regression model. In many concrete examples, the dimension $d$ of the input variable $X$ is very large (sometimes depending on the number of observations $n$). Estimation of a $\beta$-regular regression function $f$ cannot be faster than the slow rate $n^{-2\beta/(2\beta+d)}$. Fortunately, in some situations, $f$ depends on only a few of the coordinates of $X$. In this paper, we construct two procedures. The first selects these coordinates with high probability. The second uses this subset selection to run a local polynomial estimator on the selected coordinates, estimating the regression function at the rate $n^{-2\beta/(2\beta+d^*)}$, where $d^*$, the "real" dimension of the problem (the exact number of variables on which $f$ depends), has replaced the dimension $d$ of the design. To achieve this result, we use an $\ell_1$-penalization method in this non-parametric setup. AMS 2000 subject classifications: Primary 62G08.
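To give a feel for how much the rate improves when the ambient dimension d is replaced by the number d* of relevant coordinates, the two rates from the abstract can be evaluated numerically. This is only an illustrative sketch: the function name `minimax_rate` and the values chosen for n, β, d and d* are assumptions made for this example and do not come from the paper.

```python
def minimax_rate(n: int, beta: float, dim: int) -> float:
    """Evaluate the rate n^(-2*beta / (2*beta + dim)) from the abstract."""
    return n ** (-2.0 * beta / (2.0 * beta + dim))


# Hypothetical values, chosen only for illustration.
n = 10_000      # number of observations
beta = 2.0      # regularity of the regression function
d = 50          # ambient dimension of the design
d_star = 3      # number of coordinates the function actually depends on

slow_rate = minimax_rate(n, beta, d)       # rate without variable selection
fast_rate = minimax_rate(n, beta, d_star)  # rate after selecting d_star coordinates

print(f"rate with d = {d}: {slow_rate:.4g}")
print(f"rate with d* = {d_star}: {fast_rate:.4g}")
```

Under these assumed values the second rate is much smaller than the first, which is the gain the abstract attributes to selecting the relevant coordinates before running the local polynomial estimator.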
Related resources
Non-parametric Bayesian simultaneous dimension reduction and regression on manifolds
We formulate a Bayesian non-parametric model for simultaneous dimension reduction and regression as well as inference of graphical models. The proposed model holds for both the classical setting of Euclidean subspaces and the Riemannian setting where the marginal distribution is concentrated on a manifold. The method is designed for the high-dimensional setting where the number of variables far...
Nonparametric variable selection and dimension reduction methods and their applications in pharmacogenomics
Zhu, Jingyi. Ph.D., Purdue University, December 2014. Nonparametric Variable Selection and Dimension Reduction Methods and Their Applications in Pharmacogenomics. Major Professor: Jun Xie. Nowadays it is common to collect large volumes of data in many fields with an extensive amount of variables, but often a small or moderate number of samples. For example, in the analysis of genomic data, the ...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Tight conditions for consistent variable selection in high dimensional nonparametric regression
We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditio...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Publication date: 2008