ℓ1-Penalized Quantile Regression in High-Dimensional Sparse Models

Authors

  • ALEXANDRE BELLONI
  • VICTOR CHERNOZHUKOV
Abstract

We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors p is very large, possibly larger than the sample size n, but at most s regressors have a nonzero impact on each conditional quantile of the response variable, where s grows more slowly than n. Since ordinary quantile regression is not consistent in this case, we consider ℓ1-penalized quantile regression (ℓ1-QR), which penalizes the ℓ1-norm of the regression coefficients, as well as the post-penalized QR estimator (post-ℓ1-QR), which applies ordinary QR to the model selected by ℓ1-QR. First, we show that under general conditions ℓ1-QR is consistent at the near-oracle rate √(s/n) √(log(p ∨ n)), uniformly in the compact set U ⊂ (0,1) of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-ℓ1-QR is consistent at the near-oracle rate √(s/n) √(log(p ∨ n)), uniformly over U, even if the ℓ1-QR-selected models miss some components of the true models, and the rate could be even closer to the oracle rate otherwise. Third, we characterize conditions under which ℓ1-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over U; we also provide conditions under which hard-thresholding selects the minimal true model, uniformly over U.
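As a rough illustration of the quantities named in the abstract, the sketch below evaluates an ℓ1-penalized quantile regression objective and the near-oracle rate √(s/n) √(log(p ∨ n)). It is a simplified sketch under stated assumptions, not the authors' estimator: the check loss is the standard quantile regression loss, but the plain unweighted penalty `lam * sum(|beta_j|)`, the tolerance in `selected_support`, and all function names are hypothetical choices made for illustration. The abstract does not spell out the exact penalty form or the data-driven penalty level, so neither is reproduced here.

```python
import math


def check_loss(u, tau):
    """Standard quantile (check) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def l1_qr_objective(beta, X, y, tau, lam):
    """Average check loss plus an l1 penalty on the coefficients.

    Assumption: an unweighted penalty lam * ||beta||_1 is used here;
    the abstract does not give the exact penalty form.
    """
    n = len(y)
    fit = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(b * x for b, x in zip(beta, x_i))
        fit += check_loss(y_i - prediction, tau)
    penalty = lam * sum(abs(b) for b in beta)
    return fit / n + penalty


def selected_support(beta, tol=1e-8):
    """Indices of coefficients treated as nonzero (hypothetical tolerance).

    A post-penalized step would refit ordinary QR on these columns only.
    """
    return [j for j, b in enumerate(beta) if abs(b) > tol]


def near_oracle_rate(s, n, p):
    """The rate sqrt(s/n) * sqrt(log(max(p, n))) quoted in the abstract."""
    return math.sqrt(s / n) * math.sqrt(math.log(max(p, n)))
```

For example, `near_oracle_rate(s=5, n=200, p=1000)` shows how the rate depends on p only through a logarithm, which is why p may exceed n as long as s is small relative to n.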

Similar resources

Robust Estimation in Linear Regression with Multicollinearity and Sparse Models

One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

Sparse Regularization for High Dimensional Additive Models

We study the behavior of the l1 type of regularization for high dimensional additive models. Our results suggest remarkable similarities and differences between linear regression and additive models in high dimensional settings. In particular, our analysis indicates that, unlike in linear regression, l1 regularization does not yield optimal estimation for additive models of high dimensionality....

Bayesian Factor Regression Models in the “Large p, Small n” Paradigm

I discuss Bayesian factor regression models and prediction with very many explanatory variables. Such problems arise in many areas; my motivating applications are in studies of gene expression in functional genomics. I first discuss empirical factor (principal components) regression, and the use of general classes of shrinkage priors, with an example. These models raise foundational questions f...

Two-sample testing in high dimensions

We propose new methodology for two-sample testing in high dimensional models. The methodology provides a high dimensional analogue to the classical likelihood ratio test and is applicable to essentially any model class where sparse estimation is feasible. Sparse structure is used in the construction of the test statistic. In the general case, testing then involves nonnested model comparison, and...

SpAM: Sparse Additive Models

We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties ...

Publication date: 2011