Boosting for High-multivariate Responses in High-dimensional Linear Regression
Authors
Abstract
We propose a boosting method, multivariate L2Boosting, for multivariate linear regression based on some squared error loss for multivariate data. It can be applied to multivariate linear regression with continuous responses and to vector autoregressive time series. We prove, for i.i.d. as well as time series data, that multivariate L2Boosting can consistently recover sparse high-dimensional multivariate linear functions, even when the number of predictor variables $p_n$ and the dimension of the response $q_n$ grow almost exponentially with sample size $n$, $p_n = q_n = O(\exp(C n^{1-\xi}))$ ($0 < \xi < 1$, $0 < C < \infty$), but the $\ell_1$-norm of the true underlying function is finite. Our theory seems to be among the first to address the issue of large dimension of the response variable; the relevance of such settings is briefly outlined. We also identify empirically some cases where our multivariate L2Boosting is better than multiple application of univariate methods to single response components, thus demonstrating that the multivariate approach can be very useful.
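The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common reading of componentwise boosting with a squared error loss. It assumes an unweighted (identity-weighted) loss, under which each step greedily picks the single (predictor, response) pair that most reduces the residual sum of squares and moves that coefficient by a shrunken least-squares step. The function name `multivariate_l2boost`, the step size `nu`, and the fixed number of steps `n_steps` are illustrative assumptions, not taken from the paper.

```python
def multivariate_l2boost(X, Y, n_steps=200, nu=0.1):
    """Greedy componentwise boosting for Y ~ X B under plain squared error loss.

    X: n rows of p predictors, Y: n rows of q responses (lists of lists).
    Returns the p x q coefficient matrix B as a list of lists.
    Assumption: identity weighting of the response components.
    """
    n, p, q = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * q for _ in range(p)]
    R = [row[:] for row in Y]  # residuals equal Y because B starts at zero
    col_norm2 = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]

    for _ in range(n_steps):
        best = None  # (loss reduction, j, k, least-squares coefficient)
        for j in range(p):
            if col_norm2[j] == 0.0:
                continue  # an all-zero predictor cannot reduce the loss
            for k in range(q):
                inner = sum(X[i][j] * R[i][k] for i in range(n))
                gain = inner * inner / col_norm2[j]
                if best is None or gain > best[0]:
                    best = (gain, j, k, inner / col_norm2[j])
        if best is None:
            break
        _, j, k, coef = best
        B[j][k] += nu * coef  # shrunken update of one coefficient
        for i in range(n):
            R[i][k] -= nu * coef * X[i][j]  # refresh residuals of response k
    return B


# Tiny demonstration with two orthogonal predictors and two responses.
X_demo = [[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
Y_demo = [[2.0, -1.0], [2.0, 1.0], [2.0, -1.0], [2.0, 1.0]]
B_demo = multivariate_l2boost(X_demo, Y_demo, n_steps=100, nu=0.5)
```

In this sketch only one entry of the coefficient matrix changes per step, which is what keeps the fit sparse when the procedure is stopped early; in practice the number of steps would be chosen by a stopping rule rather than fixed in advance.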
Similar resources
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the inf...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Prioritization sub-watershed of Acemangar Basin in Chaharmahal-e- Bakhtiari for soil and water management using morphometric parameters and ensemble of TOPSIS-multivariate linear regression algorithm
Sub-watershed prioritization is very important in natural resources and watershed management. This study deals with prioritization of sub-watersheds using a mixed multivariate linear model of New TOPSIS-Regression over morphometric parameters of 11 sub-watersheds. Morphometric parameters include constant of compression ratio, roundness factor, form ratio, slenderness ratio, channel maintenance, ...
Methods for regression analysis in high-dimensional data
By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Regularization for generalized additive mixed models by likelihood-based boosting.
OBJECTIVE: With the emergence of semi- and nonparametric regression the generalized linear mixed model has been extended to account for additive predictors. However, available fitting methods fail in high dimensional settings where many explanatory variables are present. We extend the concept of boosting to generalized additive mixed models and present an appropriate algorithm that uses two diff...