High-dimensional data and the Lasso
Abstract
How would you try to solve a linear system of equations with more unknowns than equations? Such a system typically has infinitely many solutions, and yet this is the sort of problem statisticians face with many modern datasets arising in genetics, imaging, finance and many other fields. What’s worse, our equations are often corrupted by noisy measurements! In this article we introduce a statistical method, the Lasso, that has been at the centre of the huge amount of research devoted to solving these problems. We’ll begin by reviewing the classical version of the problem, before moving on to the more modern setting hinted at above.
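To make the opening question concrete, here is a minimal sketch in Python with NumPy of a system with 3 equations and 5 unknowns that admits more than one exact solution. The matrix `A`, the vector `b`, and the helper `two_solutions` are illustrative assumptions, not taken from the article, and the sketch covers neither measurement noise nor the Lasso itself.

```python
import numpy as np


def two_solutions(A, b):
    """Return two different exact solutions of an underdetermined system A x = b.

    Assumes A has more columns than rows and full row rank (hypothetical setup).
    """
    # Minimum-norm solution obtained from the pseudoinverse.
    x_min = np.linalg.pinv(A) @ b
    # With full_matrices=True (the default), the last rows of vt span the
    # null space of A when A has fewer rows than columns and full row rank.
    _, _, vt = np.linalg.svd(A)
    null_vec = vt[-1]
    # Adding a null-space vector changes x but leaves A x unchanged.
    x_other = x_min + null_vec
    return x_min, x_other


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))  # 3 equations, 5 unknowns
b = rng.normal(size=3)
x1, x2 = two_solutions(A, b)

print(np.allclose(A @ x1, b), np.allclose(A @ x2, b))
```

Any multiple of `null_vec` could be added in the same way, so the data alone cannot single out one solution; some extra criterion is needed to choose among them.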
Related resources
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: As science, knowledge, and technology evolve, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
Some Properties of Generalized Fused Lasso and Its Applications to High Dimensional Data
Identifying homogeneous subgroups of variables can be challenging in high dimensional data analysis with highly correlated predictors. The generalized fused lasso has been proposed to simultaneously select correlated variables and identify them as predictive clusters. In this article, we study several properties of generalized fused lasso. First, we present a geometric interpretation of the gen...
Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman
Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...
Penalized Estimators in Cox Regression Model
The proportional hazards Cox regression models play a key role in analyzing censored survival data. We use penalized methods in high-dimensional scenarios to achieve more efficient models. This article reviews penalized Cox regression for some frequently used penalty functions. Analysis of the medical dataset “mgus2” confirms that penalized Cox regression performs better than the Cox regressi...
Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification
An important application of DNA microarray data is cancer classification. Because of the high dimensionality of microarray data, gene selection approaches are often employed to support expert systems in diagnosing cancer with high classification accuracy. Penalized logistic regression using the least absolute shrinkage and selection operator (LASSO) is one of the key st...