Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Abstract:
Background and purpose: With advances in science and technology, we increasingly deal with high-dimensional data, in which the number of predictors may considerably exceed the sample size. The main difficulties with such data are estimating the coefficients and interpreting the fitted model. Classical methods are unreliable in this setting because of the large number of predictor variables, and they are also affected by outliers and collinearity.
Methods: Many real-world data sets now have a high-dimensional structure. To handle this, we used the least absolute shrinkage and selection operator (LASSO). Because the semiparametric model is flexible and widely applicable to medical data, it can also be used to model genomic data. Motivated by these points, we developed an improved robust approach for analyzing gene expression and making predictions from a high-dimensional data set in the presence of outliers.
Results: Outliers are a common problem in regression analysis. In the regression context, an outlier is a point that fails to follow the main linear pattern of the data. The ordinary least-squares estimator is sensitive to outliers, which motivates the study of robust estimators, and robust regression is a widely studied topic in statistics. In the present study, least trimmed squares (LTS) estimation was applied to overcome the outlier problem.
Conclusions: We have proposed an optimization approach for semiparametric models to combat outliers in the data set. Specifically, based on a LASSO penalization scheme, we formulated the semiparametric model as a nonlinear integer programming problem that can be solved effectively by any evolutionary algorithm. We also studied a real-world application related to riboflavin production. The results showed that the proposed method was reasonably efficient compared with the LTS method.
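To make the LTS idea in the abstract concrete, here is a minimal sketch of plain least trimmed squares for a single predictor, fitted with random starts and concentration steps. It is not the authors' method: it leaves out the LASSO penalty, the differencing step for the semiparametric part, and the evolutionary solver. The function names (`ols_line`, `lts_line`), the toy data, and the choice `h=15` are all assumptions made for illustration. Choosing which `h` observations to keep is the combinatorial part of the problem, which is presumably why an integer programming formulation arises.

```python
import random

def ols_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate subset
    return my - b * mx, b

def sq_residuals(xs, ys, a, b):
    """Squared residuals of every observation under the line (a, b)."""
    return [(y - a - b * x) ** 2 for x, y in zip(xs, ys)]

def lts_line(xs, ys, h, n_starts=50, max_csteps=20, seed=0):
    """Approximate LTS: minimise the sum of the h smallest squared residuals.

    Each random two-point start is refined by concentration steps:
    fit on the current subset, then keep the h best-fitting observations.
    Returns (objective, a, b) for the best start found.
    """
    rng = random.Random(seed)
    n = len(xs)
    best = None
    for _ in range(n_starts):
        subset = sorted(rng.sample(range(n), 2))
        for _ in range(max_csteps):
            a, b = ols_line([xs[i] for i in subset], [ys[i] for i in subset])
            r2 = sq_residuals(xs, ys, a, b)
            new_subset = sorted(sorted(range(n), key=r2.__getitem__)[:h])
            if new_subset == subset:  # subset stopped changing
                break
            subset = new_subset
        a, b = ols_line([xs[i] for i in subset], [ys[i] for i in subset])
        objective = sum(sorted(sq_residuals(xs, ys, a, b))[:h])
        if best is None or objective < best[0]:
            best = (objective, a, b)
    return best

# Toy data (hypothetical): y = 1 + 2x, with the last four points replaced by outliers.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x for x in xs]
for i in range(16, 20):
    ys[i] = -50.0

ols_a, ols_b = ols_line(xs, ys)                  # pulled towards the outliers
lts_obj, lts_a, lts_b = lts_line(xs, ys, h=15)   # fits the clean majority
print("OLS slope:", ols_b, "LTS slope:", lts_b)
```

In this toy setting the trimmed fit follows the 16 clean points, while ordinary least squares is dragged towards the four contaminated ones. A high-dimensional version would replace `ols_line` with a penalized fit on each subset.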
Similar resources
Robust Ridge Regression for High-Dimensional Data
Ridge regression, being based on the minimization of a quadratic loss function, is sensitive to outliers. Current proposals for robust ridge regression estimators are sensitive to bad leverage observations, cannot be employed when the number of predictors p is larger than the number of observations n, and have low robustness when the ratio p/n is large. In this paper a ridge regression esti...
Semiparametric Quantile Regression with High-dimensional Covariates
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mil...
Methods for regression analysis in high-dimensional data
By evolving science, knowledge and technology, new and precise methods for measuring, collecting and recording information have been innovated, which have resulted in the appearance and development of high-dimensional data. The high-dimensional data set, i.e., a data set in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by ...
Robust High-Dimensional Linear Regression
The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of trainin...
A robust method for ultra-high dimensional regression analysis
To increase the estimation accuracy and reduce the computational cost in ultrahigh dimensional regression analysis, ? proposed Sure Independence Screening (SIS) which selects a subset of the variables before estimating the regression coefficients. Predictor variables are selected according to the magnitude of their marginal correlations with the response variable. ? proved that SIS shares the S...
On robust regression with high-dimensional predictors
We study regression M-estimates in the setting where p, the number of covariates, and n, the number of observations, are both large, but p ≤ n. We find an exact stochastic representation for the distribution of $\hat{\beta} = \operatorname{argmin}_{\beta \in \mathbb{R}^{p}} \sum_{i=1}^{n} \rho(Y_i - X_i'\beta)$ at fixed p and n under various assumptions on the objective function ρ and our statistical model. A scalar random variable whose determinist...
volume 8 issue 2
pages 9- 22
publication date 2020-06