Variable Selection and Model Building via Likelihood Basis Pursuit
Abstract
This paper presents a nonparametric penalized likelihood approach to variable selection and model building, called likelihood basis pursuit (LBP). In the setting of a tensor product reproducing kernel Hilbert space, we decompose the log likelihood into a sum of functional components such as main effects and interactions, each represented by appropriate basis functions. The basis functions are chosen to be compatible with variable selection and model building in the context of a smoothing spline ANOVA model. Basis pursuit is applied to obtain the optimal decomposition, in the sense of having the smallest ℓ1 norm on the coefficients. We use the functional L1 norm to measure the importance of each component and determine the “threshold” value by a sequential Monte Carlo bootstrap test algorithm. As a generalized LASSO-type method, LBP produces shrinkage estimates of the coefficients, which greatly facilitates the variable selection process, while also providing highly interpretable multivariate functional estimates. To choose the regularization parameters in the LBP models, generalized approximate cross validation (GACV) is derived as a tuning criterion; to make GACV applicable to large data sets, a randomized version is proposed as well. A technique called “slice modeling” is used to solve the optimization problem and makes the computation more efficient. LBP has great potential for a wide range of research and application areas, such as medical studies; in this paper we apply it to two large ongoing epidemiologic studies: the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) and the Beaver Dam Eye Study (BDES).
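The core idea, an ℓ1 (LASSO-type) penalty on basis coefficients inside a logistic likelihood followed by ranking components by an L1 norm, can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions: `basis` is a hypothetical stand-in for the smoothing spline ANOVA basis (two centered polynomial terms per variable, main effects only, no interactions); a single fixed penalty `lam` replaces the tuned regularization parameters; plain proximal gradient descent (ISTA) is swapped in for “slice modeling”; and the functional L1 norm is taken here as the empirical average (1/n)Σ|f_α(x_i)| over the data. GACV tuning and the bootstrap threshold test are not implemented. The data are synthetic.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t, clipping at 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def basis(x):
    """Hypothetical stand-in for an SS-ANOVA basis: two centered polynomial
    terms per variable (main effects only). Returns the feature row and, for
    each column, the index of the component (variable) it belongs to."""
    row, comp = [], []
    for k, v in enumerate(x):
        row += [v, v * v - 1.0 / 3.0]
        comp += [k, k]
    return row, comp


def fit_l1_logistic(X, y, lam, iters=2000):
    """Minimize (1/n) * sum_i [log(1 + exp(eta_i)) - y_i * eta_i]
    + lam * sum_j |c_j|, with eta_i = b + sum_j c_j * X[i][j],
    by proximal gradient descent (ISTA). The intercept b is unpenalized."""
    n, p = len(X), len(X[0])
    # 0.25 * trace((1/n) Xt^T Xt), Xt = X plus an intercept column, bounds the
    # Lipschitz constant of the gradient, so step = 1/lip is a safe step size.
    lip = 0.25 * sum(1.0 + sum(v * v for v in row) for row in X) / n
    step = 1.0 / lip
    b, c = 0.0, [0.0] * p
    for _ in range(iters):
        gb, gc = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            eta = b + sum(cj * v for cj, v in zip(c, row))
            r = sigmoid(eta) - yi  # derivative of the loss w.r.t. eta_i
            gb += r / n
            for j, v in enumerate(row):
                gc[j] += r * v / n
        b -= step * gb  # plain gradient step for the intercept
        c = [soft_threshold(cj - step * g, step * lam) for cj, g in zip(c, gc)]
    return b, c


def component_l1_norms(X, c, comp, n_comp):
    """Empirical L1 norm of each fitted component f_alpha, i.e. the average
    over the data of |sum of that component's basis terms|."""
    norms = [0.0] * n_comp
    for row in X:
        f = [0.0] * n_comp
        for cj, v, a in zip(c, row, comp):
            f[a] += cj * v
        for a in range(n_comp):
            norms[a] += abs(f[a]) / len(X)
    return norms


# Synthetic example: the outcome depends on x1 only; x2 is pure noise.
rng = random.Random(0)
raw = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1.0 if x[0] + 0.3 * rng.gauss(0, 1) > 0 else 0.0 for x in raw]
X, comp = [], []
for x in raw:
    row, comp = basis(x)
    X.append(row)
b, c = fit_l1_logistic(X, y, lam=0.02)
print("coefficients:", [round(v, 3) for v in c])
print("component L1 norms:", component_l1_norms(X, c, comp, 2))
```

Raising `lam` drives more coefficients exactly to zero, which is the shrinkage behavior that makes selection easy; with this basis every gradient entry is smaller than 1 in magnitude, so any `lam` of at least 1 keeps all coefficients at zero. In LBP proper, the penalty would be tuned (for example by GACV) and the cutoff on the component norms set by the bootstrap test rather than read off by eye.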
Similar resources
Smoothing Spline ANOVA Models II. Variable Selection and Model Building via Likelihood Basis Pursuit
We describe Likelihood Basis Pursuit, a nonparametric method for variable selection and model building, based on merging ideas from Lasso and Basis Pursuit works and from smoothing spline ANOVA models. An application to nonparametric variable selection for risk factor modeling in the Wisconsin Epidemiological Study of Diabetic Retinopathy is described. Although there are many approaches to vari...
A Variety of Regularization Problems
Beginning with a review of some optimization problems in RKHS, and going on to a model selection problem via Likelihood Basis Pursuit (LBP).
Variable Selection via Basis Pursuit for Non-Gaussian Data
A simultaneous flexible variable selection procedure is proposed by applying a basis pursuit method to the likelihood function. The basis functions are chosen to be compatible with variable selection in the context of smoothing spline ANOVA models. Since it is a generalized LASSO-type method, it enjoys the favorable property of shrinking coefficients and gives interpretable results. We derive a...
Model building with likelihood basis pursuit
We consider a non-parametric penalized likelihood approach for model building called likelihood basis pursuit (LBP) that determines the probabilities of binary outcomes given explanatory vectors while automatically selecting important features. The LBP model involves parameters that balance the competing goals of maximizing the log-likelihood and minimizing the penalized basis pursuit terms. Th...
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...