Using LASSO regression to detect predictive aggregate effects in genetic studies

Authors

  • Joel B Fontanarosa
  • Yang Dai
Abstract

We use least absolute shrinkage and selection operator (LASSO) regression to select the genetic markers and phenotypic features that are most informative with respect to a trait of interest. We compare several strategies for applying LASSO methods in risk prediction models, using the Genetic Analysis Workshop 17 exome simulation data, which consist of 697 individuals with genotypic and phenotypic (smoking, age, sex) information, under 5-fold cross-validation. For strategies that use only genotypic markers, the cross-validated average area under the receiver operating characteristic curve (AUC) ranges from 0.45 to 0.63. These values improve to 0.69-0.87 when both genotypic and phenotypic information are used. The ability of the LASSO method to find true causal markers is limited, but the method was able to discover several common variants (e.g., FLT1) under certain conditions.
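The evaluation scheme described in the abstract (an L1-penalized model scored by 5-fold cross-validated AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses small synthetic data rather than the Genetic Analysis Workshop 17 data, fits an L1-penalized logistic regression by proximal gradient descent as one possible LASSO-type solver, and every name and parameter value here (`fit_l1_logistic`, `simulate`, `lam=0.05`, the step size) is a hypothetical choice made for illustration.

```python
import math
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam, step=0.5, n_iter=200):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, lam, k=5, seed=0):
    """Average held-out AUC over k folds; the model is refit on each training split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam)
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in held_out]
        aucs.append(auc(scores, [y[i] for i in held_out]))
    return sum(aucs) / k

def simulate(n=200, p=10, seed=1):
    """Synthetic data (assumption): only the first two features affect the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0 for x in X]
    return X, y

X, y = simulate()
mean_auc = cross_validated_auc(X, y, lam=0.05)
w_full, _ = fit_l1_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w_full) if wj != 0.0]
print(f"5-fold mean AUC: {mean_auc:.2f}; selected features: {selected}")
```

The soft-thresholding step is what sets coefficients exactly to zero, so the features with nonzero weight form the selected set. In practice the penalty strength would itself be tuned, for example by an inner cross-validation loop, which this sketch omits.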

Similar resources

Novel/Alternative statistical approaches for genetic association studies

Introduction: Single SNP analyses using logistic regression have traditionally been applied in genetic association studies to assess main effects. However, many complex diseases, such as bladder cancer (BC), are likely to be associated with the combined effects of multiple loci. Problems derived from genetic data are that most single SNP analyses are underpowered to detect small effects and so ...

Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman

Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...

Identification of Genetic Polymorphism Interactions in Sporadic Alzheimer’s Disease Using Logic Regression

Objectives: Genetic polymorphism interactions are among the important factors in the development of complex diseases such as Alzheimer’s disease. An important goal of genetic association studies is to identify combinations of polymorphisms and measure their importance in increasing the risk of such diseases. In this study, the feature selection approach of logic regression was used to ide...

Risk assessment models for genetic risk predictors of lung cancer using two-stage replication for Asian and European populations

In the past ten years, candidate-gene studies and genome-wide association studies have achieved great success. However, few studies have systematically evaluated genetic effects on lung cancer risk across large-scale populations of different ethnicities. We systematically reviewed the relevant literature and filtered out 241 important genetic va...

Smooth-Threshold Multivariate Genetic Prediction with Unbiased Model Selection.

We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphisms (SNPs) data in genome-wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening as used in the gene score method, our method continuously screens SNPs based on the output from standard univ...

Journal title:

Volume 5, Issue

Pages -

Publication date: 2011