Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression
Authors
Abstract
Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer’s disease (AD). Collaborative imaging genetics studies across different research institutions have proven effective at detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD, and selecting relevant features is crucial for predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules that identify irrelevant features and remove them from the optimization. We then select the relevant group features by performing group Lasso feature selection over a sequence of regularization parameters. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model that integrates group Lasso feature selection with the detection of genetic risk factors across a multi-institution system. Empirical studies conducted on 809 subjects with 5.9 million SNPs, distributed across several individual institutions, demonstrate the efficiency and effectiveness of the proposed method.
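The pipeline described in the abstract (group Lasso selection followed by stability selection) can be illustrated with a minimal single-machine sketch. This is not the authors' distributed implementation and it omits their screening rules entirely. The function names (`group_soft_threshold`, `lambda_max`, `group_lasso`, `stability_scores`), the least-squares loss, the square-root group-size weights, and the proximal gradient solver are all assumptions chosen for illustration.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the whole group v toward zero by t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def lambda_max(X, y, groups):
    """Smallest lambda for which the all-zero vector is optimal
    (assumed objective: 0.5*||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||)."""
    return max(np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)) for g in groups)

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for the assumed group Lasso objective."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    weights = [np.sqrt(len(g)) for g in groups]
    for _ in range(n_iter):
        z = beta - step * X.T @ (X @ beta - y)  # gradient step on the squared loss
        for g, w in zip(groups, weights):
            beta[g] = group_soft_threshold(z[g], step * lam * w)
    return beta

def stability_scores(X, y, groups, lam, n_rounds=20, seed=0):
    """Fraction of half-sample fits in which each group is selected."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(len(groups))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        beta = group_lasso(X[idx], y[idx], groups, lam)
        counts += np.array([np.any(beta[g] != 0) for g in groups], dtype=float)
    return counts / n_rounds

# Toy data (synthetic, not from the paper): 3 groups of 2 features, signal in group 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 6))
groups = [[0, 1], [2, 3], [4, 5]]
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])
lam = 0.1 * lambda_max(X, y, groups)
print(stability_scores(X, y, groups, lam))
```

Groups with scores close to 1 are selected in nearly every subsample, which is the ranking idea behind stability selection. In a screening-based approach, groups proven to be zero at a given lambda would be dropped before calling the solver; that step is left out here.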
Similar resources
Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure
Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning methods in GWA studies has attracted increasing attention. However, in real-world settings, nucleotide-level bio-priors have not been well-studied to date...
Identification of Genetic Polymorphism Interactions in Sporadic Alzheimer’s Disease Using Logic Regression
Objectives: Genetic polymorphism interactions are among the important factors in the development of complex diseases like Alzheimer’s disease. An important goal of genetic association studies is to identify combinations of polymorphisms and measure their importance in increasing the risk of such diseases. In this study, the feature selection approach of logic regression was used to ide...
Selection of models for the analysis of risk-factor trees: leveraging biological knowledge to mine large sets of risk factors with application to microbiome data
Motivation: Establishing a statistical association between microbiome features and clinical outcomes is of growing interest because of the potential for yielding insights into biological mechanisms and pathogenesis. Extracting microbiome features that are relevant for a disease is challenging, and existing variable selection methods are limited due to the large number of risk factor variables fro...
The role of genetics in Alzheimer’s disease
Alzheimer's disease is a progressive neurological disorder that causes the brain to shrink (atrophy) and brain cells to die. Alzheimer's disease is the most common cause of dementia and causes a decline in thinking skills and social behaviors. Alzheimer's disease is more common in people over 65 years old. The risk of developing Alzheimer's disease and other types of dementia increases with age,...
Genome-wide Multiple Loci Mapping in Experimental Crosses by the Iterative Adaptive Penalized Regression
Genome-wide multiple loci mapping can be viewed as a variable selection problem where the major objective is to select genetic markers related to a trait of interest. This is a challenging variable selection problem because the number of genetic markers is large (often much larger than the sample size) and there is often strong linkage or linkage disequilibrium between markers. In this paper...
Journal title:
- CoRR
Volume: abs/1704.08383, Issue:
Pages: -
Publication date: 2017