Simultaneous set-wise testing under dependence, with applications to genome-wide association studies
Abstract
We consider the problem of identifying disease-associated genomic regions in genome-wide association studies (GWAS). It is shown that conventional single-SNP analysis can be greatly improved by (i) exploiting the spatial dependency and (ii) conducting set-wise analysis. The SNP set association problem can be conceptualized as the problem of simultaneously testing a large number of sets of hypotheses. We use hidden Markov models to exploit the linkage disequilibrium information in GWAS data, based on which a data-driven screening procedure (GLIS) is proposed. GLIS is shown to be optimal in the sense that it has the smallest missed set rate (MSR) among all valid false set rate (FSR) procedures. The numerical results demonstrate that the proposed procedure controls the FSR at the desired level, enjoys certain optimality properties and outperforms conventional combined p-value methods. We apply the GLIS procedure to analyze a Type 1 diabetes (T1D) GWAS dataset for detecting T1D-associated genomic regions. The results show that our proposed SNP set analysis not only provides better biological insights, but also increases the statistical power by pooling information from different samples.
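The abstract does not spell out the GLIS decision rule, so the following is only a rough sketch of how a data-driven screening step of this kind is often organized, not the authors' implementation. It assumes that each SNP set already has an estimated posterior probability of being null (for example, computed from a fitted hidden Markov model), and it applies a generic cumulative-average step-up rule: sets are ranked by that probability, and the largest group whose average stays at or below the target level is rejected. The function name `step_up_set_rejections` and the parameters `null_probs` and `alpha` are hypothetical names introduced for illustration.

```python
import numpy as np


def step_up_set_rejections(null_probs, alpha=0.1):
    """Return sorted indices of sets rejected by a cumulative-average rule.

    null_probs: assumed per-set posterior probabilities of being null
                (hypothetical input; how they are estimated is not shown).
    alpha:      target level for the estimated false set rate.
    """
    p = np.asarray(null_probs, dtype=float)
    # Rank sets from most to least likely to be non-null.
    order = np.argsort(p, kind="stable")
    # Average null probability among the first k ranked sets, k = 1..m.
    running_mean = np.cumsum(p[order]) / np.arange(1, p.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    # Reject the largest k whose running average stays within alpha.
    k = passing[-1] + 1
    return np.sort(order[:k])


if __name__ == "__main__":
    example = [0.01, 0.9, 0.05, 0.3, 0.02]  # made-up values
    print(step_up_set_rejections(example, alpha=0.1))
```

The running average of the ranked null probabilities serves as an estimate of the proportion of false rejections among the selected sets, which is why the cutoff is placed where that average crosses `alpha`.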
Similar resources
Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis
Extended Abstract Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...
Graphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. T...
Optimal High Dimensional Multiple Testing Under Linear Models
High-dimensional multiple testing has many important applications. Motivated by genome-wide association studies (GWAS), we consider the problem of multiple testing under a high-dimensional sparse linear model in order to identify the genetic markers associated with the trait of interest. The model is an extension of the normal mixture model under arbitrary dependence. We propose a multiple testi...
Estimating False Discovery Proportion Under Arbitrary Covariance Dependence.
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any SNPs are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging u...
Statistical Methods for Genome-wide Association Studies and Personalized Medicine
In genome-wide association studies (GWAS), researchers analyze the genetic variation across the entire human genome, searching for variations that are associated with observable traits or certain diseases. There are several inference challenges in GWAS, including the huge number of genetic markers to test, the weak association between truly associated markers and the traits, and the correlation...