Association Studies with Imputed Variants Using Expectation-Maximization Likelihood-Ratio Tests

Authors

  • Kuan-Chieh Huang
  • Wei Sun
  • Ying Wu
  • Mengjie Chen
  • Karen L. Mohlke
  • Leslie A. Lange
  • Yun Li
Abstract

Genotype imputation has become standard practice in modern genetic studies. As sequencing-based reference panels continue to grow, an increasing number of markers are being imputed well, but at the same time even more markers with relatively low minor allele frequency (MAF) are being imputed with low quality. Here, we propose new methods that incorporate imputation uncertainty into downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all potential genotypes are estimated; and II) when only the one-dimensional summary statistic, the imputed dosage, is available. For scenario I, we have developed an expectation-maximization likelihood-ratio test (EM-LRT) for association based on posterior probabilities. When only imputed dosages are available (scenario II), we first sample the genotype probabilities from their posterior distribution given the dosages, and then apply the EM-LRT to the sampled probabilities. Our simulations show that the type I error of the proposed EM-LRT methods is controlled under both scenarios. Compared with existing methods, EM-LRT-Prob (for scenario I) offers optimal statistical power across a wide spectrum of MAF and imputation quality. EM-LRT-Dose (for scenario II) achieves a level of statistical power similar to that of EM-LRT-Prob and outperforms the standard Dosage method, especially for markers with relatively low MAF or imputation quality. Applications to two real data sets, the Cebu Longitudinal Health and Nutrition Survey study and the Women's Health Initiative Study, provide further support for the validity and efficiency of our proposed methods.
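
The abstract does not give the model details, so the following is only a minimal sketch of how an EM likelihood-ratio test on genotype posterior probabilities might look. It assumes a quantitative trait, an additive linear model with Gaussian errors, and no covariates; none of these choices are confirmed by the source. The function names (em_lrt_prob, sample_probs_from_dosage) are hypothetical, and the uniform draw in the dosage helper is a placeholder for the posterior sampling step that the abstract mentions but does not specify. The only fact the helper relies on is the constraint dosage = p1 + 2*p2 with p0 + p1 + p2 = 1.

```python
import math
import numpy as np

GENOTYPES = np.array([0.0, 1.0, 2.0])  # copies of the coded allele

def _loglik(y, probs, b0, b1, s2):
    # Observed-data log-likelihood: each trait value is a mixture over the
    # three genotypes, weighted by the imputation probabilities.
    resid = y[:, None] - (b0 + b1 * GENOTYPES[None, :])
    dens = np.exp(-0.5 * resid**2 / s2) / math.sqrt(2.0 * math.pi * s2)
    return float(np.sum(np.log(np.sum(probs * dens, axis=1))))

def em_lrt_prob(y, probs, n_iter=500, tol=1e-9):
    """Hypothetical EM-LRT sketch for y = b0 + b1*g + N(0, s2) noise."""
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)   # shape (n, 3), rows sum to 1
    n = y.size
    mu = y.mean()
    s2_null = float(np.mean((y - mu) ** 2))
    ll_null = _loglik(y, probs, mu, 0.0, s2_null)  # closed-form null MLE

    b0, b1, s2 = mu, 0.0, s2_null            # start EM at the null fit
    ll_old = ll_null
    for _ in range(n_iter):
        # E-step: update genotype weights given the current parameters.
        resid = y[:, None] - (b0 + b1 * GENOTYPES[None, :])
        w = probs * np.exp(-0.5 * resid**2 / s2)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares over all (individual, genotype) pairs.
        g_mean = float(np.sum(w * GENOTYPES[None, :]) / n)
        gc = GENOTYPES[None, :] - g_mean
        b1 = float(np.sum(w * gc * (y[:, None] - mu)) / np.sum(w * gc**2))
        b0 = mu - b1 * g_mean
        resid = y[:, None] - (b0 + b1 * GENOTYPES[None, :])
        s2 = float(np.sum(w * resid**2) / n)
        ll = _loglik(y, probs, b0, b1, s2)
        if ll - ll_old < tol:
            break
        ll_old = ll
    lrt = max(0.0, 2.0 * (ll - ll_null))
    # Chi-square (1 df) upper tail, written with the complementary error function.
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return {"beta": b1, "lrt": lrt, "p_value": p_value}

def sample_probs_from_dosage(dosage, rng):
    # ASSUMPTION: uniform draw of p2 over its feasible range; the paper's
    # actual posterior for scenario II is not described in the abstract.
    d = np.asarray(dosage, dtype=float)
    p2 = rng.uniform(np.maximum(0.0, d - 1.0), d / 2.0)
    p1 = d - 2.0 * p2
    p0 = 1.0 - p1 - p2
    return np.clip(np.column_stack([p0, p1, p2]), 0.0, 1.0)
```

Under these assumptions, scenario I corresponds to calling em_lrt_prob(y, probs) directly, and scenario II to calling it on the output of sample_probs_from_dosage(dosage, rng) with a NumPy random generator.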

Similar resources

ML Estimation of Mean and Covariance Structures with Missing Data Using Complete Data Routines

We consider maximum likelihood (ML) estimation of mean and covariance structure models when data are missing. Expectation maximization (EM), generalized expectation maximization (GEM), Fletcher-Powell, and Fisher-scoring algorithms are described for parameter estimation. It is shown how the machinery within software that handles the complete data problem can be utilized to implement each algor...

Analysis of Location and Dispersion Effects Based on Censored Data from Unreplicated Experiments

A distinctive feature of Japanese quality improvement techniques is the use of statistical experimentation to find the best combination of process variables, one that minimizes variance and appropriately controls the mean level. In testing durable products for reliability improvement, observations are usually censored or grouped. The incompleteness of data coupled with complicated structure of scr...

Comparison of Methods of Handling Missing Data: A Case Study of KDHS 2010 Data

Missing data pose a major threat to observational and experimental studies. Analysis of data that ignores missingness results in estimates that are inefficient and unbiased. Various studies have been conducted to determine the best methods of dealing with missing data. The analyses used in these studies involved simulating missing data from complete data. Missing data are then imputed using...

Detecting Rare Variants in Case-Parents Association Studies

Despite the success of genome-wide association studies (GWASs) in detecting common variants (minor allele frequency ≥0.05), many have suggested that rare variants also contribute to the genetic architecture of diseases. Recently, researchers demonstrated that rare variants can show strong stratification that may not be corrected by existing methods. In this paper, we focus on a case-parents ...

Monte Carlo State-Space Likelihoods by Weighted Posterior Kernel Density Estimation

Maximum likelihood estimation and likelihood ratio tests for nonlinear, non-Gaussian state-space models require numerical integration for likelihood calculations. Several methods, including Monte Carlo (MC) expectation maximization, MC likelihood ratios, direct MC integration, and particle filter likelihoods, are inefficient for the motivating problem of stage-structured population dynamics mod...

Journal title:

Volume 9, Issue

Pages -

Publication date: 2014