Single versus multiple imputation for genotypic data
نویسندگان
چکیده
Due to the growing need to combine data across multiple studies and to impute untyped markers based on a reference sample, several analytical tools for imputation and analysis of missing genotypes have been developed. Current imputation methods rely on single imputation, which ignores the variation in estimation due to imputation. An alternative to single imputation is multiple imputation. In this paper, we assess the variation in imputation by completing both single and multiple imputations of genotypic data using MACH, a commonly used hidden Markov model imputation method. Using data from the North American Rheumatoid Arthritis Consortium genome-wide study, the use of single and multiple imputation was assessed in four regions of chromosome 1 with varying levels of linkage disequilibrium and association signals. Two scenarios for missing genotypic data were assessed: imputation of untyped markers and combination of genotypic data from two studies. This limited study involving four regions indicates that, contrary to expectations, multiple imputations may not be necessary.
منابع مشابه
کاربرد جای گذاری چندگانه در تحقیقات پزشکی و اپیدمیولوژی
Data missing, which occurs for different reasons, is an unavoidable problem in epidemiological studies. It is quite widespread and, therefore, it is considered as a challenge in research design and data analysis by many methodologists. Complete case analysis is often used in studies with missing data however, this approach may result in inaccurate estimates and inferences due to bias associated...
متن کاملSelection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets
Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...
متن کاملچند رویکرد برخورد با مقادیر گمشده متغیرهای کمی و بررسی اثر آنها بر نتایج حاصل از یک کارآزمایی بالینی
Background and Objectives: A major challenge that affects the longitudinal studies is the problem of missing data. Missing in the data may result in the loss of part of the information which reduces the accuracy of the estimator and obtain the results will be biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...
متن کاملAccuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most of the geochemical datasets include missing data with different portions and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most of geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
متن کاملGenome-Wide Association Study Based on Multiple Imputation with Low-Depth Sequencing Data: Application to Biofuel Traits in Reed Canarygrass
Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., Phalaris caesia Nees), a cool-season grass species with potential as a biofuel crop. Our study inv...
متن کامل