Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis
Abstract
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
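The pedigree-based evaluation described above can be illustrated with a small sketch. The abstract does not give the exact consistency rule, so everything here is an assumption made for illustration: segments are treated as (start, end) intervals in cM on a single chromosome, a child's segment with some third individual is called pedigree-consistent when one parent's segments with that same individual cover most of it, and the names `covered_fraction`, `is_pedigree_consistent`, and the `min_fraction` threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start_cM, end_cM) on one chromosome.
Interval = Tuple[float, float]


def covered_fraction(segment: Interval, other_segments: List[Interval]) -> float:
    """Return the fraction of `segment` overlapped by the union of `other_segments`."""
    start, end = segment
    if end <= start:
        raise ValueError("segment must have positive length")
    covered = 0.0
    cursor = start  # right edge of the region already counted
    for o_start, o_end in sorted(other_segments):
        lo = max(o_start, cursor)
        hi = min(o_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)


def is_pedigree_consistent(
    child_segment: Interval,
    father_segments: List[Interval],
    mother_segments: List[Interval],
    min_fraction: float = 0.8,  # assumed threshold, not taken from the paper
) -> bool:
    """Check whether a child's segment with a third individual is also seen in a parent.

    A true IBD segment is inherited from one parent, so this sketch requires
    that the father's OR the mother's segments with the same third individual
    cover at least `min_fraction` of the child's segment.
    """
    best = max(
        covered_fraction(child_segment, father_segments),
        covered_fraction(child_segment, mother_segments),
    )
    return best >= min_fraction


if __name__ == "__main__":
    child = (10.0, 14.0)
    # Father shares only part of the region with the third individual.
    print(is_pedigree_consistent(child, [(9.0, 12.0), (11.0, 13.0)], []))
    # Mother shares the whole region.
    print(is_pedigree_consistent(child, [], [(9.5, 14.5)]))
```

Under these assumptions, the share of child segments that fail the check gives a rough estimate of the false-positive rate within a given length bin, such as 2-4 cM.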
Similar resources
Numerical Studies of Size Distribution of Unconditional and Conditional Transmitted and Identical-by-Descent DNA Segments in Pedigrees in Multi-Crossover Models
Extending the study in [Li, Zhang, Marr, 1994] of the size distribution of transmitted DNA segments from an ancestor's haploid genome, and of the size distribution of identical-by-descent segments with respect to that haploid genome between any two related individuals in a pedigree, where the single-crossover model is used, we numerically study the corresponding size distributions in models with more than...
Analysis of data on related individuals through inference of identity by descent
Pedigrees exist within populations. While close pedigree relationships may be known, in any genetic epidemiological study there are likely also relationships among pedigrees. Our ultimate goal is to combine information from within-pedigree gene descent and between-pedigree genome sharing into a single analysis. Data from SNP genotype assays, in which 300,000 or more SNP variants may be typed acr...
Mapping of QTL in a commercial sheep pedigree
The confirmation of the segregation of experimentally discovered quantitative trait loci (QTL) in a variety of commercial populations is required before their commercial significance can be fully realized. The use of complex pedigrees in the design of such confirmation experiments has the potential to increase the probability of the QTL segregating within the pedigree while maintaining the powe...
An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test
Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously-unknown shared familial descent by detecting genomic segments th...
A note on algorithms for genotype and allele elimination in complex pedigrees with incomplete genotype data.
Elimination of genotypes or alleles for each individual or meiosis, which are inconsistent with observed genotypes, is a component of various genetic analyses of complex pedigrees. Computational efficiency of the elimination algorithm is critical in some applications such as genotype sampling via descent graph Markov chains. We present an allele elimination algorithm and two genotype eliminatio...