Algorithms for inferring haplotypes.

Author

  • Tianhua Niu
Abstract

Haplotype phase information in diploid organisms sheds light on human evolutionary history and may lead to more efficient strategies for identifying genetic variants that increase susceptibility to human diseases. Molecular haplotyping methods are labor-intensive, low-throughput, and very costly. Algorithms based on formal statistical theory have therefore been shown to be an effective and cost-efficient alternative for haplotype reconstruction. This review covers 1) population-based haplotype inference methods: Clark's algorithm, the expectation-maximization (EM) algorithm, coalescence-based algorithms (pseudo-Gibbs sampler and perfect/imperfect phylogeny), and the partition-ligation algorithm implemented by a fully Bayesian model (Haplotyper) or by EM (PLEM); 2) family-based haplotype inference methods; 3) the handling of genotype scoring uncertainties (i.e., genotyping errors and raw two-dimensional genotype scatterplots) in inferring haplotypes; and 4) haplotype inference methods for pooled DNA samples. The advantages and limitations of each algorithm are discussed. Using simulations based on empirical data on the G6PD gene and the TNFRSF5 gene, I demonstrate that the algorithms differ in their sensitivity to the degree of population diversity and to the genotyping error rate. Future statistical algorithms for haplotype reconstruction will draw increasingly on ideas from combinatorial mathematics, graphical models, and machine learning, and will have a profound impact on population genetics and genetic epidemiology with the advent of the human HapMap.
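To make the EM approach named in the abstract concrete, here is a minimal sketch of EM estimation of haplotype frequencies from unphased genotypes. It is an illustration under stated assumptions, not the implementation evaluated in the review: SNPs are assumed biallelic, genotypes are coded as 0/1/2 copies of allele 1, there is no missing data, and Hardy-Weinberg equilibrium is assumed. The function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple of 0/1/2 counts of allele 1, one entry per SNP (assumed coding).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites: 0 -> allele 0, 2 -> allele 1
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous site: one copy of each allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: posterior probability of each phase given current frequencies
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy data (invented for illustration): two SNPs, ten individuals.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimates = em_haplotype_frequencies(toy)
```

On this toy data the two double heterozygotes are the only ambiguous individuals, and the estimates concentrate on haplotypes `(0, 0)` and `(1, 1)`. Enumerating every phase grows exponentially with the number of heterozygous sites, so this sketch is only practical for a handful of SNPs.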

Related articles

Efficient Computation of Minimum Recombination with Genotypes (not Haplotypes)

A current major focus in genomics is the large-scale collection of genotype data in populations in order to detect variations in the population. The variation data are sought in order to address fundamental and applied questions in genetics that concern the haplotypes in the population. Since almost all the collected data is in the form of genotypes, but the downstream genetics questions conce...

PedPhase: Haplotype Inference for Pedigree Data

Summary: We have developed a computer program consisting of four algorithms for inferring haplotypes from (unphased) genotypes on pedigree data. These algorithms are designed based on a combinatorial formulation of haplotype inference, namely the minimum-recombinant haplotype configuration (MRHC) problem, and are effective for different types of data. One of the algorithms, called block-extensi...

Recovering haplotype structure through recombination and gene conversion

MOTIVATION: Understanding haplotype evolution subject to mutation, recombination and gene conversion is fundamental to understanding the genetic specificities of human populations and the hereditary bases of complex disorders. The goal of this project is to develop new algorithmic tools assisting the reconstruction of historical relationships between haplotypes and the inference of haplotypes from genotype...

Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation.

Although many algorithms exist for estimating haplotypes from genotype data, none of them take full account of both the decay of linkage disequilibrium (LD) with distance and the order and spacing of genotyped markers. Here, we describe an algorithm that does take these factors into account, using a flexible model for the decay of LD with distance that can handle both "blocklike" and "nonblockl...

Inferring combined CNV/SNP haplotypes from genotype data

MOTIVATION: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy ...

Journal:
  • Genetic Epidemiology

Volume 27, Issue 4

Pages: -

Publication date: 2004