Parsimony and likelihood reconstruction of human segmental duplications
نویسندگان
چکیده
MOTIVATION Segmental duplications > 1 kb in length with >or= 90% sequence identity between copies comprise nearly 5% of the human genome. They are frequently found in large, contiguous regions known as duplication blocks that can contain mosaic patterns of thousands of segmental duplications. Reconstructing the evolutionary history of these complex genomic regions is a non-trivial, but important task. RESULTS We introduce parsimony and likelihood techniques to analyze the evolutionary relationships between duplication blocks. Both techniques rely on a generic model of duplication in which long, contiguous substrings are copied and reinserted over large physical distances, allowing for a duplication block to be constructed by aggregating substrings of other blocks. For the likelihood method, we give an efficient dynamic programming algorithm to compute the weighted ensemble of all duplication scenarios that account for the construction of a duplication block. Using this ensemble, we derive the probabilities of various duplication scenarios. We formalize the task of reconstructing the evolutionary history of segmental duplications as an optimization problem on the space of directed acyclic graphs. We use a simulated annealing heuristic to solve the problem for a set of segmental duplications in the human genome in both parsimony and likelihood settings. AVAILABILITY Supplementary information is available at http://www.cs.brown.edu/people/braphael/supplements/.
منابع مشابه
Optimizing Directed Acyclic Graphs via Simulated Annealing for Reconstructing Human Segmental Duplications
Segmental duplications, relatively long and nearly identical regions, prevalent in the mammalian genome, are successfully modeled by directed acyclic graphs. Reconstructing the evolutionary history of these genomic regions is a non-trivial, but important task, as segmental duplications harbor recent primate-specific and human-specific innovations and also mediate copy number variation within th...
متن کاملA Parsimony Approach to Analysis of Human Segmental Duplications
Segmental duplications are abundant in the human genome, but their evolutionary history is not well-understood. The mystery surrounding them is due in part to their complex organization; many segmental duplications are mosaic patterns of smaller repeated segments, or duplicons. A two-step model of duplication has been proposed to explain these mosaic patterns. In this model, duplicons are copie...
متن کاملEfficient Algorithms for Analyzing Segmental Duplications, Deletions, and Inversions in Genomes
Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics consisting of pieces of multiple other segmental duplications. This complex genomic organization complicates analysis of the evolutionary history of these sequences. Earlier, we introduced a genomic distance, called duplication distance, that computes the mo...
متن کاملSegmental Duplications as a Complement Strategy to Short Tandem Repeats in the Prenatal Diagnosis of Down Syndrome
Background: Quantitative fluorescence-polymerase chain reaction (QF-PCR) is an inexpensive and accurate method for the prenatal diagnosis of aneuploidies that applies short tandem repeats (STRs) as a chromosome-specific marker. Despite its apparent advantages, QF-PCR is not applicable in all cases due to the presence of uninformative STRs. This study was carried out to investigate the efficienc...
متن کاملEnrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements.
The sequence of the mouse genome allows one to compare the conservation of synteny between the human and mouse genome and exploration of regions that might have been involved in major rearrangements during the evolution of these two species (evolutionary genome rearrangements). Recent segmental duplications (or duplicons) are paralogous DNA sequences with high sequence identity that account for...
متن کامل