Tutorial on Phylogenetic Tree Estimation
Authors
Abstract
1 Tutorial Summary

All biological disciplines are united by the idea that species share a common history. The genealogical history of life, also called an "evolutionary tree", is usually represented by a bifurcating, leaf-labeled tree. The use of evolutionary trees is a fundamental step in many biological problems, such as multiple sequence alignment, protein structure and function prediction, and drug design.

The primary scientific objective of phylogenetic studies is not to solve a given optimization problem, but rather to recover the order of speciation or gene duplication events represented by the topology of the true evolutionary tree. (Locating the root of the evolutionary tree is a scientifically difficult task, so a method is considered successful if it recovers the topology of the unrooted tree.) Consequently, good or poor performance on an optimization problem matters only to the degree that it guarantees good or poor performance in estimating the topology.

Unfortunately, inferring evolutionary trees is an enormously difficult problem for several reasons. For one, the phylogeny problem is a difficult statistical problem because its parameter space has a complicated structure, and there is no "off the shelf" solution that can be applied to it. The phylogeny problem also presents a considerable computational challenge. Typical data sets now consist of several hundred species, and presently available tree reconstruction methods are inadequate for analyzing such data sets. For example, an rbcL DNA sequence data set of 500 plants has been analyzed for several years now without being solved. The explanation for why these analyses are so difficult is simple: the optimization problems are NP-hard, and the heuristics used to attack them rely on hill-climbing techniques to search through an exponentially large space of phylogenetic trees.
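To make "exponentially large space of phylogenetic trees" concrete, the sketch below uses the standard counting result that there are (2n - 5)!! distinct unrooted, binary, leaf-labeled trees on n >= 3 leaves. The function names are illustrative only; they are not part of any phylogenetics package.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ..., taking the empty product as 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_binary_trees(n: int) -> int:
    """Count unrooted, binary, leaf-labeled trees on n >= 3 leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)


if __name__ == "__main__":
    # The count grows super-exponentially with the number of taxa.
    for n in (4, 5, 10, 20, 50):
        print(n, num_unrooted_binary_trees(n))
```

For 500 taxa, as in the rbcL example, the count has more than a thousand decimal digits, which is why exhaustive search is out of the question and hill-climbing heuristics are used instead.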
Statistical approaches towards phylogeny reconstruction have modeled the evolutionary process stochastically, and have studied the performance of methods for recovering phylogenetic trees in terms of the accuracy of these methods on datasets of finite-length sequences generated under different model trees. These studies have shown that some methods recover the true tree topology with high probability once the sequences are long enough, while other methods have no such guarantees. Over the last decade or so, computer scientists have also begun to design and analyze the performance of phylogenetic methods under these statistical models. One of the results of this interest in using statistical models of …
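The kind of simulation study described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from the tutorial: sequences evolve under the Jukes-Cantor (JC69) model on a four-taxon tree with true split AB|CD, and the topology is estimated with the four-point method on JC-corrected distances. The branch lengths (pendant 0.1, internal 0.05 substitutions per site) and all function names are hypothetical choices for illustration.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve seq along a branch of length t (expected substitutions/site) under JC69.

    Each site is redrawn uniformly from ACGT with probability 1 - exp(-4t/3),
    giving the JC69 probability 3/4 * (1 - exp(-4t/3)) of ending in a different base.
    """
    p_redraw = 1.0 - math.exp(-4.0 * t / 3.0)
    return "".join(rng.choice(BASES) if rng.random() < p_redraw else b for b in seq)


def simulate_quartet(n_sites, pendant, internal, rng):
    """Simulate sequences on the unrooted quartet AB|CD with equal pendant edges."""
    u = "".join(rng.choice(BASES) for _ in range(n_sites))  # internal node joining A and B
    v = evolve(u, internal, rng)                             # internal node joining C and D
    return {
        "A": evolve(u, pendant, rng),
        "B": evolve(u, pendant, rng),
        "C": evolve(v, pendant, rng),
        "D": evolve(v, pendant, rng),
    }


def jc_distance(s1, s2):
    """JC69-corrected distance; infinite when the observed p-distance is saturated."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def four_point_topology(seqs, rng):
    """Pick the quartet split whose two within-pair distances have the smallest sum."""
    def d(x, y):
        return jc_distance(seqs[x], seqs[y])

    candidates = [
        (d("A", "B") + d("C", "D"), "AB|CD"),
        (d("A", "C") + d("B", "D"), "AC|BD"),
        (d("A", "D") + d("B", "C"), "AD|BC"),
    ]
    rng.shuffle(candidates)  # break exact ties at random instead of favouring the true split
    return min(candidates, key=lambda c: c[0])[1]


def estimate_accuracy(n_sites, reps=100, pendant=0.1, internal=0.05, seed=1):
    """Fraction of replicates in which the true split AB|CD is recovered."""
    rng = random.Random(seed)
    hits = sum(
        four_point_topology(simulate_quartet(n_sites, pendant, internal, rng), rng) == "AB|CD"
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    for n_sites in (50, 200, 1000):
        print(n_sites, estimate_accuracy(n_sites))
```

With these settings, accuracy should climb toward 1 as the sequence length grows, which is the behavior the studies above describe for methods that recover the true topology with high probability from long enough sequences.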
Similar resources
A tutorial of BPP for species tree estimation and species delimitation
This paper provides an overview of the BPP program, which is a Bayesian MCMC program for analysis of multi-locus genomic sequence data under the multispecies coalescent model. An example dataset of five nuclear loci from the East Asian brown frogs is used to illustrate four different analyses, including estimation of parameters under the multispecies coalescent model on a fixed species phylogen...
The BPP program for species tree estimation and species delimitation
This paper provides an overview and a tutorial of the BPP program, which is a Bayesian MCMC program for analyzing multi-locus genomic sequence data under the multispecies coalescent model. An example dataset of five nuclear loci from the East Asian brown frogs is used to illustrate four different analyses, including estimation of species divergence time and population size parameters under the ...
Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species
Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...
A comparative phylogenetic analysis of Theileria spp. by using two "18S ribosomal RNA" and "Theileria annulata merozoite surface antigen" gene sequences
More than 185 species, strains and unclassified Theileria parasites are categorized in the Entrez Taxonomy. The accurate diagnosis and proper identification of the causative agents are important for understanding the epidemiology, prevention and appropriate treatment. This study aims to discuss the importance of two genes of Theileria annulata 18S ribosomal RNA (18S rRNA) and Theileria annulata...