Tracing Evolutionary Links between Species

Author

  • Mike A. Steel
Abstract

The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin’s Origin of Species. Ever since, biologists have tried to piece together parts of this ‘tree of life’ based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra.

1 What is phylogenetics?

All living organisms on earth harbor within their DNA a signature of their evolutionary heritage. By studying patterns and differences between the genetic makeup of different species, molecular biologists are able to piece together parts of the story of how life today traces back to a common origin. In this way, many basic questions can be answered. When did animals and plants diverge? Are fungi more closely related to plants or animals? How and when did photosynthesis arise? What is the closest living animal to the whales? Does speciation occur in bursts or at a steady rate? Other topics are proving more difficult to resolve, such as deciphering the earliest history of life on earth.

Similar questions arise for evolutionary processes in other fields such as epidemiology (e.g. the relationship between different strains of influenza or HIV) and linguistics (e.g. how languages diverged from one another over time). In all these fields, the analysis relies on an underlying mathematical theory, grounded in combinatorics, algebra, and stochastic processes, with the concept of an evolutionary tree as a unifying object.
In this article, I describe a cross-section of some of the key concepts in ‘phylogenetics’, which is the theory of reconstructing and analyzing trees and networks from data observed at the present. I describe some combinatorial features of phylogenetic trees, namely their encoding by set systems, their enumeration, their generation under random models of evolution, and the way in which they can ‘perfectly’ display discrete data. I then focus on tree reconstruction from data (discrete or distance-based), which may not perfectly fit a tree. Such imperfect data can occur when data ‘evolve’ along the branches of the tree under a random Markov model. I end by outlining how tree reconstruction is possible from this evolved data, but the choice of method requires care, to avoid falling into a ‘zone’ of statistical inconsistency.

2 Hierarchies and phylogenetic trees

The 18th-century Swedish taxonomist Carl Linnaeus noticed that much of the living world can be nicely organised into a ‘hierarchy’ in which groups of living organisms are either disjoint or nested [26]. For example, cats and dogs comprise disjoint classes of organisms, but both are subsets of the class of mammals.

Formally, a hierarchy H on a finite set X is a collection of subsets of X with the property that any two elements of H are either nested (one is contained in the other) or disjoint. It will also be convenient here to require that any hierarchy on X contains the set X and all its singleton subsets. Thus H forms a hierarchy if it satisfies the two properties:

  H1: For any two sets A, B ∈ H we have A ∩ B ∈ {A, B, ∅}; and
  H2: H contains the entire set X, and each singleton set {x} for all x ∈ X.

The second condition is harmless: if H is any collection of sets that satisfies H1, we can always add the extra elements mentioned by H2 without violating H1.
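Conditions H1 and H2 can be checked mechanically. The following Python fragment is a minimal sketch, not part of the original article: the function names `is_hierarchy` and `add_trivial_sets` and the three-species example are assumptions chosen for illustration.

```python
from itertools import combinations


def is_hierarchy(X, H):
    """Return True if the collection H of subsets of X satisfies H1 and H2."""
    X = frozenset(X)
    H = {frozenset(A) for A in H}
    # Every member of H must be a subset of X.
    if any(not A <= X for A in H):
        return False
    # H1: the intersection of any two members is one of them or is empty,
    # i.e. the two sets are nested or disjoint.
    for A, B in combinations(H, 2):
        common = A & B
        if common and common != A and common != B:
            return False
    # H2: H contains X itself and every singleton {x}.
    if X not in H:
        return False
    return all(frozenset([x]) in H for x in X)


def add_trivial_sets(X, H):
    """Add X and all singletons to H (the sets required by H2)."""
    completed = {frozenset(A) for A in H}
    completed.add(frozenset(X))
    completed.update(frozenset([x]) for x in X)
    return completed


# Illustrative data (hypothetical): {"cat", "dog"} plays the role of 'mammals'.
X = {"cat", "dog", "daisy"}
nested = add_trivial_sets(X, [{"cat", "dog"}])
overlapping = add_trivial_sets(X, [{"cat", "dog"}, {"dog", "daisy"}])

print(is_hierarchy(X, nested))
print(is_hierarchy(X, overlapping))
```

The second collection fails H1 because {"cat", "dog"} and {"dog", "daisy"} overlap without being nested. The helper `add_trivial_sets` mirrors the remark that H2 is harmless: X contains every other set and each singleton is either inside or disjoint from any other set, so adding them cannot break H1.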
To connect hierarchies with trees, recall first that a tree T is a connected graph (V, E) with no cycles. Often we will deal with rooted trees for which the edges are all directed away from some root vertex, and so each vertex has an ‘in-degree’ and ‘out-degree’. We first define a rooted phylogenetic X-tree to be a tree T in which:

  • X is the set of leaves (vertices of out-degree 0);
  • all the arcs (directed edges) are directed away from some root vertex ρ;
  • every non-leaf vertex has out-degree at least 2.

[Figure, panels (a) and (b): trees whose leaf labels include mushroom, cat, daisy, rice and bacteria.]
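The three conditions defining a rooted phylogenetic X-tree can likewise be tested on a concrete graph. The sketch below is an illustration only: it assumes the tree is given as a list of arcs (parent, child) with at least one arc, and the function name `is_rooted_phylogenetic_tree`, the internal vertex names and the example topology are hypothetical (the topology is not taken from the article's figure).

```python
def is_rooted_phylogenetic_tree(X, arcs):
    """Check the three defining conditions for a rooted phylogenetic X-tree.

    arcs is a list of (parent, child) pairs; at least one arc is assumed.
    """
    children, in_degree, vertices = {}, {}, set()
    for parent, child in arcs:
        children.setdefault(parent, []).append(child)
        in_degree[child] = in_degree.get(child, 0) + 1
        vertices.update((parent, child))

    # Exactly one root (in-degree 0); every other vertex has in-degree 1.
    roots = [v for v in vertices if in_degree.get(v, 0) == 0]
    if len(roots) != 1 or any(d > 1 for d in in_degree.values()):
        return False

    # All arcs point away from the root: every vertex is reachable from it.
    # Together with the in-degree check this rules out cycles.
    seen, stack = set(), [roots[0]]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    if seen != vertices:
        return False

    # X is exactly the set of leaves (out-degree 0).
    leaves = {v for v in vertices if v not in children}
    if leaves != set(X):
        return False

    # Every non-leaf vertex has out-degree at least 2.
    return all(len(kids) >= 2 for kids in children.values())


# Illustrative topology (an assumption): ((mushroom, cat), (daisy, rice)).
X = {"mushroom", "cat", "daisy", "rice"}
arcs = [("rho", "u"), ("rho", "v"),
        ("u", "mushroom"), ("u", "cat"),
        ("v", "daisy"), ("v", "rice")]
print(is_rooted_phylogenetic_tree(X, arcs))
```

A tree in which some internal vertex has a single child fails the last condition, which is what excludes redundant vertices of out-degree 1 along a branch.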

Similar resources

An Evolutionary Relationship Between Stearoyl-CoA Desaturase (SCD) Protein Sequences Involved in Fatty Acid Metabolism

Background: Stearoyl-CoA desaturase (SCD) is a key enzyme that converts saturated fatty acids (SFAs) to monounsaturated fatty acids (MUFAs) in fat biosynthesis. Despite being crucial for interpreting SCDs’ roles across species, the evolutionary relationship of SCD proteins across species has yet to be elucidated. This study aims to present this evolutionary relationship based on amino aci...

Connect the dots: exposing hidden protein family connections from the entire sequence tree

MOTIVATION Mapping of remote evolutionary links is a classic computational problem of much interest. Relating protein families allows for functional and structural inference on uncharacterized families. Since sequences have diverged beyond reliable alignment, these are too remote to identify by conventional methods. APPROACH We present a method to systematically identify remote evolutionary r...

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

Origins and diversification of a complex signal transduction system in prokaryotes.

The molecular machinery that controls chemotaxis in bacteria is substantially more complex than any other signal transduction system in prokaryotes, and its origins and variability among living species are unknown. We found that this multiprotein "chemotaxis system" is present in most prokaryotic species and evolved from simpler two-component regulatory systems that control prokaryotic transcri...

Modelling Food Webs

We review theoretical approaches to the understanding of food webs. After an overview of the available food web data, we discuss three different classes of models. The first class comprise static models, which assign links between species according to some simple rule. The second class are dynamical models, which include the population dynamics of several interacting species. We focus on the qu...

A preliminary study on phylogenetic relationship between five sturgeon species in the Iranian Coastline of the Caspian Sea

The phylogenetic relationship of five sturgeon species in the South Caspian Sea was investigated using mtDNA molecule. Sequence analysis of mtDNA D-loop region of five sturgeon species [Great sturgeon (Huso huso), Russian sturgeon (Acipenser gueldenstaedtii), Persian sturgeon (Acipenser persicus), Ship sturgeon (Acipenser nudiventris), Stellate sturgeon (Acipenser stellatus)] and DNA sequencing...

Journal:
  • The American Mathematical Monthly

Volume 121

Published 2014