A novel method for estimating ancestral amino acid composition and its application to proteins of the Last Universal Ancestor
نویسندگان
چکیده
MOTIVATION Knowledge of how proteomic amino acid composition has changed over time is important for constructing realistic models of protein evolution and increasing our understanding of molecular evolutionary history. The proteomic amino acid composition of the Last Universal Ancestor (LUA) of life is of particular interest, since that might provide insight into the early evolution of proteins and the nature of the LUA itself. RESULTS We introduce a method to estimate ancestral amino acid composition that is based on expectation-maximization. On simulated data, the approach was found to be very effective in estimating ancestral amino acid composition, with accuracy improving as the number of residues in the dataset was increased. The method was then used to infer the amino acid composition of a set of proteins in the LUA. In general, as compared with the modern protein set, LUA proteins were found to be richer in amino acids that are believed to have been most abundant in the prebiotic environment and poorer in those believed to have been unavailable or scarce. Additionally, we found the inferred amino acid composition of this protein set in the LUA to be more similar to the observed composition of the same set in extant thermophilic species than in extant mesophilic species, supporting the idea that the LUA lived in a thermophilic environment. AVAILABILITY The program is available at http://compbio.cs.princeton.edu/ancestralaa
منابع مشابه
A novel method for estimating ancestral amino acid composition and its application to proteins of the Last Universal
Motivation: Knowledge of how proteomic amino acid composition has changed over
متن کاملEvolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code.
To understand more fully how amino acid composition of proteins has changed over the course of evolution, a method has been developed for estimating the composition of proteins in an ancestral genome. Estimates are based upon the composition of conserved residues in descendant sequences and empirical knowledge of the relative probability of conservation of various amino acids. Simulations are u...
متن کاملThe molecular signal for the adaptation to cold temperature during early life on Earth
Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsim...
متن کاملA Universal Trend among Proteomes Indicates an Oily Last Common Ancestor
Despite progresses in ancestral protein sequence reconstruction, much needs to be unraveled about the nature of the putative last common ancestral proteome that served as the prototype of all extant lifeforms. Here, we present data that indicate a steady decline (oil escape) in proteome hydrophobicity over species evolvedness (node number) evident in 272 diverse proteomes, which indicates a hig...
متن کاملA Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences
Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-he...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 20 14 شماره
صفحات -
تاریخ انتشار 2004