Simulating sequences and phylogenetic trees

Abstract

Data were simulated by mutating DNA sequences along phylogenetic trees using the seqgen() function in the phyclust package [1]. The tree topologies were generated in two stages with the ms program [1, 2]. In the first stage, we randomly generated a root sequence and a star-like ancestor tree with equal branch lengths and as many tips as the desired number of clusters, G. The tip label a_g denotes the ancestor node, or recent common ancestor, of the g-th cluster. In the second stage, we generated a descendant tree for each cluster, rooted at a_g; the number of tips in the g-th descendant tree is the number of members in cluster g. The complete tree is obtained by attaching the G descendant trees to the tips of the ancestor tree. For instance, Figure S1 shows the complete tree (right panel) created by binding six randomly generated descendant trees (example in middle panel) to the corresponding tips of the ancestor tree (left panel).
Standard phylogenetic trees typically have bifurcating branches, i.e., exactly two diverging sequences per node. The rapid diversification of HIV, however, may be better modeled with multifurcating trees [3]. For this reason, our simulations used star-like ancestor trees, which take on a “pitchfork” structure (cf. Figure S1). In our context, this topology indicates that patients diverged simultaneously (or very rapidly) from a common root [4].
The height of the ancestor tree, r_A, determines the diversity among the ancestor node sequences, while the height of the descendant trees, r_D, determines the diversity among sequences belonging to the same cluster. The relative height of the ancestor tree is given by the growth rate ratio r_AD = r_A/(r_A + r_D). Herein, complete trees are scaled to have total height one (r_A + r_D = 1); consequently, r_AD = r_A. Based on the topology of the tree, we generated DNA sequences with seqgen().
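The two-stage construction can be sketched in a few lines of Python that write the complete tree as a Newick string. This is only an illustration under stated assumptions, not the authors' procedure: the descendant trees here are plain star trees instead of the random trees produced by ms, and the function names and tip labels (`star_newick`, `complete_tree`, `g1_s1`, ...) are hypothetical.

```python
def star_newick(labels, branch_length):
    """Newick fragment for a star tree: all tips hang off a single node."""
    tips = ",".join(f"{label}:{branch_length:g}" for label in labels)
    return f"({tips})"


def complete_tree(cluster_sizes, r_a):
    """Attach one descendant tree per cluster to a star-like ancestor tree.

    The total height is fixed at 1, so the descendant height is 1 - r_a.
    ASSUMPTION: descendant trees are star trees (a stand-in for ms output).
    """
    if not 0.0 < r_a < 1.0:
        raise ValueError("r_a must lie strictly between 0 and 1")
    r_d = 1.0 - r_a
    subtrees = []
    for g, size in enumerate(cluster_sizes, start=1):
        labels = [f"g{g}_s{i}" for i in range(1, size + 1)]
        # The g-th descendant tree is rooted at the ancestor node a_g,
        # which sits at the end of an ancestor branch of length r_a.
        subtrees.append(f"{star_newick(labels, r_d)}a{g}:{r_a:g}")
    return "(" + ",".join(subtrees) + ");"


def growth_rate_ratio(r_a, r_d):
    """Ancestor height as a fraction of the total tree height."""
    return r_a / (r_a + r_d)


# Two clusters with 2 and 3 members; ancestor height 0.25, total height 1.
tree = complete_tree([2, 3], r_a=0.25)
print(tree)
print(growth_rate_ratio(0.25, 0.75))
```

Because the total height is fixed at one, the growth rate ratio returned for heights 0.25 and 0.75 equals the ancestor height itself, which is the identity stated in the text.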
Sequences were mutated under a general time-reversible (GTR) model in which the substitution rate at each site is a random variable R ∼ Γ(α, β), parameterized such that E[R] = α/β. To limit the number of parameters, it is common to use a Gamma distribution with a single shape parameter α by setting β = α, so that E[R] = 1. In addition, a proportion p of sites is assumed to be invariable. Together, these specifications are referred to as the GTR + I + Γ model. In accordance with [5], we set these parameters to values typical of the pol gene of HIV-1 subtype B; namely, we set α = 0.7589 and the proportion of invariant sites to p = 0.4817, and specified the rate matrix as:
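The rate matrix aside, the site-rate part of this model (the "+ I + Γ" component) can be illustrated with a short Python sketch. It is a simplified stand-in for what a sequence simulator does internally, not the seqgen() implementation: each site is invariable with probability p, and otherwise draws a rate from a Gamma distribution with shape α and rate β = α. The function name `sample_site_rates` and the seed are assumptions made for the example. Note that Python's `random.gammavariate` takes a scale parameter, so the scale is 1/α.

```python
import random

ALPHA = 0.7589  # Gamma shape; the rate parameter beta is set equal to alpha
P_INV = 0.4817  # proportion of invariable sites


def sample_site_rates(n_sites, alpha=ALPHA, p_inv=P_INV, seed=0):
    """Draw one relative substitution rate per site under the +I+Gamma scheme."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_inv:
            # Invariable site: it never mutates.
            rates.append(0.0)
        else:
            # gammavariate(shape, scale); scale = 1/beta = 1/alpha gives mean 1.
            rates.append(rng.gammavariate(alpha, 1.0 / alpha))
    return rates


rates = sample_site_rates(20000)
variable = [r for r in rates if r > 0.0]
frac_invariable = 1.0 - len(variable) / len(rates)
mean_variable_rate = sum(variable) / len(variable)
print(f"invariable fraction: {frac_invariable:.3f}")
print(f"mean rate at variable sites: {mean_variable_rate:.3f}")
```

With enough sites, the invariable fraction should be close to p and the mean rate at the variable sites close to 1, which follows from setting α = β.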




Publication date: 2015