Dinoflagellate Genomic Organization and Phylogenetic Marker Discovery Utilizing Deep Sequencing Data
Abstract
Gregory Scott Mendez, Doctor of Philosophy, 2016. Dissertation directed by Professor Charles F. Delwiche, Cell Biology and Molecular Genetics.
Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short-read sequencing methods, and develops new techniques that use deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (AHD) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test the pipeline's utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates.
Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support for all branches. The Suessiales were found to be sister to the Peridiniales. The Prorocentrales formed a monophyletic group with the Dinophysiales that was sister to the Gonyaulacales. The Gymnodiniales were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.
Related articles
Inferring phylogenetic history from restriction site associated DNA (RADseq)
Next-generation sequencing of restriction site associated DNA, or RADseq, was introduced in 2008 as a rapid genotyping method that does not require prior marker development. Developed for linkage mapping, genome-wide association, and population genetic studies, RADseq was initially viewed as ill-suited to interspecific phylogenetic questions. However, since 2012, approximately a dozen RADseq ph...
Gene-Based Marker Systems in Plants: High Throughput Approaches for Marker Discovery and Genotyping
Abstract: Development and application of molecular markers derived from genes, commonly called genic markers or sometimes functional markers, is gaining momentum in plant genetics and breeding. Availability of a large amount of sequence data coming from genome/transcriptome sequencing projects as well as the advent of next generation sequencing technologies together with advances in bioinformatics too...
MarkerMiner 1.0: A new application for phylogenetic marker development using angiosperm transcriptomes
Premise of the study: Targeted sequencing using next-generation sequencing (NGS) platforms offers enormous potential for plant systematics by enabling economical acquisition of multilocus data sets that can resolve difficult phylogenetic problems. However, because discovery of single-copy nuclear (SCN) loci from NGS data requires both bioinformatics skills and access to high-performance computin...
Landscape and variation of novel retroduplications in 26 human populations
Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage w...
Mapping two genes in the purine metabolism pathway of soybean.
Mapping genes in biochemical pathways allows study of the genomic organization of pathways and genic relationships within these pathways. Additionally, molecular markers located within the boundaries of a specific gene sequence represent important marker assisted selection resources. We report map locations of two genic markers from the purine synthesis pathway in soybean (Glycine max (L. merr...