Expanding and Vetting Sorghum bicolor Gene Annotations through Transcriptome and Methylome Sequencing
Authors
Abstract
With the emergence and subsequent advancement of next-generation sequencing technology, detailed structural and functional characterization of genomes is readily attainable. Here, we have sampled the Sorghum bicolor methylome by shallow sequencing of bisulfite (HSO3−)-treated DNA and have used these data to identify methylation patterns associated with high-confidence gene models. We trained a classifier to predict functional gene models based on expression levels, methylation profiles, and sequence conservation. We have expanded the transcriptome atlas by sequencing RNA from meristematic tissues, florets, and embryos, and used this information to develop a more complete annotation of the sorghum transcriptome. Our gene annotations modify 60% of Sbi1.4 (version 1.4 of sorghum gene annotations) gene models. The updated models most often have extended untranslated region (UTR) annotations (18,105), but some show longer protein coding regions (5096) or previously unannotated alternative transcripts (6493). A phylogenetic analysis suggests that 800 genes are missing from annotation Sbi1.4 and 400 gene models are split. The new annotations resolve 50% of split gene models and include 30% of conserved genes missing from the Sbi1.4 annotation. Using our classifier, we identified a large set of 34,276 novel, potentially functional transcribed regions. These transcribed regions include protein coding genes, non-coding RNAs, and other classes of gene products.
Sorghum [Sorghum bicolor (L.) Moench] is a C4 grass native to Africa, and its tolerance to drought and high temperature allows it to thrive in the arid regions of Africa, Australia, Asia, and the Americas. While sorghum is primarily grown for grain and forage, sweet and high-biomass sorghums have recently emerged as dedicated bioenergy crops (Rooney et al., 2007).
Sorghum has been used as a model C4 grass species owing to a relatively small genome (730 Mb) (Price et al., 2005), excellent genetic and germplasm resources (Dillon et al., 2007), and an evolutionary relationship to important crop species including maize, rice, and sugarcane (Devos and Gale, 2000; Paterson et al., 2000; Paterson et al., 2009b). The expansion of the sorghum genome relative to rice is largely pericentromeric-localized heterochromatin, and alignment of the rice and sorghum genomes reveals similar quantities of euchromatin with largely collinear gene order (Kim et al., 2005). In 2009, a team of international collaborators reported the sequence and annotation of the sorghum genome, releasing the assembly to the public (Paterson et al., 2009a). Gene annotation was performed using a combination of ...
Published in The Plant Genome 7. doi: 10.3835/plantgenome2013.08.0025. © Crop Science Society of America, 5585 Guilford Rd., Madison, WI 53711 USA. An open-access publication. All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
A. Olson, Z. Lu, M. Regulski, and D. Ware, Cold Spring Harbor Lab., Cold Spring Harbor, NY 11724. R.R. Klein and D.V. Dugas, USDA-ARS, Southern Plains Agricultural Research Center, College Station, TX 77845. P.E. Klein, Dep. of Horticulture, Texas A&M Univ., College Station, TX 77843. D. Ware, USDA-ARS, Robert W. Holley Center for Agriculture and Health, Cornell Univ., Ithaca, NY 14853. A. Olson and R.R. Klein contributed equally to this work. Received 2 Aug. 2013. *Corresponding author ([email protected]).
Abbreviations: ABA, abscisic acid; cDNA, complementary DNA; CDS, coding sequence; CHG, nucleotide triple (cytosine, not guanine, guanine); CHH, nucleotide triple (cytosine, not guanine, not guanine); CpG, dinucleotide pair (cytosine, guanine); EST, expressed sequence tag; HMR, hypomethylated region; nTAR, novel transcriptionally active region; ORF, open reading frame; RT-PCR, reverse transcription polymerase chain reaction; SNP, single nucleotide polymorphism; TSS, transcription start site; TTS, transcription termination site; UTR, untranslated region.
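The three cytosine sequence contexts named in the abbreviations (CpG, CHG, CHH, where H is any base other than guanine) can be illustrated with a short sketch. This is only an illustration of the definitions, not the authors' analysis pipeline; the function names `cytosine_context` and `count_contexts` are hypothetical, and the sketch considers forward-strand cytosines only.

```python
from collections import Counter
from typing import Optional

NOT_G = set("ACT")  # "H" in CHG/CHH: any base other than guanine


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the forward-strand cytosine at index i as 'CpG', 'CHG', or 'CHH'.

    Returns None if seq[i] is not a cytosine or if the downstream bases
    are missing or ambiguous (e.g. 'N'), so the context cannot be determined.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) < 2 or downstream[0] not in NOT_G:
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in NOT_G:
        return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Count forward-strand cytosines in each context across a sequence."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


if __name__ == "__main__":
    print(count_contexts("CGCAGCTTC"))
```

Cytosines on the reverse strand would need the same classification applied to the reverse complement, which this sketch leaves out.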
Similar resources
DNA methylation and gene expression regulation associated with vascularization in Sorghum bicolor
Plant secondary cell walls constitute the majority of plant biomass. They are predominantly found in xylem cells, which are derived from vascular initials during vascularization. Little is known about these processes in grass species despite their emerging importance as biomass feedstocks. The targeted biofuel crop Sorghum bicolor has a sequenced and well-annotated genome, making it an ideal mo...
De novo transcriptome assembly of Sorghum bicolor variety Taejin
Sorghum (Sorghum bicolor), also known as great millet, is one of the most popular cultivated grass species in the world. Sorghum is frequently consumed as food for humans and animals as well as used for ethanol production. In this study, we conducted de novo transcriptome assembly for sorghum variety Taejin by next-generation sequencing, obtaining 8.748 GB of raw data. The raw data in this stud...
MOROKOSHI: Transcriptome Database in Sorghum bicolor
In transcriptome analysis, accurate annotation of each transcriptional unit and its expression profile is essential. A full-length cDNA (FL-cDNA) collection facilitates the refinement of transcriptional annotation, and accurate transcription start sites help to unravel transcriptional regulation. We constructed a normalized FL-cDNA library from eight growth stages of aerial tissues in Sorghum b...
A survey of the sorghum transcriptome using single-molecule long reads
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length spl...
Transcriptome Characterization and Functional Marker Development in Sorghum Sudanense
Sudangrass, Sorghum sudanense, is an important forage in warm regions, but little is known about its genome. In this study, the transcriptomes of sudangrass S722 and sorghum Tx623B were sequenced by Illumina sequencing. More than 4 Gb were sequenced for each library. For Tx623B and S722, 88.79% and 83.88% of reads, respectively, were matched to the Sorghum bicolor genome. A total of 2,397 diff...