Unveiling the transcriptomic complexity of Miscanthus sinensis using a combination of PacBio long read- and Illumina short read sequencing platforms
Abstract
Background: Miscanthus sinensis Andersson is a perennial grass that exhibits remarkable lignocellulose characteristics suitable for sustainable bioenergy production. However, knowledge of the genetic resources of this species is relatively limited, which considerably hampers further work on its biology and improvement.
Results: In this study, by analyzing the transcriptome of mixed samples of leaves and stems using the latest PacBio Iso-Seq sequencing technology combined with Illumina HiSeq, we report the first full-length transcriptome dataset of M. sinensis, with a total of 58.21 Gb of clean data. An average of 15.75 Gb of reads per sample was obtained from the PacBio system, doubling the data size (6.68 Gb) obtained from the Illumina HiSeq platform. The integrated analyses of the PacBio- and Illumina-based transcriptomic data uncovered 408,801 non-redundant transcripts with an average length of 1,685 bp. Of those, 189,406 were commonly identified by both methods, 169,149 (average length 619 bp) were uniquely identified by Illumina HiSeq, and 51,246 (average length 2,535 bp) were uniquely identified by PacBio Iso-Seq. Approximately 96% of the final transcripts mapped back to the reference genome, reflecting the high quality and coverage of our results. When compared against the genomes of four Andropogoneae species, the transcripts showed the closest relationship to sugarcane, with mapping ratios of up to 93%, followed by sorghum at 80%, indicating the conservation of orthologs among these three genomes. Furthermore, 306,228 transcripts were successfully annotated against public databases, including cell wall-related genes and transcription factor families, thus providing many new insights into gene functions. The data also helped identify 3,898 alternative splicing (AS) events and 2,963 AS isoforms within 10 functional categories.
Conclusions: Taken together, the present study provides a rich set of transcriptomic data that greatly enriches our understanding of the genetic resources of M. sinensis, facilitating the improvement of, and molecular studies on, this species.
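As a rough illustration only, splitting transcripts into a set found by both platforms and two platform-specific sets, each with its own mean length, amounts to set intersection and difference once transcripts from the two platforms have been matched. The sketch below assumes that matching has already been done and that each transcript carries a common ID; the IDs, lengths and the helper names `partition` and `summarize` are hypothetical toy values, not the study's data or pipeline.

```python
from statistics import mean

# Hypothetical toy data: transcript ID -> length in bp. IDs are ASSUMED to be
# shared when both platforms recovered the same transcript.
illumina = {"t1": 620, "t2": 1500, "t3": 410, "t4": 830}
isoseq = {"t1": 1900, "t2": 2400, "t5": 2900, "t6": 2200}


def partition(short_reads, long_reads):
    """Split transcript IDs into shared and platform-specific sets."""
    a, b = set(short_reads), set(long_reads)
    return {
        "shared": a & b,          # identified by both platforms
        "illumina_only": a - b,   # unique to the short-read assembly
        "isoseq_only": b - a,     # unique to the long-read data
    }


def summarize(groups, lengths):
    """Return (transcript count, mean length in bp) for each group."""
    return {
        name: (len(ids), mean(lengths[i] for i in ids) if ids else 0.0)
        for name, ids in groups.items()
    }


# For shared IDs the Iso-Seq length overrides the Illumina one (assumption).
lengths = {**illumina, **isoseq}
groups = partition(illumina, isoseq)
for name, (n, avg) in summarize(groups, lengths).items():
    print(f"{name}: {n} transcripts, mean length {avg:.0f} bp")
```

Taking the Iso-Seq length for shared transcripts reflects the assumption that the long-read copy is the more complete one; a real comparison would match transcripts by sequence similarity rather than by shared IDs.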
Journal
Journal title: BMC Genomics
Year: 2021
ISSN: 1471-2164
DOI: https://doi.org/10.1186/s12864-021-07971-x