Clustering of Short Read Sequences for de novo Transcriptome Assembly
Authors
Abstract:
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graphs with different k-mer lengths. Then, the eclectic mixtures of sequences are gathered in order to form the final sequences. Lastly, the contiguous sequences are clustered and the isoform groups are provided. This proposed algorithm is capable of generating long contiguous sequences and accurately clustering them into isoform groups. To evaluate our algorithm, we applied it to a simulated RNA-seq dataset of the rat transcriptome and a real RNA-seq experiment of the Loricaria gr. cataphracta transcriptome. The correctness of the assembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% of the transcripts at more than 80% of the transcripts' lengths. This study demonstrates that applying a sophisticated merging method improves transcriptome assembly. The source code is available upon request by contacting the corresponding author by email.
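The three stages named in the abstract (multi-k de Bruijn contig generation, merging, and clustering into isoform groups) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is available only on request. Every function name here is hypothetical, and two steps are simple stand-ins: the merge just pools contigs across k values and drops those contained in a longer one, and the clustering groups contigs that share at least one k-mer. Reverse complements, sequencing errors, and coverage are ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell maximal non-branching paths as contigs (isolated cycles are skipped)."""
    indeg = defaultdict(int)
    for dsts in graph.values():
        for dst in dsts:
            indeg[dst] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # Exactly one incoming and one outgoing edge.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths only start at branch points, sources, or sinks
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            while is_simple(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


def contigs_for_ks(reads, ks):
    """Pool contigs produced with several k-mer lengths."""
    pooled = set()
    for k in ks:
        pooled.update(unitigs(build_de_bruijn(reads, k)))
    return pooled


def drop_contained(contigs):
    """Assumed stand-in for merging: keep contigs not contained in a longer one."""
    keep = []
    for contig in sorted(contigs, key=len, reverse=True):
        if not any(contig in longer for longer in keep):
            keep.append(contig)
    return keep


def cluster_by_shared_kmers(contigs, k):
    """Assumed stand-in for isoform grouping: union contigs sharing any k-mer."""
    parent = list(range(len(contigs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # k-mer -> index of the first contig that contained it
    for idx, contig in enumerate(contigs):
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            if kmer in owner:
                parent[find(idx)] = find(owner[kmer])
            else:
                owner[kmer] = idx

    groups = defaultdict(list)
    for idx, contig in enumerate(contigs):
        groups[find(idx)].append(contig)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]  # toy reads from one toy transcript
    merged = drop_contained(contigs_for_ks(reads, ks=[4, 5]))
    print(merged)
    print(cluster_by_shared_kmers(merged + ["ATGGCGTTTT", "CCCCCCC"], k=5))
```

Running several k values and pooling the results reflects the usual trade-off: small k connects low-coverage transcripts but branches more often at repeats, while large k resolves repeats but fragments weakly covered regions.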
Similar resources
Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome
Seabuckthorn (Hippophae rhamnoides L.) is known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next generation massive parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of th...
The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study
Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve transcriptome assembly, but it is unknown whether the read length really matters. In addition, th...
Velvet: algorithms for de novo short read assembly using de Bruijn graphs.
We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length...
De novo transcriptome assembly with ABySS
MOTIVATION Whole transcriptome shotgun sequencing data from non-normalized samples offer unique opportunities to study the metabolic states of organisms. One can deduce gene expression levels using sequence coverage as a surrogate, identify coding changes or discover novel isoforms or transcripts. Especially for discovery of novel events, de novo assembly of transcriptomes is desirable. RESUL...
De novo assembly of short sequence reads
A new generation of sequencing technologies is revolutionizing molecular biology. Illumina's Solexa and Applied Biosystems' SOLiD generate gigabases of nucleotide sequence per week. However, a perceived limitation of these ultra-high-throughput technologies is their short read-lengths. De novo assembly of sequence reads generated by classical Sanger capillary sequencing is a mature field of res...
Journal title
volume 4 issue 1
pages 43-52
publication date 2014-05-01