Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

Authors

Daniel R. Mende

Alison S. Waller

Shinichi Sunagawa

Aino I. Järvelin

Michelle M. Chan

Manimozhiyan Arumugam

Jeroen Raes

Peer Bork

Abstract

Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.


Similar resources

Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data

A frequent step in metagenomic data analysis comprises the assembly of the sequenced reads. Many assembly tools have been published in the last years targeting data coming from next-generation sequencing (NGS) technologies but these assemblers have not been designed for or tested in multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de...


Correction: Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

The first sentence of the second paragraph of the ‘‘Assemblies’’ subsection of the Methods should have cited reference 34 instead of 33. The correct sentence should read: The Illumina datasets were assembled using SOAPdenovo 1.05 [34] using following parameters: ‘‘-K 23 -L 70 -M 3 -u -R -F’’. The fourth sentence of the first paragraph of the ‘‘Importance of Quality Control for Illumina Data’’ s...


Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...


SPA: a short peptide assembler for metagenomic data

The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated...


Assessment of quality control approaches for metagenomic data analysis

Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing artifacts such as low-quality reads and contaminating reads, which would significantly affect and s...



Volume 7

Publication date: 2012
