Reference-guided Assembly of Metagenomic Sequences
Abstract
Metagenomic studies have primarily relied on de novo approaches for reconstructing genes and genomes from microbial mixtures. While database-driven approaches have been employed in certain analyses, they have not been used in the assembly of metagenomic data. This is due in part to the small size and biased coverage of public genome databases, and in part to the inherent computational cost of mapping tens of millions of reads to thousands of full genome sequences. Here we describe MetaCompass, the first effective approach for reference-guided metagenomic assembly that can complement and improve upon de novo metagenomic assembly methods. We show that MetaCompass, combined with de novo assembly approaches, generates significantly better results than can be obtained by either comparative or de novo assembly independently. Using this approach, we report improved assemblies for 688 metagenomic samples from the Human Microbiome Project.
Similar resources
GRASP: Guided Reference-based Assembly of Short Peptides
Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of...
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes ...
MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes
Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metageno...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Reference-independent comparative metagenomics using cross-assembly: crAss
Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic a...