Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps
Abstract
Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Chromatin-level contact probability maps, such as those generated by the Hi-C method, provide a signal of contiguity that is completely intracellular and contains both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species' published reference genomes.
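The idea that Hi-C read pairs link contigs from the same cell can be illustrated with a toy sketch. This is not the authors' pipeline: it swaps in a deliberately simple technique (thresholding a length-normalized link density, then taking connected components with union-find), and the function name `cluster_contigs`, the `min_density` parameter, and the toy contig data are all hypothetical.

```python
from collections import defaultdict


def cluster_contigs(lengths, links, min_density=1e-6):
    """Group contigs joined by a sufficiently dense Hi-C signal.

    lengths: dict mapping contig name -> length in bp
    links:   dict mapping (contig_a, contig_b) -> number of Hi-C read pairs
    min_density: hypothetical cutoff on links per bp^2 (an assumption,
                 not a value taken from the paper)
    """
    parent = {c: c for c in lengths}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), count in links.items():
        # Normalize by the product of contig lengths so that long
        # contigs are not favored simply for having more sequence.
        density = count / (lengths[a] * lengths[b])
        if density >= min_density:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for c in lengths:
        clusters[find(c)].add(c)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (invented for illustration): two "species" with strong
# within-species links and a weak spurious link between them.
lengths = {"c1": 10_000, "c2": 20_000, "c3": 15_000, "c4": 12_000}
links = {("c1", "c2"): 400, ("c3", "c4"): 300, ("c2", "c3"): 2}

print(cluster_contigs(lengths, links))
```

With the default cutoff the weak `c2`/`c3` link is ignored and the contigs fall into two groups; lowering `min_density` far enough merges everything into one cluster, which is why a real analysis would need a principled normalization and clustering criterion rather than a hand-picked threshold.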
Similar resources
Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products
Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of "binning" the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are c...
Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis
Motivation: Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory ch...
HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps
Summary: Genome-wide proximity ligation based assays like Hi-C have opened a window to the 3D organization of the genome. In so doing, they present data structures that are different from conventional 1D signal tracks. To exploit the 2D nature of Hi-C contact maps, matrix techniques like spectral analysis are particularly useful. Here, we present HiC-spector, a collection of matrix-related funct...
Statistical model of intra-chromosome contact maps
The statistical properties of intra-chromosome maps obtained by a genome-wide chromosome conformation capture method (Hi-C) are described in the framework of the hierarchical crumpling model of heteropolymer chain with quenched disorder in the primary sequence. We conjecture that the observed Hi-C maps are statistical averages over many different ways of hierarchical genome folding, and show th...
In vitro, long-range sequence information for de novo genome assembly via transposase contiguity.
We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to > 1 megabase. These pools are "subhaploid," in that th...