Compact and evenly distributed k-mer binning for genomic sequences
Authors
Abstract
Similar resources
Compact Universal k-mer Hitting Sets
We address the problem of finding a minimum-size set of k-mers that hits L-long sequences. The problem arises in the design of compact hash functions and other data structures for efficient handling of large sequencing datasets. We prove that the problem of hitting a given set of L-long sequences is NP-hard and give a heuristic solution that finds a compact universal k-mer set that hits any set...
Evenly Distributed Depth is the Worst for Distributed Snooping
In the purely object-parallel approach to multiprocessor rendering, each processor is assigned responsibility to render a subset of the graphics database. When rendering is complete, pixels from the processors must be merged and globally z-buffered. On an arbitrary multiprocessor interconnection network, the straightforward algorithm for pixel merging requires dA total network bandwidth per fram...
Consensus Clustering for Binning Metagenome Sequences
The advances in next-generation sequencing technologies allow researchers to sequence in parallel millions of microbial organisms directly from environmental samples. The result of this “shotgun” sequencing are many short DNA fragments of different organisms, which constitute the basis for the field of metagenomics. Although there are big databases with known microbial DNA that allow us classif...
Statistics for K-mer Based Splicing Analysis
It is well acknowledged that alternative splicing module plays a crucial role to identify the variations of the RNA transcriptomes. In high-throughput short-read RNA, splicing analysis is a challenging task due to the uncertainty and time complexity of reads alignments onto genome and transcriptome. In this paper, we introduce k-mer based statistical method for splicing event analysis. The k-me...
VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data
BACKGROUND Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results especially for short contigs that have few predicted proteins or...
Journal
Journal title: Bioinformatics
Year: 2021
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btab156