READSCAN: a fast and scalable pathogen discovery program with accurate genome relative abundance estimation

Authors

  • Raeece Naeem
  • Mamoon Rashid
  • Arnab Pain
Abstract

READSCAN is a highly scalable parallel program that identifies non-host sequences (of potential pathogen origin) and estimates their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences in a simulated dataset of 20.1 million reads in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan.

Similar resources

Kmerlight: fast and accurate k-mer abundance estimation

k-mers (nucleotide strings of length k) form the basis of several algorithms in computational genomics. In particular, k-mer abundance information in sequence data is useful in read error correction, parameter estimation for genome assembly, digital normalization etc. We give a streaming algorithm Kmerlight for computing the k-mer abundance histogram from sequence data. Our algorithm is fast an...
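To make the term concrete, a k-mer abundance histogram records, for each count i, how many distinct k-mers occur exactly i times in the reads. The sketch below computes it by exact in-memory counting. It is not the Kmerlight streaming algorithm, which the excerpt above does not describe. The function name `kmer_abundance_histogram`, the choice to skip k-mers containing `N`, and the choice not to merge reverse complements are all assumptions made for illustration.

```python
from collections import Counter


def kmer_abundance_histogram(reads, k):
    """Return {abundance: number of distinct k-mers seen exactly that many times}.

    Exact counting for illustration only; memory grows with the number of
    distinct k-mers, which a streaming method is designed to avoid.
    """
    kmer_counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: k-mers with ambiguous bases are ignored.
            if "N" not in kmer:
                kmer_counts[kmer] += 1
    # Count how many distinct k-mers share each abundance value.
    return dict(Counter(kmer_counts.values()))


# Two overlapping toy reads; each of the four distinct 3-mers occurs twice.
reads = ["ACGTAC", "CGTACG"]
histogram = kmer_abundance_histogram(reads, 3)
print(histogram)  # {2: 4}
```

On real sequencing data the low-abundance bins are typically dominated by error k-mers, which is why the histogram is useful for the error-correction and parameter-estimation tasks the abstract lists.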

Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read as...

Fast and accurate quantification and differential analysis of transcriptomes

Fast and accurate quantification and differential analysis of transcriptomes by Harold Joseph Pimentel Doctor of Philosophy in Computer science University of California, Berkeley Professor Lior Pachter, Chair As access to DNA sequencing has become ubiquitous to scientists, the use of sequencers has expanded from determining the genomes of individuals to performing molecular probing assays. Thes...

Papaya Dieback in Malaysia: A Step Towards A New Insight of Disease Resistance

A recently published article describing the draft genome of Erwinia mallotivora BT-Mardi (1), the causal pathogen of papaya dieback infection in Peninsular Malaysia, has significant potential to overcome and reduce the effect of this vulnerable crop (2). The authors found that the draft genome sequence is approximately 4824 kbp and the G+C content of the genome was 52-54%, which is very similar to t...

I-49: Human Y Chromosome Proteome Project

The success of the Human Genome Project (HGP) has provided a blueprint for the approximately 20,000 gene-encoded proteins potentially active in all of the hundreds of cell types that make up the human body. Yet we still have limited knowledge about a majority of the gene-encoded proteins which are the “building blocks of life” and “cellular machinery”. It is estimated that for nearly half of th...

Journal:

Volume 29, Issue

Pages -

Publication date: 2013