Crystallizing short-read assemblies around lone Sanger reads
Abstract
New short-read sequencing technologies produce large volumes of 25-30-base paired-end reads. In this paper, we present a sequencing protocol and a de novo assembler (SHORTY) targeted at such microread data. Our protocol augments short paired-end reads with a trivially small number of Sanger reads (only one to three per bacterial genome). Still, these "seed reads" let us produce substantial assemblies using about half the short-read coverage (50-60X) of comparable assemblers, even though we assume base error rates at least 10 times higher than those assumed by other groups. SHORTY exploits two new ideas that we believe will interest the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) accurately estimating inter-contig distances from multiple spanning paired-end reads.
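The abstract does not say how SHORTY combines several spanning pairs into one inter-contig distance. The Python sketch below shows one generic way to do it, under stated assumptions: reads face inward, contig A lies to the left of contig B, and the library's insert size has a known mean and standard deviation, with pairs varying independently. Each pair yields its own gap estimate (insert size minus the fragment bases lying on each contig); estimates far from the median are discarded, and the rest are averaged. The names `SpanningPair` and `estimate_gap`, and the median-based outlier filter, are hypothetical illustrations rather than SHORTY's actual method.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, median


@dataclass
class SpanningPair:
    """Hypothetical record for a mate pair with one read on contig A and its mate on contig B.

    Assumes A lies left of B and both reads point inward, so the sequenced
    fragment splits into a piece on A, the unknown gap, and a piece on B.
    All lengths are in bases.
    """
    on_a: int  # fragment bases from read 1's outer end to the end of contig A
    on_b: int  # fragment bases from the start of contig B to read 2's outer end


def estimate_gap(pairs, insert_mean, insert_sd, max_dev=3.0):
    """Combine per-pair gap estimates into one inter-contig distance.

    Returns (gap, standard_error, pairs_used). A negative gap suggests
    the two contigs overlap.
    """
    if not pairs:
        raise ValueError("need at least one spanning pair")
    # One estimate per pair: the library's mean insert size stands in for
    # the unknown fragment length, minus the parts lying on each contig.
    single = [insert_mean - p.on_a - p.on_b for p in pairs]
    # Reject estimates far from the median (e.g. chimeric or mis-mapped pairs).
    center = median(single)
    kept = [g for g in single if abs(g - center) <= max_dev * insert_sd]
    if not kept:
        raise ValueError("spanning pairs disagree too strongly to estimate a gap")
    # Averaging n independent estimates shrinks the error by a factor of sqrt(n).
    return mean(kept), insert_sd / sqrt(len(kept)), len(kept)
```

For example, with an illustrative 200-base mean insert and a 10-base standard deviation, the pairs `SpanningPair(80, 70)`, `SpanningPair(90, 65)`, `SpanningPair(70, 75)` and `SpanningPair(60, 50)` give single-pair estimates of 50, 45, 55 and 90 bases. The last lies more than 30 bases from the median (52.5) and is dropped, so `estimate_gap` returns a 50-base gap from 3 pairs with a standard error of about 5.8 bases. This is the point of using multiple spanning pairs: any one pair is only as precise as the insert-size spread, while their average tightens as more pairs agree.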
Similar resources
Short read fragment assembly of bacterial genomes.
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on short rea...
De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads
Reference-quality genomes are expected to provide a resource for studying gene structure, function, and evolution. However, often genes of interest are not completely or accurately assembled, leading to unknown errors in analyses or additional cloning efforts for the correct sequences. A promising solution is long-read sequencing. Here we tested PacBio-based long-read sequencing and diploid ass...
Improved assembly of noisy long reads by k-mer validation.
Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (∼15% per base) make assembly imprecise a...
Genome Sequencing and Assembly by Long Reads in Plants
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of lo...
Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data
Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes...