YOABS: yet other aligner of biological sequences - an efficient linearly scaling nucleotide aligner
Abstract
MOTIVATION The explosive growth of short-read sequencing technologies in recent years has resulted in the rapid development of many new alignment algorithms and programs. Most of them, however, are inefficient or not applicable for reads ≳200 bp, because these algorithms are specifically designed to process short queries with relatively low sequencing error rates. At the same time, the current trend toward more reliable detection of structural variations in assembled genomes, and toward easier de novo sequencing, demands complementing high-throughput short-read platforms with long-read mapping. Algorithms and programs for efficient mapping of longer reads are therefore becoming crucial. The choice of long-read aligners that are effective in terms of both performance and memory is limited, however, and includes only a handful of hash table-based (BLAT, SSAHA2) or trie-based (Burrows-Wheeler Transform - Smith-Waterman (BWT-SW), Burrows-Wheeler Aligner - Smith-Waterman (BWA-SW)) algorithms.
RESULTS A new O(n) algorithm that combines the advantages of both hash-based and trie-based methods has been designed to effectively align long biological sequences (≳200 bp) against a large sequence database with a small memory footprint (e.g. ~2 GB for the human genome). The algorithm is accurate and significantly faster than BLAT or BWT-SW, and, like BWT-SW, it can find all local alignments. It is as accurate as SSAHA2 or BWA-SW, but it uses 3+ times less memory and is 10+ times faster than SSAHA2, and at low error rates it is several times faster than BWA-SW while using almost two times less memory.
AVAILABILITY AND IMPLEMENTATION The prototype implementation of the algorithm will be available upon request for non-commercial use in academia (local hit table binary and indices are at ftp://styx.ucsd.edu).
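As background for the hash-table family of aligners mentioned above (BLAT, SSAHA2), the following is a minimal Python sketch of k-mer seeding: index every k-mer of the reference, look up the k-mers of the query, and let the hits vote for a diagonal (reference position minus query position). It is not the YOABS algorithm, which the abstract does not describe in detail; the function names, the toy sequences and the choice k = 4 are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_diagonals(query: str, index: dict[str, list[int]], k: int) -> Counter:
    """Count seed hits per diagonal (reference position minus query position)."""
    votes = Counter()
    for qpos in range(len(query) - k + 1):
        for rpos in index.get(query[qpos:qpos + k], ()):
            votes[rpos - qpos] += 1
    return votes


# Toy data (hypothetical): the query is reference[9:23] with one substitution.
reference = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
query = "GGCTTACCGTTAGG"
k = 4

index = build_kmer_index(reference, k)
best_diagonal, hits = candidate_diagonals(query, index, k).most_common(1)[0]
print(best_diagonal, hits)
```

In this toy case most seeds agree on diagonal 9, the true origin of the query, even though the substitution destroys the k-mers that overlap it. A real aligner would then extend or verify the candidate region, for example with Smith-Waterman.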
Related resources
GR-Aligner: an algorithm for aligning pairwise genomic sequences containing rearrangement events
MOTIVATION Homologous genomic sequences between species usually contain different rearrangement events. Whether some specific patterns existed in the breakpoint regions that caused such events to occur is still unclear. To resolve this question, it is necessary to determine the location of breakpoints at the nucleotide level. The availability of sequences near breakpoints would further facilita...
GS-Aligner: a novel tool for aligning genomic sequences using bit-level operations.
A novel algorithm, GS-Aligner, that uses bit-level operations was developed for aligning genomic sequences. GS-Aligner is efficient in terms of both time and space for aligning two very long genomic sequences and for identifying genomic rearrangements such as translocations and inversions. It is suitable for aligning fairly divergent sequences such as human and mouse genomic sequences. It consi...
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi
We present the Montreal Forced Aligner (MFA), a new open-source system for speech-text alignment. MFA is an update to the Prosodylab-Aligner, and maintains its key functionality of trainability on new data, as well as incorporating improved architecture (triphone acoustic models and speaker adaptation), and other features. MFA uses Kaldi instead of HTK, allowing MFA to be distributed as a stand-...
RASER: reads aligner for SNPs and editing sites of RNA
MOTIVATION Accurate identification of genetic variants such as single-nucleotide polymorphisms (SNPs) or RNA editing sites from RNA-Seq reads is important, yet challenging, because it necessitates a very low false-positive rate in read mapping. Although many read aligners are available, no single aligner was specifically developed or tested as an effective tool for SNP and RNA editing predictio...
Journal title:
- Bioinformatics
Volume 28, Issue 8
Pages -
Publication date 2012