Sequence analysis Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping

Authors

Hongyi Xin

John Greth

John Emmons

Gennady Pekhimenko

Carl Kingsford

Can Alkan

Onur Mutlu

Abstract

Motivation: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used.

Results: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD filters out only string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error thresholds (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers’s bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.

Availability and implementation: We provide an implementation of SHD in C with Intel SSE instructions at: https://github.com/CMU-SAFARI/SHD.

Contact: [email protected], [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
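The filtering idea in the abstract can be illustrated with a small scalar sketch in C. This is not the authors' SHD implementation: it uses no bit-vectors or SIMD, and it leaves out any refinement steps the published algorithm may apply. The function names `edit_distance` and `shifted_filter_pass` and the `MAX_LEN` limit are hypothetical and chosen only for illustration. The sketch assumes two equal-length strings and an error threshold `e`. It counts read positions that match the reference under no shift in `[-e, +e]`; each such position must be consumed by at least one edit, so a count above `e` shows that the edit-distance exceeds `e`, and the pair can be dropped without running full verification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 256 /* hypothetical upper bound on string length */

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Reference edit-distance (insertions, deletions, substitutions)
   computed with a two-row dynamic program. This plays the role of the
   expensive verification step that a filter tries to avoid. */
int edit_distance(const char *a, const char *b) {
    size_t n = strlen(a), m = strlen(b);
    assert(n <= MAX_LEN && m <= MAX_LEN);
    int prev[MAX_LEN + 1], cur[MAX_LEN + 1];
    for (size_t j = 0; j <= m; j++) prev[j] = (int)j;
    for (size_t i = 1; i <= n; i++) {
        cur[0] = (int)i;
        for (size_t j = 1; j <= m; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = min3(sub, prev[j] + 1, cur[j - 1] + 1);
        }
        memcpy(prev, cur, (m + 1) * sizeof(int));
    }
    return prev[m];
}

/* Simplified shifted-comparison filter (illustrative assumption, not SHD
   itself). Returns 1 if the pair might be within e edits and should go
   on to verification, 0 if it certainly has more than e edits. */
int shifted_filter_pass(const char *read, const char *ref, int e) {
    size_t len = strlen(read);
    assert(e >= 0 && strlen(ref) == len);
    int unmatched = 0;
    for (size_t i = 0; i < len; i++) {
        int matched = 0;
        for (int s = -e; s <= e && !matched; s++) {
            long j = (long)i + s;
            /* Positions shifted off either end count as matches, so the
               filter stays conservative and never rejects a valid pair. */
            if (j < 0 || j >= (long)len || read[i] == ref[j]) matched = 1;
        }
        unmatched += !matched;
    }
    return unmatched <= e;
}
```

For example, `shifted_filter_pass("AAAAAAAA", "CCCCCCCC", 2)` rejects the pair, while `shifted_filter_pass("ACGTTGCA", "CGTTGCAA", 2)` accepts it because every read character matches the reference under a shift of one position. The sketch can let through pairs that later fail verification, but under the stated assumptions it does not reject a pair whose edit-distance is within the threshold.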


Similar resources

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping

MOTIVATION Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend ba...


Sequence analysis GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping

Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and ‘candidate’ locations in that reference genome. The similarity measurement...


GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping

Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measuremen...


MAGNET: Understanding and Improving the Accuracy of Genome Pre-Alignment Filtering

In the era of high throughput DNA sequencing (HTS) technologies, calculating the edit distance (i.e., the minimum number of substitutions, insertions, and deletions between a pair of sequences) for billions of genomic sequences is the computational bottleneck in today's read mappers. The shifted Hamming distance (SHD) algorithm proposes a fast filtering strategy that can rapidly filter out inval...


Hobbes: optimized gram-based methods for efficient read alignment

Recent advances in sequencing technology have enabled the rapid generation of billions of bases at relatively low cost. A crucial first step in many sequencing applications is to map those reads to a reference genome. However, when the reference genome is large, finding accurate mappings poses a significant computational challenge due to the sheer amount of reads, and because many reads map to ...




Publication date: 2015
