Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware
Abstract
With the sharp increase in available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a seed-based technique on specialized parallel hardware built around a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared to the traditional approach represented by the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms for parallelizing sequence similarity search.
Related resources
Subset Seed Extension to Protein BLAST
Abstract: The seeding technique has become central to the theory of sequence alignment, and several efficient tools apply seeds to DNA homology search. Recently, the concept of subset seeds has been proposed for similarity search in protein sequences. We experimentally evaluate the applicability of subset seeds to protein homology search. We advocate the use of multiple subset seeds de...
An Approach for Homology Search with Reconfigurable Hardware
The Smith-Waterman algorithm [1] is an efficient and useful algorithm for homology search problems. However, it cannot be run within a reasonable time on desktop computer systems, so dedicated hardware systems, which are very expensive, are generally used. In this paper, we propose a homology search system with reconfigurable hardware. For reducing the cost, the system is composed of off-...
Efficient Seeding Techniques for Protein Similarity Search
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabets and compare t...
A unifying framework for seed sensitivity and its application to subset seeds (Extended abstract)
We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem – a set of target alignments, an associated probability distribution, and a seed model – that are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds for which ...