Long spaced seeds for finding similarities between biological sequences
Abstract
Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of the world's computing power is devoted to finding similarities between biological sequences. The introduction of optimal spaced seeds in [Ma et al., Bioinformatics 18 (2002) 440–445] increased both the sensitivity and the speed of homology search, and spaced seeds have been adopted by many alignment programs, such as BLAST. Despite a significant amount of work, no existing algorithm can compute good long seeds. We present a different approach by introducing a new measure with two desirable properties: (i) it is highly correlated with the sensitivity of spaced seeds, and (ii) it is easily computable. Using this measure, we give algorithms that compute better seeds than all previous ones. The fact that sensitivity need not be computed is essential, as it enables us to compute very long good seeds, far beyond the lengths for which sensitivity can be computed.
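The Python sketch below contrasts the two quantities discussed above. It assumes a common model that the abstract does not spell out: a seed is a string of '1' (position must match) and '0' (don't care), and a region of length n is a 0/1 string in which each position is a match independently with probability p. `sensitivity` computes the exact probability that such a region contains a hit by dynamic programming over the last len(seed)-1 region positions, so its cost grows as n * 2^(len(seed)-1); this exponential growth illustrates why sensitivity cannot be computed for very long seeds. `overlap_score` is a hypothetical stand-in for a cheap measure: it sums 2^sigma over all nonzero self-shifts of the seed, where sigma counts aligned '1's. The measure actually introduced in the paper is not given in this abstract and may be defined differently. All function names here are illustrative.

```python
def has_hit(seed: str, region) -> bool:
    """True if `seed` ('1' = must match, '0' = don't care) hits the 0/1
    `region` (1 = match) at some offset."""
    L = len(seed)
    care = [j for j, c in enumerate(seed) if c == "1"]
    return any(all(region[i + j] == 1 for j in care)
               for i in range(len(region) - L + 1))


def sensitivity(seed: str, n: int, p: float) -> float:
    """Exact probability that a random region of length n, where each
    position is a match independently with probability p, contains a hit.
    The DP state is the last len(seed)-1 region bits, so the cost grows
    as n * 2**(len(seed) - 1)."""
    L = len(seed)
    if n < L:
        return 0.0
    # Bit j of seed_mask is set iff seed[j] == '1'.
    seed_mask = sum(1 << j for j, c in enumerate(seed) if c == "1")
    # probs[state] = P(no hit so far, last L-1 bits equal `state`);
    # bit 0 of `state` is the oldest of those bits.
    probs = [0.0] * (1 << (L - 1))
    probs[0] = 1.0  # positions before the region behave as mismatches
    for t in range(n):
        nxt = [0.0] * len(probs)
        for state, q in enumerate(probs):
            if q == 0.0:
                continue
            for bit, pb in ((0, 1.0 - p), (1, p)):
                # Bit j of `window` is the region bit aligned with seed[j].
                window = state | (bit << (L - 1))
                if t >= L - 1 and (window & seed_mask) == seed_mask:
                    continue  # a hit ends at position t: drop this mass
                nxt[window >> 1] += q * pb
        probs = nxt
    return 1.0 - sum(probs)


def overlap_score(seed: str) -> int:
    """Hypothetical overlap-based score, computable in O(len(seed)**2):
    the sum over all nonzero shifts i of 2**sigma_i, where sigma_i counts
    the '1's of the seed aligned with '1's of a copy shifted by i."""
    L = len(seed)
    total = 0
    for i in range(1, L):
        sigma = sum(1 for j in range(L - i)
                    if seed[j] == "1" and seed[j + i] == "1")
        total += 2 * 2 ** sigma  # shifts +i and -i give the same sigma
    return total
```

For instance, `sensitivity("1" * 11, 64, 0.7)` and `sensitivity("111010010100110111", 64, 0.7)` compare a contiguous seed of weight 11 with the weight-11, length-18 spaced seed used by PatternHunter; the second call visits 2^17 states per position and can take several seconds in pure Python, while `overlap_score` on the same seed is immediate.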
Similar resources
Multiple spaced seeds for homology search
MOTIVATION: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smi...
On spaced seeds for similarity search
Genomics studies routinely depend on similarity searches based on the strategy of finding short seed matches (contiguous k bases) which are then extended. The particular choice of the seed length, k, is determined by the tradeoff between search speed (larger k reduces chance hits) and sensitivity (smaller k finds weaker similarities). A novel idea of using a single deterministic optimized space...
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Fast Computation of Good Multiple Spaced Seeds
Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of computing power in the world is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and it has been adopted by many alignment programs such as BLAST. With the...
YASS: Similarity search in DNA sequences
We describe YASS – a new tool for finding local similarities in DNA sequences. The YASS algorithm first scans the sequence(s) and creates on the fly groups of seeds (small exact repeats obtained by hashing) according to statistically-founded criteria. Then it tries to extend those groups into similarity regions on the basis of a new extension criterion. The method can be seen as a compromise be...