Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata
Abstract
Heuristic sequence alignment and database search algorithms, such as PatternHunter and BLAST, are based on the initial discovery of so-called alignment seeds of well-conserved alignment patterns, which are subsequently extended to full local alignments. In recent years, the theory of classical seeds (matching contiguous q-grams) has been extended to spaced seeds, which allow mismatches within a seed, and subsequently to indel seeds, which allow gaps in the underlying alignment. Different seeds within a given class are usually compared by their sensitivity, that is, the probability of matching an alignment generated from a particular probabilistic alignment model. We present a flexible, exact, unifying framework for seed sensitivity computation, called the probabilistic arithmetic automaton, that includes all previous results on spaced and indel seeds. In addition, it can easily incorporate sets of arbitrary seeds. Instead of computing only the probability of at least one hit (the standard definition of sensitivity), we can optionally provide the entire distribution of overlapping or non-overlapping seed hits, which yields a different characterization of a seed. A symbolic representation allows fast computation for any set of parameters.
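To make the standard definition of sensitivity concrete, the following is a minimal sketch and not the probabilistic arithmetic automaton construction of the paper. It assumes the simplest alignment model, in which each of the n alignment columns is independently a match with probability p, and it writes a spaced seed as a string over 1 (position must match) and 0 (don't care). The function name sensitivity and its parameters are hypothetical, chosen only for illustration. The code is a plain dynamic program whose state is the last span-1 alignment columns, so it is practical only for short seeds.

```python
from collections import defaultdict


def sensitivity(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits a random alignment of length n at least once.

    Assumption: alignment columns are i.i.d., each a match with probability p.
    `seed` is a string over '1' (must match) and '0' (don't care).
    """
    if not seed or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a non-empty string over '0' and '1'")
    span = len(seed)
    if n < span:
        return 0.0  # the seed does not fit into the alignment

    # Bit mask of required match positions; the oldest column is the highest bit.
    required = sum(1 << (span - 1 - i) for i, c in enumerate(seed) if c == "1")
    keep = (1 << (span - 1)) - 1  # keeps the most recent span-1 columns

    # Distribution over the first span-1 columns (no hit is possible yet).
    dist = {}
    for bits in range(1 << (span - 1)):
        k = bin(bits).count("1")
        dist[bits] = p ** k * (1 - p) ** (span - 1 - k)

    # Read the remaining columns; probability mass that produces a hit is dropped,
    # so `dist` always holds the probability of "no hit so far" per state.
    for _ in range(n - span + 1):
        new = defaultdict(float)
        for state, prob in dist.items():
            for bit, q in ((1, p), (0, 1 - p)):
                window = (state << 1) | bit
                if window & required == required:
                    continue  # seed hit at this position
                new[window & keep] += prob * q
        dist = new

    return 1.0 - sum(dist.values())


if __name__ == "__main__":
    # Compare a contiguous and a spaced seed of equal weight under the same model.
    print(sensitivity("111", 20, 0.7))
    print(sensitivity("1101", 20, 0.7))
```

Tracking the number of hits in each state, rather than discarding mass on the first hit, would extend the same scheme to a full hit-count distribution, which is the kind of quantity the abstract refers to.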
Similar resources
Stochastic Satisfiability Modulo Theory: A Novel Technique for the Analysis of Probabilistic Hybrid Systems
The analysis of hybrid systems exhibiting probabilistic behaviour is notoriously difficult. To enable mechanised analysis of such systems, we extend the reasoning power of arithmetic satisfiability-modulo-theory solving (SMT) by a comprehensive treatment of randomized (a.k.a. stochastic) quantification over discrete variables within the mixed Boolean-arithmetic constraint system. This provides ...
On the Complexity of the Equivalence Problem for Probabilistic Automata
Deciding equivalence of probabilistic automata is a key problem for establishing various behavioural and anonymity properties of probabilistic systems. In recent experiments a randomised equivalence test based on polynomial identity testing outperformed deterministic algorithms. In this paper we show that polynomial identity testing yields efficient algorithms for various generalisations of the...
Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata
We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently obtain the distribution of such an algorithm’s running time cost for any given pattern in a random text model, which can be quite general, from simple uniform ...
Refinement and Difference for Probabilistic Automata
This paper studies a difference operator for stochastic systems whose specifications are represented by Abstract Probabilistic Automata (APAs). When refinement fails between two specifications, this operator aims to produce a specification APA that represents all witness PAs of this failure. Our contribution is an algorithm that approximates the difference of two de...
Vector seeds: An extension to spaced seeds
We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core regions of a local pairwise alignment; we suggest new ways to characterize these regions that allow marked improvements in both specificity and sensitivity over existing techniques for sequence alignment. For any such characterization, which we call a vector seed, we give...