Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences
Abstract
We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g1g2...gm is a string of m characters. A randomly generated approximate copy of G is implanted into each background sequence. For a randomly generated approximate copy b1b2...bm of G, every character is randomly generated such that the probability for bi ≠ gi is at most α. In this paper, we give the first analytical proof that multiple background sequences do help in finding subtle and faint motifs.
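The generative model described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the function name `implant_motif`, the DNA alphabet, the uniform background distribution, the uniformly random implant position, and a mutation probability of exactly α per character (the model only bounds it by α) are all assumptions introduced here.

```python
import random


def implant_motif(motif, k, n, alpha, alphabet="ACGT", seed=None):
    """Generate k background sequences of length n, each with one
    approximate copy of `motif` implanted at a random position.

    Assumptions (not from the source): uniform background characters,
    uniform implant position, and each motif character mutated with
    probability exactly `alpha` to a different character.
    """
    m = len(motif)
    if n < m:
        raise ValueError("sequence length n must be at least len(motif)")
    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        # Background: every character drawn independently from the alphabet.
        background = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy b1...bm: bi differs from gi with probability alpha.
        copy = [
            rng.choice([c for c in alphabet if c != g])
            if rng.random() < alpha
            else g
            for g in motif
        ]
        # Overwrite a random window of the background with the copy.
        start = rng.randrange(n - m + 1)
        background[start:start + m] = copy
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions
```

Calling `implant_motif("ACGTACGT", k=5, n=50, alpha=0.2, seed=0)` returns five sequences of length 50 together with the implant positions, which can serve as ground truth when checking whether a motif finder recovers the planted sites.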
Similar resources
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
Sublinear Time Motif Discovery from Multiple Sequences
In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet, Σ. A motif G = g1g2 . . . gm is a string of m characters. In each background sequence is implanted a probabilistically-ge...
A tree-based approach for motif discovery and sequence classification
MOTIVATION: Pattern discovery algorithms are widely used for the analysis of DNA and protein sequences. Most algorithms have been designed to find overrepresented motifs in sparse datasets of long sequences, and ignore most positional information. We introduce an algorithm optimized to exploit spatial information in sparse-but-populous datasets. RESULTS: Our algorithm Tree-based Weighted-Positi...
An Application of the ABS LX Algorithm to Multiple Sequence Alignment
We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...
Genetic Algorithm Based Probabilistic Motif Discovery in Multiple Unaligned Biological Sequences
Many computational approaches have been introduced for the problem of motif identification in a set of biological sequences, which are classified according to the type of motifs discovered. In this study, we propose a model to discover motifs in a large set of unaligned sequences in minimal time using a genetic-algorithm-based probabilistic motif discovery model. The proposed algorith...