Generalized Planted (l, d)-Motif Problem with Negative Set
Abstract
Finding similar patterns (motifs) in a set of sequences is an important problem in computational molecular biology. Pevzner and Sze [18] defined the planted (l,d)-motif problem as finding a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, the problem is difficult to solve because the input sequences do not contain enough information about the motif. In this paper, we propose a generalized planted (l,d)-motif problem that takes as extra input an additional set of sequences containing no substring similar to the motif (the negative set). We analyze the effect of this negative set on motif finding, and define a set of unsolvable problems and a set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were previously unsolvable, as well as challenging instances of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
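The problem definition above can be made concrete with a small sketch. The code below is not VANS; it is a brute-force checker, written under the assumption that "similar" means Hamming distance at most d, that a motif must occur in every positive sequence, and that it must occur in no negative sequence. All function names (`hamming`, `occurs`, `is_generalized_motif`, `brute_force_motifs`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def is_generalized_motif(motif, positives, negatives, d) -> bool:
    """Assumed criterion: present in every positive, absent from every negative."""
    in_all_positives = all(occurs(motif, s, d) for s in positives)
    in_some_negative = any(occurs(motif, s, d) for s in negatives)
    return in_all_positives and not in_some_negative


def brute_force_motifs(positives, negatives, l, d, alphabet="ACGT"):
    """Enumerate all |alphabet|^l candidates; only feasible for very small l."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [m for m in candidates
            if is_generalized_motif(m, positives, negatives, d)]
```

For example, `brute_force_motifs(["AAC", "CAC"], ["GG"], 2, 0)` considers all 16 length-2 strings and keeps only those found exactly in both positive sequences and not in the negative one. The exhaustive enumeration grows as 4^l for DNA, which is why practical algorithms for instances such as (15,5) rely on techniques like voting rather than this approach.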
Similar resources
Exact Algorithms for Planted Motif Problems
The problem of identifying meaningful patterns (i.e., motifs) from biological data has been studied extensively due to its paramount importance. Three versions of this problem have been identified in the literature. One of these three problems is the planted (l, d)-motif problem. Several instances of this problem have been posed as a challenge. Numerous algorithms have been proposed in the lite...
Space and Time Efficient Algorithms for Planted Motif Search
We consider the (l, d) Planted Motif Search Problem, a problem that arises from the need to find transcription factor-binding sites in genomic information. We propose the algorithms PMSi and PMSP which are based on ideas considered in PMS1 [6]. These algorithms are exact, make use of less space than the known exact algorithms such as PMS and are able to tackle instances with large values of d. ...
qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motif discovery is an important problem in biology. For example, it is ...
A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences
Identification and characterization of gene regulatory binding motifs is one of the fundamental tasks toward systematically understanding the molecular mechanisms of transcriptional regulation. Recently, the problem has been abstracted as the challenge planted (l,d)-motif problem. Previous studies have developed numerous methods to solve the problem. But most of them need to specify the length ...
On the Challenging Instances of the Planted Motif Problem
A classic problem of motif discovery in DNA sequences, called the Planted (l, d)-Motif Problem has been widely studied over the past decade owing to its application in identifying vital signals such as transcription factor binding sites. Challenging instances of the problem are those that have been probabilistically proved as ‘difficult to be solved’ due to the existence of several motifs by ra...