An efficient motif discovery algorithm with unknown motif length and number of binding sites
Abstract
Most motif discovery algorithms for DNA sequences require the motif's length as an input. Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), in which the motif's length is not an input parameter. Unfortunately, their algorithm takes an unacceptably long time to run; for example, it needs more than 3 months to discover a length-14 motif. Since the best motif is not necessarily the longest one, nor the one with the largest number of binding sites, in this paper we also eliminate the input parameter specifying the minimum number of binding sites in order to obtain more realistic and robust results. We also develop an efficient algorithm to solve both EMP and this redefined problem.
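To make the problem setting concrete, here is a minimal Python sketch of the basic (l,d) occurrence test that motif problems of this kind build on. It assumes that a length-l candidate has a binding site wherever some length-l window of a sequence lies within Hamming distance d of it. This is only an illustration under that assumption, not the authors' algorithm or the exact EMP formulation, and the function names `hamming`, `binding_sites`, and `sequences_hit` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def binding_sites(motif: str, sequences: list[str], d: int) -> list[tuple[int, int]]:
    """Return (sequence index, start position) for every window of len(motif)
    that is within Hamming distance d of the candidate motif (assumed model)."""
    l = len(motif)  # the length comes from the candidate, not a separate parameter
    sites = []
    for i, seq in enumerate(sequences):
        for p in range(len(seq) - l + 1):
            if hamming(motif, seq[p:p + l]) <= d:
                sites.append((i, p))
    return sites


def sequences_hit(motif: str, sequences: list[str], d: int) -> int:
    """Count how many distinct sequences contain at least one binding site."""
    return len({i for i, _ in binding_sites(motif, sequences, d)})


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGAAA", "GGGGGGGG"]
    print(binding_sites("ACGT", seqs, 1))  # [(0, 0), (0, 4), (1, 2)]
    print(sequences_hit("ACGT", seqs, 1))  # 2
```

Here the sites are simply reported rather than compared against a fixed minimum count. How candidates of varying length are generated and ranked is where actual algorithms differ, and that part is not sketched.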
Similar resources
Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
Gapped consensus motif discovery: evaluation of a new algorithm based on local multiple alignments and a sampling strategy
We check the efficiency and feasibility of a novel method designed for the discovery of a priori unknown motifs described as gaps alternating with specific regions. Such motifs are searched for as consensi of non-homologous biological sequences. The only specifications required concern the maximal gap length, the minimal frequency for specific characters and the minimal percentage (quorum) of s...
BioOptimizer: Improving Models for Discovery of Transcription Factor Binding Motifs
The experimental determination of TF binding sites is expensive and time-consuming. Many motif-finding programs have been developed but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant. We derive a comprehensive scoring function based on a full Bayesian mo...
Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites
Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm, which is based on a position weight matrix model that d...
BioOptimizer: a Bayesian scoring function approach to motif discovery
MOTIVATION Transcription factors (TFs) bind directly to short segments on the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. The experimental determination of TF binding sites is expensive and time-consuming. Many motif-finding programs have been developed, but no program is clearly superior in all situations. P...
Journal title:
- International journal of data mining and bioinformatics
Volume 1, Issue 2
Publication date: 2006