Combining frequency and positional information to predict transcription factor binding sites

Authors

  • Szymon M. Kielbasa
  • Jan O. Korbel
  • Dieter Beule
  • Johannes Schuchhardt
  • Hanspeter Herzel
Abstract

MOTIVATION: Even though a number of genome projects have been finished on the sequence level, still only a small proportion of DNA regulatory elements have been identified. Growing amounts of gene expression data provide the possibility of finding coregulated genes by clustering methods. By analysis of the promoter regions of those genes, rather weak signals of transcription factor binding sites may be detected.

RESULTS: We introduce the new algorithm ITB, an Integrated Tool for Box finding, which combines frequency and positional information to predict transcription factor binding sites in upstream regions of coregulated genes. Motifs are extracted by exhaustive analysis of regular expression-like patterns and by estimating probabilities of positional clusters of motifs. ITB detects consensus sequences of experimentally verified transcription factor binding sites of the yeast Saccharomyces cerevisiae. Moreover, a number of new binding site candidates with significant scores are predicted. Besides applying ITB on yeast upstream regions, the program is run on human promoter sequences.

AVAILABILITY: ITB is available upon request.
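The abstract does not specify how ITB scores motifs, so the following is only a minimal sketch of the general idea of pairing a frequency signal with positional information. It counts how many upstream sequences contain a regular expression-like pattern, computes a binomial tail probability against an assumed background rate, and records where each match starts. The function names, the toy sequences, the pattern, and the background rate of 0.1 are all illustrative assumptions, not part of ITB.

```python
import re
from math import comb


def count_sequences_with_motif(pattern, sequences):
    """Return how many sequences contain at least one match of the pattern."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def motif_positions(pattern, sequences):
    """Return the start offsets of non-overlapping matches, one list per sequence."""
    regex = re.compile(pattern)
    return [[m.start() for m in regex.finditer(seq)] for seq in sequences]


# Hypothetical toy upstream regions of a cluster of coregulated genes.
upstream = ["TTACGCGTAA", "ACGCGTTTTT", "GGGGACGAGT", "TTTTTTTTTT"]
pattern = "ACG[AC]GT"  # regular expression-like motif (illustrative)

# Frequency information: how many sequences carry the motif, and how
# surprising that is under an assumed per-sequence background rate.
hits = count_sequences_with_motif(pattern, upstream)
background_rate = 0.1  # assumed, not estimated from real data
p_value = binomial_tail(hits, len(upstream), background_rate)

# Positional information: where the motif occurs in each sequence.
positions = motif_positions(pattern, upstream)

print(hits, p_value, positions)
```

In a real analysis the background rate would be estimated from a reference set of promoters, and the match offsets would feed a separate test for positional clustering; both steps are left out of this sketch.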

Similar articles

De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional i...

Analysis of Pairwise Dependency Information Content for Representing and Searching for Transcription Factor Binding Sites

Transcription factors are proteins that are able to bind to certain segments of DNA to control gene expression. We present an improvement upon supervised learning approaches used for finding transcription factor binding sites. We look at binding sites of the same length for a single transcription factor and use the Berg and von Hippel scoring method. Pairwise information content of positional d...

Assessment of Algorithms for Inferring Positional Weight Matrix Motifs of Transcription Factor Binding Sites Using Protein Binding Microarray Data

The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for recons...

Motif discovery programs

BayesMD [1] is a probabilistic, Bayesian model for predicting novel transcription factor binding sites. Biological information about binding site properties, background sequence models, occurrence and positional preferences is built into the model in modular fashion. Mixture prior parameters for the motif and background are trained using information on TFBSs and organism-specific promoter sequ...

Computational annotation of transcription factor binding sites in D. melanogaster developmental genes.

Drosophila melanogaster is one of the most important organisms for studying the genetics of development. The precise regulation of genes during early development is enacted through the control of transcription. The control circuitry is hardwired in the genome as clusters of multiple transcription factor binding sites (TFBS) known as cis-regulatory modules (CRMs). A number of TFBS and CRMs have ...

Journal:
  • Bioinformatics

Volume 17, Issue 11

Pages  -

Publication date 2001