Identifying transcription factor binding sites through Markov chain optimization
Abstract
Even though every cell in an organism contains the same genetic material, not every cell expresses the same cohort of genes. One of the major problems facing genomic research today is therefore to determine not only which genes are differentially expressed, and under what conditions, but also how their expression is regulated. The first step in establishing differential gene expression is the binding of sequence-specific DNA-binding proteins (i.e. transcription factors) to the regulatory regions of genes (i.e. promoters and enhancers). An important part of understanding how a given transcription factor functions is knowing the entire gamut of its binding sites, and hence the potential target genes that the factor may bind and regulate. In this study, we developed a computer algorithm, based on a novel Markov chain optimization method, that scans genomic databases for transcription factor binding sites, and we used it to scan the human genome for sites that bind hepatocyte nuclear factor 4 alpha (HNF4alpha). A list of 71 known HNF4alpha binding sites from the literature was used to train the Markov chain model. By scanning a 600-nucleotide window around the transcription start site of each confirmed gene in the human genome, we identified 849 sites with varying binding potential, and we experimentally tested 109 of them for binding to HNF4alpha. The program identified 77 new HNF4alpha binding sites with varying binding affinities (77 of the 109 sites tested, i.e. a 71% success rate). This computational method for searching genomic databases for potential transcription factor binding sites is therefore a powerful tool for investigating mechanisms of differential gene regulation.
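The abstract does not describe the details of the Markov chain optimization, so the following is not the authors' method. It is a minimal sketch of a generic, commonly used alternative: a position-specific first-order Markov chain fitted to aligned binding sites, which scores each window of a promoter sequence by its log-odds against a uniform background and reports windows above a threshold. The training sites, the `pseudocount` and `threshold` values, and all function names are hypothetical placeholders; in particular, the toy sites below are not the 71 HNF4alpha sites used in the study.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def train_markov_chain(sites, pseudocount=1.0):
    """Fit a position-specific (inhomogeneous) first-order Markov chain.

    sites: aligned binding sites of equal length (hypothetical input format).
    Returns (initial, transitions): initial[b] = P(first base is b) and
    transitions[i][(a, b)] = P(base i+1 is b | base i is a).
    Pseudocounts keep every probability strictly positive.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("training sites must be aligned and equal length")

    init_counts = {b: pseudocount for b in ALPHABET}
    for s in sites:
        init_counts[s[0]] += 1
    total = sum(init_counts.values())
    initial = {b: c / total for b, c in init_counts.items()}

    transitions = []
    for i in range(length - 1):
        counts = {pair: pseudocount for pair in product(ALPHABET, repeat=2)}
        for s in sites:
            counts[(s[i], s[i + 1])] += 1
        probs = {}
        for a in ALPHABET:
            row_total = sum(counts[(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                probs[(a, b)] = counts[(a, b)] / row_total
        transitions.append(probs)
    return initial, transitions


def log_odds(window, model, background=0.25):
    """Log-likelihood ratio of the window under the model vs. a uniform i.i.d. background."""
    initial, transitions = model
    score = math.log(initial[window[0]] / background)
    for i in range(len(window) - 1):
        score += math.log(transitions[i][(window[i], window[i + 1])] / background)
    return score


def scan(sequence, model, threshold=0.0):
    """Slide a site-length window along one strand; return (start, window, score) hits."""
    site_length = len(model[1]) + 1
    hits = []
    for start in range(len(sequence) - site_length + 1):
        window = sequence[start:start + site_length]
        if set(window) <= set(ALPHABET):  # skip windows containing N or other symbols
            score = log_odds(window, model)
            if score >= threshold:
                hits.append((start, window, score))
    return hits


# Hypothetical toy training sites (placeholders, NOT the 71 HNF4alpha sites).
TOY_SITES = ["AGGTCA", "AGGTCA", "GGGTCA", "AGGTTA"]
model = train_markov_chain(TOY_SITES)
hits = scan("TTTAGGTCATTT", model)
for start, window, score in hits:
    print(start, window, round(score, 3))
```

Unlike a position weight matrix, which treats each position independently, this model conditions each base on the one before it, so dependencies between adjacent positions in the site are captured. A genome-wide scan would also need to score the reverse complement of each window, because a factor can bind either strand; that step is omitted here for brevity.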
Similar articles
A Novel Transcription Factor Binding Sites Prediction Approach
Transcription factors (TFs) and their DNA binding motifs, called transcription factor binding sites (TFBSs), play important roles in most biological processes. However, the complete set of TFBSs remains largely unknown. Machine learning approaches have been applied intensively to predict TFBSs. In this paper, a novel prediction approach is presented based on Markov Chain Monte Carlo (MCMC) ...
QPS -- quadratic programming sampler, a motif finder using biophysical modeling
We present a Markov chain Monte Carlo algorithm, referred to as the quadratic programming sampler, for local alignments of nucleotide sequences aimed at inferring putative transcription factor binding sites. The new motif finder incorporates detailed biophysical modeling of transcription factor binding site recognition, which gives rise to an intrinsic threshold discriminating putative binding sites fr...
Heterogeneity in DNA multiple alignments: modeling, inference, and applications in motif finding.
Transcription factors bind sequence-specific sites in DNA to regulate gene transcription. Identifying transcription factor binding sites (TFBSs) is an important step for understanding gene regulation. Although sophisticated in modeling TFBSs and their combinatorial patterns, computational methods for TFBS detection and motif finding often make oversimplified homogeneous model assumptions for ba...
Modeling within-motif dependence for transcription factor binding site predictions
MOTIVATION The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein-DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that there exists interdependence among positions in the binding sites. In order to exploit t...
Using Hidden Markov Models to Model Multiple Transcription Factor Binding
Transcription is a cellular process leading to protein synthesis. The process is activated through the binding of proteins to specific sequences of the DNA strand. These proteins are referred to as transcription factors, and their DNA binding sites are called motifs. Computational techniques are used to predict and study such motifs. The two transcription factors Oct4 and Sox9 are known to be c...
Journal:
- Bioinformatics
Volume 18, Suppl 2
Publication date: 2002