dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data

Authors

  • Dongjun Chung
  • Dan Park
  • Kevin Myers
  • Jeffrey Grass
  • Patricia Kiley
  • Robert Landick
  • Sündüz Keles
Abstract

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has been successfully used for genome-wide profiling of transcription factor binding sites, histone modifications, and nucleosome occupancy in many model organisms and humans. Because the compact genomes of prokaryotes harbor many binding sites separated by only a few base pairs, applications of ChIP-Seq in this domain have not reached their full potential. Applications in prokaryotic genomes are further hampered by the fact that well-studied data analysis methods for ChIP-Seq do not achieve the resolution required for deciphering the locations of nearby binding events. We generated single-end tag (SET) and paired-end tag (PET) ChIP-Seq data for the σ⁷⁰ factor in Escherichia coli (E. coli). Direct comparison of these datasets revealed that although the PET assay enables higher-resolution identification of binding events, standard ChIP-Seq analysis methods are not equipped to utilize PET-specific features of the data. To address this problem, we developed dPeak, a high-resolution binding site identification (deconvolution) algorithm. dPeak implements a probabilistic model that accurately describes the ChIP-Seq data generation process for both the SET and PET assays. For SET data, dPeak outperforms or performs comparably to state-of-the-art high-resolution ChIP-Seq peak deconvolution algorithms such as PICS, GPS, and GEM. When coupled with PET data, dPeak significantly outperforms SET-based analysis with any of the current state-of-the-art methods. Experimental validations of a subset of dPeak predictions from σ⁷⁰ PET ChIP-Seq data indicate that dPeak can estimate the locations of binding events with a resolution as high as 2 to 21 bp. Applications of dPeak to σ⁷⁰ ChIP-Seq data in E. coli under aerobic and anaerobic conditions reveal closely located promoters that are differentially occupied and further illustrate the importance of high-resolution analysis of ChIP-Seq data.
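The deconvolution idea in the abstract can be illustrated with a toy sketch. This is not dPeak's model, which describes the SET and PET data generation processes in detail; it only assumes, for illustration, that PET fragment midpoints scatter around each binding site as a Gaussian with a known, shared spread, and it fits a two-site mixture by EM. The function name `deconvolve_midpoints`, the site positions, and the spread are all hypothetical.

```python
import math
from statistics import NormalDist


def deconvolve_midpoints(midpoints, n_sites, sigma, n_iter=200):
    """Toy EM for a Gaussian mixture with a fixed, shared spread (assumption)."""
    xs = sorted(midpoints)
    # Initialise site positions at evenly spaced quantiles of the data.
    mus = [xs[int((k + 0.5) * len(xs) / n_sites)] for k in range(n_sites)]
    weights = [1.0 / n_sites] * n_sites
    for _ in range(n_iter):
        resp_sum = [0.0] * n_sites   # total responsibility per site
        pos_sum = [0.0] * n_sites    # responsibility-weighted positions
        for x in xs:
            # E-step in the log domain to avoid underflow.
            logs = [math.log(max(w, 1e-300)) - 0.5 * ((x - m) / sigma) ** 2
                    for w, m in zip(weights, mus)]
            top = max(logs)
            dens = [math.exp(v - top) for v in logs]
            total = sum(dens)
            for k in range(n_sites):
                r = dens[k] / total
                resp_sum[k] += r
                pos_sum[k] += r * x
        # M-step: update site positions and relative strengths.
        mus = [pos_sum[k] / resp_sum[k] for k in range(n_sites)]
        weights = [resp_sum[k] / len(xs) for k in range(n_sites)]
    return mus, weights


# Synthetic, deterministic data: two hypothetical sites 40 bp apart.
true_sites = [100.0, 140.0]
spread = 10.0
grid = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
midpoints = [site + spread * z for site in true_sites for z in grid]

est_sites, est_weights = deconvolve_midpoints(midpoints, n_sites=2, sigma=spread)
single_peak = sum(midpoints) / len(midpoints)  # one-peak summary, for contrast
print(est_sites, est_weights, single_peak)
```

On this synthetic input, a single-peak summary lands between the two sites, whereas the mixture fit places one component near each. Real data would also require choosing the number of sites and estimating the spread, which this sketch leaves out.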

Related resources

Text S1: Supplementary Methods for “dPeak: High Resolution Identification of Transcription Factor Binding Sites from PET and SET ChIP-Seq Data”

1 Department of Statistics, University of Wisconsin, Madison, WI, U.S.A. 2 Department of Biomolecular Chemistry, University of Wisconsin, Madison, WI, U.S.A. 3 Department of Biochemistry, University of Wisconsin, Madison, WI, U.S.A. 4 Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI, U.S.A. 5 Department of Bacteriology, University of Wisconsin, Madison, WI, U.S.A. 6 D...

Identification of transcription factor binding sites from ChIP-seq data at high resolution

MOTIVATION: Chromatin immunoprecipitation coupled to next-generation sequencing (ChIP-seq) is widely used to study the in vivo binding sites of transcription factors (TFs) and their regulatory targets. Recent improvements to ChIP-seq, such as increased resolution, promise deeper insights into transcriptional regulation, yet require novel computational tools to fully leverage their advantages. ...

High-Resolution Mapping of In vivo Genomic Transcription Factor Binding Sites Using In situ DNase I Footprinting and ChIP-seq

Accurate identification of the DNA-binding sites of transcription factors and other DNA-binding proteins on the genome is crucial to understanding their molecular interactions with DNA. Here, we describe a new method: Genome Footprinting by high-throughput sequencing (GeF-seq), which combines in vivo DNase I digestion of genomic DNA with ChIP coupled with high-throughput sequencing. We have det...

High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions

Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large ...

A simple method for generating high-resolution maps of genome-wide protein binding

Chromatin immunoprecipitation (ChIP) and its derivatives are the main techniques used to determine transcription factor binding sites. However, conventional ChIP with sequencing (ChIP-seq) has problems with poor resolution, and newer techniques require significant experimental alterations and complex bioinformatics. Previously, we have used a new crosslinking ChIP-seq protocol (X-ChIP-seq) to p...

Journal title:

Volume 9, Issue

Pages -

Publication date: 2013