A Machine Learning Strategy to Identify Exonic Splice Enhancers in Human Protein-coding Sequence

Authors

  • Thomas A. Down
  • Bernard Leong
  • Tim J.P. Hubbard
Abstract

Background: Exonic splice enhancers are sequences embedded within exons that promote and regulate the splicing of the transcript in which they are located. One class of exonic splice enhancers is recognized by the SR proteins, which are thought to mediate interactions between splicing factors bound to the 5’ and 3’ splice sites.

Method and results: We present a novel strategy for analysing protein-coding sequence: first randomize the codons used at each position within the coding sequence, then apply a motif-based machine learning algorithm to compare the true and randomized sequences. This strategy identified a collection of motifs that can successfully discriminate between real and randomized coding sequence, including, but not restricted to, several previously reported splice enhancer elements. As well as distinguishing coding exons from randomized sequences, we show that our model is able to recognize noncoding exons.

Conclusions: Our strategy succeeded in detecting signals in coding exons that seem to be orthogonal to the sequences’ primary function of coding for proteins. We believe that many of the motifs detected here may represent binding sites for previously unrecognized proteins that influence RNA splicing. We hope that this development will lead to improved knowledge of exonic splice enhancers and to new developments in the field of computational gene prediction.
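The codon randomization step can be illustrated with a minimal sketch. It assumes that "randomizing the codons" means replacing each codon with a synonymous codon so that the encoded protein is unchanged, and that the replacement is drawn uniformly; the abstract does not state the sampling scheme, and the original work may have weighted codons by usage frequency. The function names `randomize_codons` and `translate` are hypothetical, and the table is the standard genetic code.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def translate(cds):
    """Translate an in-frame DNA coding sequence to a protein string."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def randomize_codons(cds, rng=None):
    """Return a sequence encoding the same protein with resampled codons.

    Assumption: each codon is replaced by a uniformly chosen synonym.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    rng = rng or random.Random()
    return "".join(
        rng.choice(SYNONYMS[CODON_TO_AA[cds[i:i + 3]]])
        for i in range(0, len(cds), 3)
    )


if __name__ == "__main__":
    real = "ATGGCCAAGTTTTGA"
    shuffled = randomize_codons(real, random.Random(0))
    print(real, translate(real))
    print(shuffled, translate(shuffled))
```

Because the randomized sequence encodes the same protein, any motif that separates real from randomized sequence cannot be explained by amino acid content alone, which is the sense in which the detected signals are orthogonal to protein coding.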


Similar resources

Identical sequence patterns in the ends of exons and introns of human protein-coding genes

Intron splicing is one of the most important steps in the maturation of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated ...

Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes

Exonic splicing enhancers (ESEs) are pre-mRNA cis-acting elements required for splice-site recognition. We previously developed a web-based program called ESEfinder that scores any sequence for the presence of ESE motifs recognized by the human SR proteins SF2/ASF, SRp40, SRp55 and SC35 (http://rulai.cshl.edu/tools/ESE/). Using ESEfinder, we have undertaken a large-scale analysis of ESE motif d...

Computational definition of sequence motifs governing constitutive exon splicing.

We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5' untranslated regions (UTRs) of transcripts of intronless genes. This type of comparison avoids the isolation of sequences that are distinguished by their protein-codi...

Letter: Disentangling Sources of Selection on Exonic Transcriptional Enhancers

In addition to coding for proteins, exons can also impact transcription by encoding regulatory elements such as enhancers. It has been debated whether such features confer heightened selective constraint, or evolve neutrally. We have addressed this question by developing a new approach to disentangle the sources of selection acting on exonic enhancers, in which we model the evolutionary rates o...

Coding exons function as tissue-specific enhancers of nearby genes.

Enhancers are essential gene regulatory elements whose alteration can lead to morphological differences between species, developmental abnormalities, and human disease. Current strategies to identify enhancers focus primarily on noncoding sequences and tend to exclude protein coding sequences. Here, we analyzed 25 available ChIP-seq data sets that identify enhancers in an unbiased manner (H3K4m...

Publication date: 2004