The Prediction of Human Exons By Oligonucleotide Composition and Discriminant Analysis of Spliceable Open Reading Frames

Authors

  • Victor V. Solovyev
  • Asaf A. Salamov
  • Charles B. Lawrence
Abstract

Discriminant analysis is applied to the problem of recognizing 5'-, internal and 3'-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of each type. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies in various functional parts of splice site regions with oligonucleotide preferences in protein-coding and intron regions (Solovyev, Lawrence, 1994). The accuracy of our splice site recognition function is about 97%. The discriminant function for 5'-exon prediction includes the hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5'-intron region, donor splice site, coding region, acceptor splice site and 3'-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. The discriminant function for 3'-exon prediction includes the octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. (ABSTRACT TRUNCATED AT 250 WORDS)
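
The candidate set for internal exon prediction, "each open reading frame flanked by GT and AG base pairs", can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `open_frames` and `candidate_internal_exons`, the `min_len` cutoff and the 0-based half-open coordinates are all assumptions made for illustration. The sketch only enumerates candidates (segments preceded by AG and followed by GT that are free of stop codons in at least one reading frame). Scoring them with a discriminant function is not shown.

```python
# Hypothetical sketch: enumerate candidate internal exons, i.e. segments
# preceded by an AG dinucleotide (acceptor side) and followed by a GT
# dinucleotide (donor side) that stay open in at least one reading frame.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(segment):
    """Return the frame offsets (0, 1, 2) with no in-frame stop codon."""
    frames = []
    for offset in range(3):
        codons = (segment[i:i + 3] for i in range(offset, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(offset)
    return frames


def candidate_internal_exons(dna, min_len=6):
    """Return (start, end, open_frame_offsets) for each candidate.

    Coordinates are 0-based and half-open: dna[start:end] is the candidate.
    min_len is an arbitrary illustrative cutoff, not a value from the paper.
    """
    dna = dna.upper()
    # A candidate starts right after an AG and ends right before a GT.
    starts = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    ends = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    candidates = []
    for start in starts:
        for end in ends:
            if end - start < min_len:
                continue
            frames = open_frames(dna[start:end])
            if frames:  # keep only segments open in at least one frame
                candidates.append((start, end, frames))
    return candidates


if __name__ == "__main__":
    # The segment CCTAACCCA has a TAA stop in frame 2 only.
    print(candidate_internal_exons("AGCCTAACCCAGT"))  # [(2, 11, [0, 1])]
```

Because every AG/GT pair enclosing an open frame qualifies, the number of candidates grows quickly with sequence length. This is consistent with the test set above, where pseudoexons (246693) far outnumber true exons (451).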

Similar resources

Pombe: a gene-finding and exon-intron structure prediction system for fission yeast.

A special program developed by the authors, called Pombe, identifies protein coding regions in the Schizosaccharomyces pombe genome. Linear discriminant analysis was applied to predict 5'-terminal, internal, 3'-terminal exons (coding-exon) and introns. The accuracy of the prediction was tested by cross verifications. The sensitivity, specificity and correlation coefficient for the internal exon...

Comparative analysis of various gene finders specific to Caenorhabditis elegans genome

Computational gene prediction and the identification of alternatively spliced isoforms have always been challenging tasks. In this paper, we describe the performance of three gene/exon finding programmes, namely Fex, Gen view2 and Gene builder, capable of predicting open reading frames or exons for a given set of sequences from the C. elegans genome. The predicted exons were compared with the 'sequencing cons...

The PCR Suite

The web application PCR Suite is an extension of the primer design program Primer3. It allows the design of primer sets encompassing single nucleotide polymorphisms, all exons of a single gene, all open reading frames in a list of cDNAs or the creation of overlapping PCR products.

Prediction of Structural Elements in Long Non-Coding RNAs using RNAz

In this paper we present an analysis of human long intergenic non-coding RNA transcripts (8195 transcripts of hg19). The key problem with this kind of RNA is that the transcripts do not share statistically significant features in their primary sequence (e.g. open reading frames or codon bias). Therefore, the analysis was done with the tool RNAz, which could solve this problem by employing comparative g...

Full-length and internally deleted forms of interleukin-7 are present in horse (Equus caballus) lymph node tissue.

Horse IL-7 (HIL-7) cDNA was isolated from adult lymph node tissue by reverse transcription polymerase chain reaction (RT-PCR) using oligonucleotide primers based on horse genomic sequences (The Broad Institute). In addition to the full-length (FL) 531 bp reading frame encoding 176 amino acids, shorter open reading frames of 477, 396 and 264 bp were also amplified. Nucleotide sequence analysis of...

Journal:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume 2

Publication date: 1994