Mutual Information Measure for Distinguishing Coding and Non-Coding DNA Sequences
نویسندگان
چکیده
Several methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. The approach described herein measures mutual information (MIM) found in DNA sequences at the amino acid level, in order to discriminate coding from non-coding sequences. Results from cross validation tests on a bacterial genome data set demonstrate the accuracy of MIM in distinguishing coding from noncoding sequences.
منابع مشابه
MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in cl...
متن کاملInvestigation of Polymorphisms in Non-Coding Region of Human Mitochondrial DNA in 31 Iranian Hypertrophic Cardiomyopathy (HCM) Patients
The D-loop region is a hot spot for mitochondrial DNA (mtDNA) alterations, containing two hypervariable segments, HVS-I and HVS-II. In order to identify polymorphic sites and potential genetic background accounting for Hypertrophic CardioMyopathy (HCM) disease, the complete non-coding region of mtDNA from 31 unrelated HCM patients and 45 normal controls were sequenced. The sequences were aligne...
متن کاملP87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders
Precise interpretation of the transcriptome sequences in the several species showed that the major part of genome has been transcribed; however, just a few amounts of the transcription sequences have open-reading frames which are conversed during the evolution. So, it is unlikely that many of the transcribed sequences code the proteins. Among the all human non-coding transcripts, at least 10000...
متن کاملPhylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467
Now, it is clear that protein is just one of the most functional products produced by the eukaryotic genome. Indeed, a major part of the human genome is transcribed to non-coding sequences than to the coding sequence of the protein. In this study, we selected three long non-coding RNAs namely AK082072, AK043754 and AK082467 which show brain expression and local region conservation among vertebr...
متن کاملLong non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
متن کامل