On the origin of the periodicity of three in protein coding DNA sequences.
Authors
Abstract
Since the early observation by Shepherd (1981a, b), it has been well known that most mature messenger RNAs have a rhythm with a period of three bases. The phenomenon has been associated with some statistical properties of protein-coding DNA, such as the distribution of purines (R) and pyrimidines (Y) and the predominance of RNY codons (N = any base) (Shepherd, 1981a, b; Arques & Michel, 1992), codon usage bias (Rowe & Trainor, 1983), and the abundance of codons with the same base at the first or second codon position (Tsonis et al., 1991). Constraints related to the mechanism that monitors the translation frame have also been invoked (Trifonov, 1987; Lagunez-Otero & Trifonov, 1992).
… the height of the peak at period three can be seen. In other words, the peak for base G (which is rather scarce at codon position II) is the highest, owing to the strong compositional differences at the three codon sites, while the peak for base C, which is more evenly distributed over the three codon sites, is the lowest. The abundance of base A at codon position II is only a secondary factor, as will be shown below.
In this letter, we would like to recall that the periodicity of three in protein-coding DNA sequences appears primarily as a consequence of the non-uniform distribution of base composition at the codon sites, a well-known phenomenon that has allowed the design of algorithms to detect protein-coding regions in a DNA sequence (see Staden, 1990 for a review). Such heterogeneity can be measured, for example, by the variance of each base by codon position.
The second example concerns a computer-generated random sequence (15 000 bases) whose compositional features are given in Table 1(b) and whose spectra are plotted in Fig. 1(b). This sequence has two bases (A and C) with the same variances and different frequencies, and two bases (G and T) with the same frequencies and different variances.
The graphs show that: (i) when the variances are equal (A and C), the higher peak corresponds to the base with the higher frequency (C), and (ii) when the frequencies are equal (G and T), the higher peak corresponds to the base with the greater variance (G). The secondary role of the relative frequencies of any base on the periodicity of three is demonstrated, since there is no correspondence between the order of any of the major base …
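The measure of heterogeneity mentioned above, the variance of each base by codon position, can be related to a period-three spectral component with a short calculation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the 0/1 indicator encoding of each base, and the normalisation by the squared sequence length are all choices made here for illustration, and the site probabilities in the demo are invented. With this particular normalisation, the component at frequency 1/3 equals half the positional variance of the base; the spectra in the paper may be defined differently, so this does not reproduce its figures.

```python
import cmath
import random

BASES = "ACGT"


def position_frequencies(seq):
    """Return {base: [f_I, f_II, f_III]}, the frequency of each base
    at codon positions I, II and III (reading frame starts at index 0)."""
    n_codons = len(seq) // 3
    counts = {b: [0, 0, 0] for b in BASES}
    for i in range(3 * n_codons):
        counts[seq[i]][i % 3] += 1
    return {b: [c / n_codons for c in counts[b]] for b in BASES}


def position_variance(freqs):
    """Population variance of one base's frequency over the three codon sites."""
    mean = sum(freqs) / 3
    return sum((f - mean) ** 2 for f in freqs) / 3


def period3_power(seq, base):
    """Squared magnitude of the Fourier component at frequency 1/3 of the
    0/1 indicator sequence of `base`, divided by the squared length
    (an illustrative normalisation, assumed here)."""
    n = 3 * (len(seq) // 3)
    w = cmath.exp(-2j * cmath.pi / 3)
    roots = [1, w, w * w]  # the phase depends only on the codon position
    total = sum(roots[i % 3] for i in range(n) if seq[i] == base)
    return abs(total) ** 2 / n ** 2


def random_coding_like(n_codons, site_probs, seed=0):
    """Hypothetical generator: draw each codon site independently from its
    own base distribution (weights given in the order A, C, G, T)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_codons):
        for probs in site_probs:
            out.append(rng.choices(BASES, weights=probs)[0])
    return "".join(out)


if __name__ == "__main__":
    # Invented site compositions: G is scarce at position II, C is nearly even.
    sites = [
        [0.25, 0.25, 0.35, 0.15],  # position I
        [0.35, 0.25, 0.10, 0.30],  # position II
        [0.20, 0.26, 0.30, 0.24],  # position III
    ]
    seq = random_coding_like(5000, sites)
    freqs = position_frequencies(seq)
    for b in BASES:
        print(b, position_variance(freqs[b]), period3_power(seq, b))
```

For each base, the two printed numbers differ by a factor of two, so under these assumptions the ranking of the period-three components follows the ranking of the positional variances.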
Similar resources
Estimating the location of protein-coding regions in numerical DNA sequences using a variable-length window based on the three-dimensional Z-curve
In recent years, estimation of protein-coding regions in numerical deoxyribonucleic acid (DNA) sequences using signal processing tools has been a challenging issue in bioinformatics, owing to their 3-base periodicity. Several digital signal processing (DSP) tools have been applied to this task; they concentrate on assigning numerical values to the symbolic DNA sequence, then app...
Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467
It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a major part of the human genome is transcribed into non-coding sequences rather than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...
Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Despite the progress made in identifying protein coding regions by computational methods during the last two decades, the performance and efficiency of the prediction methods still need to be improved. In addition, it...
Investigation of Solvent Effect on CUA Codon Mutation: NMR Shielding Study
P53 is one of the genes that play an important role in the human cell cycle and in human cancers. Models of codon substitution make it possible to separate mutational biases in the DNA from selective constraints on the protein, and offer a great advantage over amino acid models for understanding the evolutionary process of proteins and protein-coding DNA sequences. In this work, we investigated abou...
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...
In silico cloning and bioinformatics study of Brucella melitensis Omp31 antigen in different mammalian expression vectors
Brucella melitensis, a pathogenic gram-negative intracellular bacterium, causes brucellosis in animals and humans. According to the literature, the B. melitensis outer membrane protein 31 (Omp31) is considered an important vaccine candidate against brucellosis. The aim of the current study was to compare three different expression constructs containing B. melitensis Omp31 antigen using bioinf...
Journal title:
- Journal of Theoretical Biology
Volume 167, Issue 4
Pages -
Publication date 1994