Analysis of Similarity/Dissimilarity of DNA Sequences Based on Chaos Game Representation

Authors

  • Wei Deng
  • Yihui Luan
  • Yong Zhang
Abstract

Table 1: The coding sequences of the first exon of the β-globin gene of different species.

Species   Coding sequence
Human     ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATTAAGTTGGTGGTGAGGCCCTGGGCAG
Goat      ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGGCTTCTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGCTGAGGCCCTGGGCAG
Opossum   ATGGTGCACTTGACTTCTGAGGAGAAGAACTGCATCACTACCATCTGGTCTAAGGTGCAGGTTGACCAGACTGGTGGTGAGGCCCTTGGCAG
Gallus    ATGGTGCACTGGACTGCTGAGGAGAAGCAGCTCATCACCGGCCTCTGGGGGAAGGTCAATGTGGCCGAATGTGGGGCCGAAGCCCTGGCCAG
Lemur     ATGACTTTGCTGAGTGCTGAGGAGAATGCTCATGTCACCTCTCTGTGGGGCAAGGTGGATGTAGAGAAAGTTGGTGGCGAGGCCTTGGGCAG
Mouse     ATGGTTGCACCTGACTGATGCTGAGAAGTCTGCTGTCTCTTGCCTGTGGGCAAAGGTGAACCCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGG
Rabbit    ATGGTGCATCTGTCCAGTGAGGAGAAGTCTGCGGTCACTGCCCTGTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGGC
Rat       ATGGTGCACCTAACTGATGCTGAGAAGGCTACTGTTAGTGGCCTGTGGGGAAAGGTGAACCCTGATAATGTTGGCGCTGAGGCCCTGGGCAG
Gorilla   ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGG

Table 2: Hurst exponent of the CGR-walk sequence $\{X_n\}$ of the nine species in Table 1.

Exponent        Human   Goat    Opossum  Gallus  Lemur   Mouse   Rabbit  Rat     Gorilla
$H(X_n^{RY})$   0.445   0.5024  0.6536   0.5075  0.5016  0.538   0.429   0.5791  0.4698
$H(X_n^{MK})$   0.7452  0.7853  0.6547   0.7212  0.7487  0.7094  0.8099  0.5237  0.7467
$H(X_n^{WS})$   0.641   0.6894  0.6292   0.5756  0.6753  0.8118  0.615   0.7255  0.6302

3. Numerical Characterization of DNA Sequences

The comparison of DNA sequences has attracted researchers from computer science and mathematics, and related work has made progress [13, 16–28]. A DNA sequence can now be represented by a random numerical sequence using the CGR-walk technique. Gao and Xu [29] also substantially corroborated the finding that long-range correlations are present in the data. In this paper, we explore the tendency of a data series by calculating the Hurst exponent [30].
Some work has also been done on the relation between long-range correlation and the Hurst exponent [31]. To numerically characterize a DNA sequence given by the CGR, we treat the Hurst exponent as an efficient invariant that is sensitive to this kind of graphical representation. Because a DNA sequence can be regarded as an ordered sequence over the alphabet {A, C, G, T}, we represent a DNA sequence as a finite set with N elements, indexed by [i] := {1, 2, ..., N}. For any time series $\{u_i\}_{i=1}^{N}$, one can define several quantities as follows [30]: (i) the partial mean
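The excerpt is cut off before the remaining quantities are listed, so the authors' exact definitions are not available here. As an illustration only, the following minimal sketch estimates a Hurst exponent with the classical rescaled-range (R/S) procedure, which starts from the mean of each window. The function names `rescaled_range` and `hurst_exponent`, the `min_window` parameter, and the doubling window sizes are all assumptions made for this example and may differ from the method of [30].

```python
import math
import random


def rescaled_range(series):
    """Return R/S for one window, or None if the window is constant."""
    n = len(series)
    mean = sum(series) / n                       # mean of the window
    deviations = [u - mean for u in series]
    cumulative, running = [], 0.0
    for d in deviations:                         # cumulative deviation from the mean
        running += d
        cumulative.append(running)
    spread = max(cumulative) - min(cumulative)   # range R
    std = math.sqrt(sum(d * d for d in deviations) / n)  # standard deviation S
    return spread / std if std > 0 else None


def hurst_exponent(series, min_window=8):
    """Estimate H as the least-squares slope of log(R/S) against log(window size).

    Window sizes double from min_window up to half the series length
    (an assumption of this sketch, not taken from the paper).
    """
    n = len(series)
    xs, ys = [], []
    window = min_window
    while window <= n // 2:
        values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            xs.append(math.log(window))
            ys.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(xs) < 2:
        raise ValueError("series too short for a slope estimate")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
    print("estimated H for white noise:", hurst_exponent(noise))
```

Values near 0.5 are usually read as uncorrelated behaviour, values above 0.5 as persistent, and values below 0.5 as anti-persistent. R/S estimates are known to be biased for short series, so for sequences of roughly 90 bases, such as those in Table 1, an estimate of this kind should be interpreted with caution.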

Similar resources

A probabilistic measure for alignment-free sequence comparison

MOTIVATION Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models. RESULT...

Analysis of similarity/dissimilarity of DNA sequences based on adjacent nucleotide pair representation

Introduction of graphic representation for nucleotide or protein sequences can provide intuitive overall pictures as well as useful insights for performing large-scale similarity analysis. In this paper, we are analyzing the similarity/dissimilarity of the mitochondrial genome sequences from twenty four mammal species. The analysis is important in finding the relatedness among the species and e...

Self-Similarity Limits of Genomic Signatures

It is shown that metric representation of DNA sequences is one-to-one. By using the metric representation method, suppression of nucleotide strings in the DNA sequences is determined. For a DNA sequence, an optimal string length to display genomic signature in chaos game representation is obtained by eliminating effects of the finite sequence. The optimal string length is further shown as a sel...

A novel method to reconstruct phylogeny tree based on the chaos game representation

We developed a new approach for the reconstruction of phylogeny trees based on the chaos game representation (CGR) of biological sequences. The chaos game representation (CGR) method generates a picture from a biological sequence, which displays both local and global patterns. The quantitative index of the biological sequence is extracted from the picture. The Kullback-Leibler discrimination in...

Encoding DNA sequences by integer chaos game representation

Motivation: DNA sequences are fundamental for encoding genetic information. The genetic information may be understood not only by symbolic sequences but also from the hidden signals inside the sequences. The symbolic sequences need to be transformed into numerical sequences so the hidden signals can be revealed by signal processing techniques. All current transformation methods encode DNA seque...

Publication date: 2014