Digital search trees and chaos game representation
Abstract
In this paper, we consider a possible representation of a DNA sequence in a quaternary tree, in which one can visualize repetitions of subwords (seen as suffixes of subsequences). The CGR-tree turns a sequence of letters into a Digital Search Tree (DST), obtained from the suffixes of the reversed sequence. Several results are known concerning the height and the insertion depth of DSTs built from independent successive random sequences having the same distribution. Here, the successively inserted words are strongly dependent. We give the asymptotic behaviour of the insertion depth and the length of branches for the CGR-tree obtained from the suffixes of a reversed i.i.d. or Markovian sequence. To first order, this behaviour turns out to be the same as in the case of independent words. As a by-product, asymptotic results on the length of the longest runs in a Markovian sequence are obtained.
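The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the textbook DST convention (the first word is stored at the root, each later word follows its letters down to the first free slot) and assumes that the suffixes of the reversed sequence are inserted shortest first. The names `Node` and `build_cgr_tree` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of the tree; it stores the word that was inserted there."""
    word: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_cgr_tree(sequence: str) -> Tuple[Optional[Node], List[int]]:
    """Insert the suffixes of the reversed sequence into a digital search tree.

    Assumption: the k-th inserted word is the reversed prefix of length k,
    i.e. the suffix of length k of the reversed sequence.
    Returns the root and the insertion depth of each word, in insertion order.
    """
    root: Optional[Node] = None
    depths: List[int] = []
    for k in range(1, len(sequence) + 1):
        word = sequence[:k][::-1]
        if root is None:
            # Textbook DST convention: the first word sits at the root.
            root = Node(word)
            depths.append(0)
            continue
        node, depth = root, 0
        while True:
            # The tree holds k - 1 nodes, so depth never exceeds k - 2
            # and word[depth] is always a valid index.
            letter = word[depth]
            child = node.children.get(letter)
            if child is None:
                node.children[letter] = Node(word)
                depths.append(depth + 1)
                break
            node, depth = child, depth + 1
    return root, depths
```

Under these assumptions, a repeated subword pushes later insertions deeper: for the sequence `ACAC` the fourth word `CACA` shares its first letter with the earlier word `CA` and therefore lands one level below it, whereas for `ACGT` every word after the first is inserted directly under the root.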
Similar resources
Probabilistic analysis of the asymmetric digital search trees
In this paper, we improve previous results on the (Poisson) variance of the external profile in digital search trees by applying three functional operators. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...
A novel method to reconstruct phylogeny tree based on the chaos game representation
We developed a new approach for the reconstruction of phylogeny trees based on the chaos game representation (CGR) of biological sequences. The CGR method generates a picture from a biological sequence, which displays both local and global patterns. The quantitative index of the biological sequence is extracted from the picture. The Kullback-Leibler discrimination in...
Authorship Attribution Using the Chaos Game Representation
The Chaos Game Representation, a method for creating images from nucleotide sequences, is modified to make images from chunks of text documents. Machine learning methods are then applied to train classifiers based on authorship. Experiments are conducted on several benchmark data sets in English, including the widely used Federalist Papers, and one in Portuguese. Validation results for the trai...
Complex Morlet Wavelet Analysis of the DNA Frequency Chaos Game Signal and Revealing Specific Motifs of Introns in C.elegans
Nowadays, studying introns is becoming a very promising field in genomics. Even though they play a role in the dynamic regulation of genes and in the organism's evolution, introns have not attracted as much attention as exons, especially from digital signal processing researchers. Thus, we focus on the analysis of the C.elegans introns. In this paper, we propose the complex Morlet wavelet ana...
CHAOS EMBEDDED CHARGED SYSTEM SEARCH FOR PRACTICAL OPTIMIZATION PROBLEMS
Chaos is embedded into the Charged System Search (CSS) to solve practical optimization problems. To improve the ability of global search, different chaotic maps are introduced and three chaotic-CSS methods are developed. A comparison of these variants and the standard CSS demonstrates the superiority and suitability of the selected variants for practical civil optimization problems.