Information Measure for Long-Range Correlated Sequences: the Case of the 24 Human Chromosomes
Abstract
A new approach to estimating the Shannon entropy of a long-range correlated sequence is proposed. The entropy is written as the sum of two terms, corresponding respectively to power-law distributed (ordered) and exponentially distributed (disordered) blocks (clusters). The approach is illustrated on the 24 human chromosome sequences by taking the nucleotide composition as the relevant information to be encoded/decoded. Interestingly, the nucleotide composition of the ordered clusters is found, on average, to be comparable to that of the whole analyzed sequence, while that of the disordered clusters fluctuates. From the information-theory standpoint, this means that the power-law correlated clusters carry the same information as the whole analyzed sequence. Furthermore, the fluctuations of the nucleotide composition of the disordered clusters are linked to relevant biological properties, such as segmental duplications and gene density.
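As a rough illustration of the quantity involved, the sketch below computes the standard Shannon entropy of the nucleotide composition of a sequence, H = -sum_i p_i log2 p_i, where p_i is the fraction of each base. It is not the two-term estimator proposed in the paper, whose cluster definition is not given in this abstract; the function name `composition_entropy` and the toy sequences are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the nucleotide composition.

    Hypothetical helper for illustration only: it measures the entropy of
    the A/C/G/T frequencies and ignores any other symbol (e.g. N).
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Toy sequences (made up, not genomic data).
uniform = composition_entropy("ACGT" * 25)               # equal base frequencies
skewed = composition_entropy("A" * 97 + "C" + "G" + "T")  # strongly biased composition

print(f"uniform composition: {uniform:.3f} bits")
print(f"skewed composition:  {skewed:.3f} bits")
```

A uniform composition reaches the maximum of 2 bits per base, and any bias in the base frequencies lowers the value. Comparing this number for a sub-block against the whole sequence is one simple way to see whether the block has a comparable composition.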
Similar resources
P87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders
Precise interpretation of transcriptome sequences in several species has shown that the major part of the genome is transcribed; however, only a small fraction of the transcribed sequences have open reading frames that are conserved during evolution. It is therefore unlikely that many of the transcribed sequences code for proteins. Among all human non-coding transcripts, at least 10000...
USE OF VECTORETTE AND SUBVECTORETTE PCR FOR THE ISOLATION OF TERMINAL SEQUENCES FROM YEAST ARTIFICIAL CHROMOSOME (YAC) CLONES
With the development of yeast artificial chromosome (YAC) vectors, molecular cloning of large segments of chromosomal DNA and their propagation in yeast cells have become feasible. Overlapping YACs provide a route to the development of physical maps of entire mammalian chromosomes. A rapid method was developed to isolate and sequence the termini of YAC inserts. The YAC clone is digested wit...
Multifractal information production of the human genome
We determine the Renyi entropies Kq of symbol sequences generated by human chromosomes. These exhibit nontrivial behaviour as a function of the scanning parameter q. In the thermodynamic formalism, there are phase transition-like phenomena close to the q = 1 region. We develop a theoretical model for this based on the superposition of two multifractal sets, which can be associated with the diff...
Roles of Chromatin insulators in gene regulation and diseases
With advances in genetic science, the dynamic structure of the eukaryotic genome is now considered the basis of gene expression regulation. Long-distance communication between regulatory elements and target promoters is critical, and the mechanisms responsible for this connection are just starting to emerge. Chromatin insulators are key determinants of proper gene regulation and precise organization of c...
Mutual information for examining correlations in DNA
This paper examines two methods for determining whether long-range correlations exist in DNA: a fractal measure and a mutual information technique. We evaluate the performance and implications of these methods in detail. In particular, we explore their use in comparing DNA sequences from a variety of sources. Using software for performing in silico mutations, we also consider evolutionary events leadin...