Information Measure for Long-Range Correlated Sequences: the Case of the 24 Human Chromosomes

Author

  • A. Carbone
Abstract

A new approach to estimating the Shannon entropy of a long-range correlated sequence is proposed. The entropy is written as the sum of two terms, corresponding respectively to power-law distributed (ordered) and exponentially distributed (disordered) blocks (clusters). The approach is illustrated on the 24 human chromosome sequences by taking the nucleotide composition as the relevant information to be encoded/decoded. Interestingly, the nucleotide composition of the ordered clusters is found to be, on average, comparable to that of the whole analyzed sequence, while that of the disordered clusters fluctuates. From the information theory standpoint, this means that the power-law correlated clusters carry the same information as the whole analyzed sequence. Furthermore, the fluctuations of the nucleotide composition of the disordered clusters are linked to relevant biological properties, such as segmental duplications and gene density.
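As a rough illustration of the quantity the abstract treats as the relevant information, the sketch below computes the Shannon entropy of the nucleotide composition of a sequence. It is not the author's two-term cluster decomposition, which the abstract does not spell out; the function name `nucleotide_entropy` and the choice to ignore symbols other than A, C, G and T are assumptions made here for illustration.

```python
import math
from collections import Counter


def nucleotide_entropy(seq):
    """Shannon entropy, in bits per symbol, of the A/C/G/T composition of seq.

    Illustrative assumption: any symbol other than A, C, G, T (for example N)
    is skipped rather than counted as a fifth state.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum_i p_i * log2(p_i), with p_i the observed nucleotide frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under these assumptions, the same function could be applied to a whole sequence and to any sub-block of it in order to compare their compositions, which is the kind of comparison the abstract reports between clusters and the full chromosome.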


Similar resources

P87: The Role of the Long Non-Coding RNA Sequences (LncRNAs) in Neurological Disorders

Precise interpretation of the transcriptome sequences in several species has shown that the major part of the genome is transcribed; however, only a small fraction of the transcribed sequences have open reading frames that are conserved during evolution. It is therefore unlikely that many of the transcribed sequences code for proteins. Among all human non-coding transcripts, at least 10000...


USE OF VECTORETTE AND SUBVECTORETTE PCR FOR THE ISOLATION OF TERMINAL SEQUENCES FROM YEAST ARTIFICIAL CHROMOSOME (YAC) CLONES

The development of yeast artificial chromosome (YAC) vectors, the molecular cloning of large segments of chromosomal DNA, and their propagation in yeast cells have become feasible. Overlapping YACs provide a route to the development of physical maps of entire mammalian chromosomes. A rapid method was developed to isolate and sequence the termini of YAC inserts. The YAC clone is digested wit...


Multifractal information production of the human genome

We determine the Rényi entropies Kq of symbol sequences generated by human chromosomes. These exhibit nontrivial behaviour as a function of the scanning parameter q. In the thermodynamic formalism, there are phase transition-like phenomena close to the q = 1 region. We develop a theoretical model for this based on the superposition of two multifractal sets, which can be associated with the diff...


Roles of Chromatin insulators in gene regulation and diseases

With advances in genetic science, the dynamic structure of the eukaryotic genome is considered the basis of gene expression regulation. Long-distance communication between regulatory elements and target promoters is critical, and the mechanisms responsible for this connection are just starting to emerge. Chromatin insulators are key determinants of proper gene regulation and precise organization of c...


Mutual information for examining correlations in DNA

This paper examines two methods for determining whether long-range correlations exist in DNA: a fractal measure and a mutual information technique. We evaluate the performance and implications of these methods in detail. In particular, we explore their use in comparing DNA sequences from a variety of sources. Using software for performing in silico mutations, we also consider evolutionary events leadin...



Journal title:

Volume 3, Issue

Pages -

Publication date: 2013