Sequence landscapes

Authors

  • B. Clift
  • David Haussler
  • Ross M. McConnell
  • Thomas D. Schneider
  • Gary D. Stormo
Abstract

We describe a method for representing the structure of repeating sequences in nucleic acids, proteins and other texts. A portion of the sequence is presented at the bottom of a CRT screen. Above the sequence is its landscape, which looks like a mountain range. Each mountain corresponds to a subsequence of the sequence. At the peak of every mountain is written the number of times that the subsequence appears. A data structure called a DAWG (directed acyclic word graph), which can be built in time proportional to the length of the sequence, is used to construct the landscape. For the 40 thousand bases of bacteriophage T7, the DAWG can be built in 30 seconds. The time to display any portion of the landscape is less than a second. Using sequence landscapes, one can quickly locate significant repeats.
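The core quantity behind a landscape, the number of times each subsequence appears, can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes a suffix automaton, a structure closely related to the DAWG, as a stand-in. The function names `build_suffix_automaton` and `count_occurrences` are hypothetical and chosen only for this illustration.

```python
def build_suffix_automaton(text):
    """Build a suffix automaton of `text` and per-state occurrence counts.

    Illustrative stand-in for the DAWG named in the abstract; not the
    authors' code. Returns (transitions, counts).
    """
    transitions = [{}]   # transitions[s][ch] -> next state
    link = [-1]          # suffix links
    length = [0]         # length of the longest string reaching each state
    counts = [0]         # occurrences of the strings represented by each state
    last = 0
    for ch in text:
        cur = len(transitions)
        transitions.append({})
        link.append(-1)
        length.append(length[last] + 1)
        counts.append(1)             # each new prefix end is one occurrence
        p = last
        while p != -1 and ch not in transitions[p]:
            transitions[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = transitions[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q: the clone takes over the shorter strings.
                clone = len(transitions)
                transitions.append(dict(transitions[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                counts.append(0)     # clones are not prefix ends themselves
                while p != -1 and transitions[p].get(ch) == q:
                    transitions[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Push counts up the suffix links, longest states first. sorted() is used
    # for brevity; a bucket sort by length would keep this step linear.
    order = sorted(range(1, len(transitions)), key=lambda s: length[s], reverse=True)
    for state in order:
        if link[state] >= 0:
            counts[link[state]] += counts[state]
    return transitions, counts


def count_occurrences(automaton, pattern):
    """Return how many times a non-empty `pattern` occurs (overlaps included)."""
    transitions, counts = automaton
    state = 0
    for ch in pattern:
        if ch not in transitions[state]:
            return 0
        state = transitions[state][ch]
    return counts[state]


if __name__ == "__main__":
    automaton = build_suffix_automaton("GATTACAGATTACA")
    for sub in ("GATTACA", "TT", "ACAG", "CC"):
        print(sub, count_occurrences(automaton, sub))
```

Once the automaton is built, each query costs time proportional to the length of the queried subsequence, which is in the spirit of the fast display times reported in the abstract. How the counts are laid out as mountains on screen is not covered by this sketch.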

Similar resources

Fitness Landscapes Arising from the Sequence-Structure Maps of Biopolymers

Fitness landscapes are an important concept in molecular evolution since evolutionary adaptation as well as in vitro selection of biomolecules can be viewed as a hill-climbing-like process. Global features of landscapes can be described by statistical measures such as correlation functions or the fraction of neutral (equally fit) neighbors. Simple spin-glass-like landscape models borrowed from ...

Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness.

Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function mai...

The Generation and Exploitation of Protein Mutability Landscapes for Enzyme Engineering

The increasing number of enzyme applications in chemical synthesis calls for new engineering methods to develop the biocatalysts of the future. An interesting concept in enzyme engineering is the generation of large-scale mutational data in order to chart protein mutability landscapes. These landscapes allow the important discrimination between beneficial mutations and those that are neutral or...

Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints

Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation ...

ICOMOS – IFLA Principles Concerning Rural Landscapes as Heritage – Delhi 2017

Rural landscapes are a vital component of the heritage of humanity that contain a complex lattice of tangible and intangible cultural heritage and have a strong connection with their surrounding nature and environment. As a result, they can be named the rural landscapes. Since the living rural landscapes are one of the most common types of cultural landscapes in existence, their conservation is...

Correction: Realistic three dimensional fitness landscapes generated by Self Organizing Maps for the analysis of experimental HIV-1 evolution

Human Immunodeficiency Virus type 1 (HIV-1) because of high mutation rates, large population sizes, and rapid replication, exhibits complex evolutionary strategies. For the analysis of evolutionary processes, the graphical representation of fitness landscapes provides a significant advantage. The experimental determination of viral fitness remains, in general, difficult and consequently most pu...

Journal:
  • Nucleic Acids Research

Volume 14, Issue 1

Pages: -

Publication date: 1986