Zipping out relevant information

Authors

  • Dario Benedetto
  • Emanuele Caglioti
  • Vittorio Loreto
Abstract

… represent as sequences of characters. Experimental investigations of physical processes, for instance, typically produce sequences or time series of data. Other systems, such as DNA and protein sequences or human language, are intrinsically represented as strings of characters. Treating information as sequences of characters helps make this information searchable, a necessary first step in navigating the overwhelming mass of data facing us today. Although the abundance of information and its accessibility represent an important cultural advance, they also introduce a new challenge: retrieving relevant information. Imagine entering the largest library in the world and seeking all the relevant documents on your favorite topic. Without an efficient librarian's help, this task would be difficult, if not hopeless. The references you wanted would likely remain buried under tons of irrelevancies.
On a more positive note, the growing body of available data provides an ideal test bed for theoretical constructions and models. This opportunity has stimulated considerable interest from researchers in many different communities: physicists, mathematicians, economists, and statisticians, to name a few. In this spirit, we seek to discover the most suitable tools for examining large masses of data and extracting useful information from them.
To accomplish the ambitious task of finding the proverbial needle in a haystack, we must first define what useful or relevant information is and where and how it is coded. This is a nontrivial problem because information means different things in different contexts. Moreover, it has no absolute value, depending instead on the specific filters observers impose on their data. Consider a simple coin-toss experiment. A gambler is probably only interested in the toss's outcome (heads or tails), but a physicist might be interested in whether the outcomes reveal anything about the coin's nature (such as whether it is honest or dishonest).
We extract information via a two-step process. The syntactic step is where we first identify the structures present in messages without associating any specific meaning to them. It is only in the second (or semantic) step that comprehension of meaning occurs; it is the step in which we connect the syntactic information to previous experience and knowledge. As an example of this two-step process, consider how to identify the language in which a given text is written. In the first step, we scan through the text and identify syntactic structures: articles, verbs, adjectives, and so on. But only someone who knows the language can carry …

Similar resources

Theory on the Mechanism of DNA Renaturation: Stochastic Nucleation and Zipping

Renaturation of the complementary single strands of DNA is one of the important processes that requires better understanding in the view of molecular biology and biological physics. Here we develop a stochastic dynamical model on the DNA renaturation. According to our model there are at least three steps in the renaturation process viz. nonspecific-contact formation, correct-contact formation a...

Remodeling Tissue Interfaces and the Thermodynamics of Zipping during Dorsal Closure in Drosophila.

Dorsal closure during Drosophila embryogenesis is an important model system for investigating the biomechanics of morphogenesis. During closure, two flanks of lateral epidermis (with actomyosin-rich purse strings near each leading edge) close an eye-shaped opening that is filled with amnioserosa. At each canthus (corner of the eye) a zipping process remodels the tissue interfaces between the le...

Ab initio study of edge-smoothing, atom attraction and downward funneling in Ag/Ag(100)

The results of density-functional theory (DFT) calculations of the energy barriers for three low-barrier relaxation processes in Ag/Ag(100) growth, namely edge-zipping, atom-attraction and downward funneling (DF), are presented and compared with embedded atom method (EAM) calculations. In general, we find good agreement between the DFT values for these processes and the values assumed in recent simulatio...

Simpler Editing of Spatially-Connected Graph Hierarchies using Zipping Algorithms

Graph hierarchies, the data structures that result from hierarchically clustering the nodes of a graph, are widely used as a multi-scale way of representing data in many computing fields, including image segmentation, mesh simplification, hierarchical pathfinding and visualisation. As a result, significant efforts have been made over the years to find ways of constructing ‘good’ graph hierarchi...

Irreversible adsorption from dilute polymer solutions.

We study irreversible polymer adsorption from dilute solutions theoretically. Universal features of the resultant non-equilibrium layers are predicted. Two broad cases are considered, distinguished by the magnitude of the local monomer-surface sticking rate Q: chemisorption (very small Q) and physisorption (large Q). Early stages of layer formation entail single-chain adsorption. While single-c...

Journal:
  • Computing in Science and Engineering

Volume 5

Publication date: 2003