Practical Pattern Matching
Abstract
No new genes?

At the University of Toronto, Brendan Frey is leading a group of scientists who are using AI techniques to analyze molecular biology data. One of their projects involves using a factor graph they developed, called GenRate, to discover and evaluate genes in mouse tissues. Factor graphs let researchers describe a system with complex variables, such as gene location in DNA as well as gene length and function. “What a factor graph is useful for,” says Frey, “is describing a scoring function that tells you how good each setting of the variables is.”

Using samples from over 1 million probes along DNA in 37 different mouse tissues, the scientists used their factor graph to determine which bits of DNA are expressed, or activated to produce protein. A given stretch of DNA might be expressed in some tissues but not in others. DNA parts that have no function are never activated.

In the factor graph, each variable is a node. The scoring function comprises many local scoring functions, each of which looks at a small number of variables and assigns a score to each configuration of those variables. The sum of the local scores is the total score. “It’s a nice way to decompose a very complex problem into a whole bunch of simpler problems,” Frey says.

The scientists then compare the factor graph data to known gene patterns. Because the factor graph provides a computational framework for vetting the best configuration of variables as well as discovering them, the team came up with surprising results that led to a major revision of the view of the mammalian genome. Although some research claims many genes are left to discover, Frey’s team has shown that might not be true. “Beyond the genes we found,” Frey says, “we don’t believe there exists many new protein-coding genes.”
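The scoring scheme described above, where a total score is the sum of local scores over small sets of variables, can be illustrated with a minimal Python sketch. This is not GenRate: the variables `a`, `b`, `c`, their binary values, and the score tables are all hypothetical, chosen only to show how local scores add up and how a best configuration can be found by exhaustive search on a tiny example.

```python
from itertools import product

# Hypothetical toy variables; each takes the value 0 or 1.
VARIABLES = ["a", "b", "c"]

# Each local factor names the small set of variables it looks at (its scope)
# and gives a score for every configuration of those variables.
# All numbers here are made up for illustration.
FACTORS = [
    (("a",), {(0,): 0.0, (1,): 1.0}),
    (("a", "b"), {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}),
    (("b", "c"), {(0, 0): 0.0, (0, 1): 1.5, (1, 0): 1.0, (1, 1): 0.0}),
]


def total_score(assignment):
    """Sum the local scores for one full setting of the variables."""
    return sum(
        table[tuple(assignment[name] for name in scope)]
        for scope, table in FACTORS
    )


def best_assignment():
    """Try every configuration and return the highest-scoring one."""
    candidates = (
        dict(zip(VARIABLES, values))
        for values in product((0, 1), repeat=len(VARIABLES))
    )
    return max(candidates, key=total_score)


if __name__ == "__main__":
    best = best_assignment()
    print(best, total_score(best))
```

Exhaustive search is only workable for a handful of variables, since the number of configurations doubles with each binary variable added. Larger factor graphs typically rely on message-passing algorithms that exploit the local structure instead of enumerating every configuration.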
Similar resources
A New Compression Method for Compressed Matching
A practical adaptive compression algorithm based on LZSS is presented, which is especially constructed to solve the compressed pattern matching problem, i.e., pattern matching directly in a compressed text without decompressing.
Two-phase Pattern Matching for Regular Expressions in Intrusion Detection Systems
Regular expressions are used to describe security threats’ signatures in network intrusion detection (NID) systems. To identify suspicious packets using regular expression matching, many NID systems use memory-based deterministic finite-state automata (DFA) with one-pass-scanning model, which is fast and allows dynamic updates. However, a number of practical signature patterns commonly found in...
EPSRC Vacation Bursary A Practical Investigation Into Modern Pattern Matching Techniques
Over recent years, there have been many theoretical advances in approximate pattern matching. The aim of this project has been to consider how these advances perform in practice, with the general aim of comparing the methods against a naïve approach in order to determine at what input sizes they become practical. Approximate pattern matching considers searching areas of a text string for areas ...
Discovering Most Classificatory Patterns for Very Expressive Pattern Classes
The classificatory power of a pattern is measured by how well it separates two given sets of strings. This paper gives practical algorithms to find the fixed/variable-length-don’t-care pattern (FVLDC pattern) and approximate FVLDC pattern which are most classificatory for two given string sets. We also present algorithms to discover the best window-accumulated FVLDC pattern and window-accumulat...
Algebraic Pattern Matching in Join Calculus
We propose an extension of the join calculus with pattern matching on algebraic data types. Our initial motivation is twofold: to provide an intuitive semantics of the interaction between concurrency and pattern matching; to define a practical compilation scheme from extended join definitions into ordinary ones plus ML pattern matching. To assess the correctness of our compilation scheme, we de...
arXiv:0802.4018v1 [cs.PL] 27 Feb 2008: Algebraic Pattern Matching in Join Calculus
We propose an extension of the join calculus with pattern matching on algebraic data types. Our initial motivation is twofold: to provide an intuitive semantics of the interaction between concurrency and pattern matching; to define a practical compilation scheme from extended join definitions into ordinary ones plus ML pattern matching. To assess the correctness of our compilation scheme, we de...