A text pattern-matching tool based on Parsing Expression Grammars
Author
Abstract
Current text pattern-matching tools are based on regular expressions. However, pure regular expressions have proven too weak a formalism for the task: many interesting patterns either are difficult to describe or cannot be described by regular expressions. Moreover, the inherent nondeterminism of regular expressions does not fit the need to capture specific parts of a match. Motivated by these reasons, most scripting languages nowadays use pattern-matching tools that extend the original regular-expression formalism with a set of ad-hoc features, such as greedy repetitions, lazy repetitions, possessive repetitions, “longest match rule”, lookahead, etc. These ad-hoc extensions bring their own set of problems, such as lack of a formal foundation and complex implementations. In this paper, we propose the use of Parsing Expression Grammars (PEGs) as a basis for pattern matching. Following this proposal, we present LPEG, a pattern-matching tool based on PEGs for the Lua scripting language. LPEG unifies the ease of use of pattern-matching tools with the full expressive power of PEGs. Because of this expressive power, it can avoid the myriad of ad-hoc constructions present in several current pattern-matching tools. We also present a Parsing Machine that allows a small and efficient implementation of PEGs for pattern matching.
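The following is a minimal sketch in Python of the PEG ideas the abstract relies on. The paper's tool, LPEG, is a Lua library; none of the names below are LPEG's API, and `lit`, `any_char`, `seq`, `choice`, `star`, `not_` and `balanced` are hypothetical. The sketch interprets PEG operators directly as recursive functions. It does not compile them for the parsing machine the abstract mentions. It shows three properties: ordered choice commits to the first alternative that succeeds, repetition is possessive without any extra operator, and a recursive rule matches balanced parentheses, a pattern that no pure regular expression can describe.

```python
# Minimal PEG matcher sketch in Python (NOT LPEG's Lua API; all names are hypothetical).
# A pattern is a function (subject, pos) -> end position on success, or None on failure.
from typing import Callable, Optional

Pattern = Callable[[str, int], Optional[int]]


def lit(s: str) -> Pattern:
    """Match the literal string s at the current position."""
    def match(subject: str, pos: int) -> Optional[int]:
        return pos + len(s) if subject.startswith(s, pos) else None
    return match


def any_char() -> Pattern:
    """Match any single character (fails at the end of the subject)."""
    def match(subject: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(subject) else None
    return match


def seq(*ps: Pattern) -> Pattern:
    """Sequence: each subpattern must match right after the previous one."""
    def match(subject: str, pos: int) -> Optional[int]:
        for p in ps:
            pos = p(subject, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*ps: Pattern) -> Pattern:
    """Ordered choice: commit to the first alternative that succeeds."""
    def match(subject: str, pos: int) -> Optional[int]:
        for p in ps:
            end = p(subject, pos)
            if end is not None:
                return end
        return None
    return match


def star(p: Pattern) -> Pattern:
    """Possessive repetition: take as many matches as possible, never give any back."""
    def match(subject: str, pos: int) -> Optional[int]:
        while True:
            end = p(subject, pos)
            if end is None or end == pos:  # stop on failure or an empty match (avoids an infinite loop)
                return pos
            pos = end
    return match


def not_(p: Pattern) -> Pattern:
    """Negative lookahead: succeed without consuming input only if p fails here."""
    def match(subject: str, pos: int) -> Optional[int]:
        return pos if p(subject, pos) is None else None
    return match


def balanced() -> Pattern:
    """Grammar for balanced parentheses (not a regular language): B <- "(" (B / [^()])* ")"."""
    rules = {}
    other = seq(not_(choice(lit("("), lit(")"))), any_char())  # any char except a parenthesis
    # The lambda defers the lookup of rules["B"] so the rule can call itself recursively.
    recurse: Pattern = lambda subject, pos: rules["B"](subject, pos)
    rules["B"] = seq(lit("("), star(choice(recurse, other)), lit(")"))
    return rules["B"]


if __name__ == "__main__":
    B = balanced()
    print(B("(a(b)c)", 0))  # 7: the whole subject matches
    print(B("(a(b c)", 0))  # None: the outer "(" is never closed
    # Possessive repetition: a* consumes every "a", so the final "a" can never match.
    print(seq(star(lit("a")), lit("a"))("aaa", 0))  # None
    # Ordered choice: "a" is tried first and wins, so only one character is consumed.
    print(choice(lit("a"), lit("ab"))("ab", 0))  # 1
```

Because a failed alternative is abandoned and a repetition never gives input back, this style of matcher needs no separate greedy, lazy, or possessive variants. The trade-off is that patterns written with backtracking-regex intuitions, such as `a*a`, behave differently under PEG semantics.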
Similar resources
From Regexes to Parsing Expression Grammars
Most scripting languages nowadays use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an informal semantics that is different from the semantics of regular expressions, removing the commutativity of alternation and adding ad-hoc extensions that cannot be expressed by formalisms for efficient recognition of regular languages, such as det...
A Computational Lexicon Of Portuguese For Automatic Text Parsing
Using standard methods and formats established at LADL, and adopted by several European research teams to construct large-coverage electronic dictionaries and grammars, we elaborated for Portuguese a set of lexical resources that were implemented in INTEX. We describe the main features of such linguistic data, refer to their maintenance and extension, and give different examples of automatic text...
An Efficient Pattern Matching Algorithm on a Subclass of Context Free Grammars
There is a close relationship between formal language theory and data compression. Since the 1990s, various types of grammar-based text compression algorithms have been introduced. Given an input string, a grammar-based text compression algorithm constructs a context-free grammar that only generates the string. An interesting and challenging problem is pattern matching on context-free grammars P of...
Grammar and Style Checking for German
As part of the MULTILINT project, a tool for grammar and style checking for Technical Documentation in German has been developed. The tool is based on a flat pattern-matching approach. This approach has theoretical limits compared to parsing checkers, but still reaches acceptable results on real-life corpora. We give some insights into the strategies for rule implementation, focussing on the diff...
Recognising and Generating Terms using Derivatives of Parsing Expression Grammars
Grammar-based sentence generation has been thoroughly explored for Context-Free Grammars (CFGs), but remains unsolved for recognition-based approaches such as Parsing Expression Grammars (PEGs). Lacking tool support, language designers using PEGs have difficulty predicting the behaviour of their parsers. In this paper, we extend the idea of derivatives, originally formulated for regular express...
Journal title:
- Softw., Pract. Exper.
Volume 39, Issue
Pages -
Publication date: 2009