Notes on LR Parser Design
Author: Christer Samuelsson
Abstract
1 INTRODUCTION

This paper discusses the design of an LR parser for a specific high-coverage English grammar. The design principles, though, are applicable to a large class of unification-based grammars where the constraints are realized as Prolog terms and applied monotonically through instantiation, where there is no right movement, and where left movement is handled by gap threading.

The LR parser was constructed for experiments on probabilistic parsing and speedup learning; see [10]. LR parsers are suitable for probabilistic parsing since they contain a representation of the current parsing state, namely the stack and the input string, and since the actions of the parsing tables are easily attributed probabilities conditional on this parsing state. LR parsers are suitable for the speedup learning application since the learned grammar is much larger than the original grammar, and the prefixes of the learned rules overlap to a very high degree, circumstances that are far from ideal for the system's original parser. Even though these ends influenced the design of the parser, this article does not focus on these applications but rather on the design and testing of the parser itself.

An LR parser is a type of shift-reduce parser originally devised by Knuth for programming languages [4]. The success of LR parsing lies in handling a number of grammar rules simultaneously, rather than attempting one at a time, by the use of prefix merging. LR parsing in general is well described in [1], and its application to natural-language processing in [12].

An LR parser is basically a pushdown automaton, i.e. it has a pushdown stack in addition to a finite set of internal states, and a reader head for scanning the input string from left to right, one symbol at a time. In fact, the "L" in "LR" stands for left-to-right scanning of the input string. The "R" stands for constructing the rightmost derivation in reverse.
The stack is used in a characteristic way: the items on the stack consist of alternating grammar symbols and states. The current state is the state on top of the stack. The most distinguishing feature of an LR parser, however, is the form of the transition relation: the action and goto tables. A non-deterministic LR parser can in each step perform one of four basic actions. In state S with lookahead symbol Sym it can:
1. accept(S,Sym): Halt and signal success.
2. shift(S,Sym,S2): Consume the …
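The stack discipline and table lookup described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy grammar (`E -> E + n | n`), the hand-built `ACTION` and `GOTO` tables, and the names `RULES` and `parse` do not come from the paper. The tables are deterministic for simplicity, whereas the text describes a non-deterministic parser. The reduce step follows the usual textbook formulation rather than the truncated list above.

```python
# Hypothetical toy grammar, not taken from the paper:
#   rule 1: E -> E + n
#   rule 2: E -> n
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("E", 3), 2: ("E", 1)}

# Hand-built action table: (state, lookahead) -> (action, argument).
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}

# Goto table: (state, nonterminal) -> next state.
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Return (accepted, trace) for a list of terminal symbols."""
    stack = [0]                      # alternating states and grammar symbols
    buffer = list(tokens) + ["$"]    # "$" marks the end of the input
    pos = 0
    trace = []
    while True:
        state = stack[-1]            # the current state is on top of the stack
        sym = buffer[pos]            # lookahead symbol
        entry = ACTION.get((state, sym))
        if entry is None:            # no table entry: reject the input
            return False, trace
        kind, arg = entry
        trace.append((kind, arg))
        if kind == "accept":
            return True, trace
        if kind == "shift":
            stack += [sym, arg]      # push the symbol, then the new state
            pos += 1
        else:                        # reduce by rule number `arg`
            lhs, length = RULES[arg]
            del stack[-2 * length:]  # pop one symbol/state pair per RHS symbol
            next_state = GOTO[(stack[-1], lhs)]
            stack += [lhs, next_state]


accepted, trace = parse(["n", "+", "n"])
print(accepted, trace)
```

In this sketch the parser never backtracks because each table cell holds at most one action; a non-deterministic variant would store a set of actions per cell and explore each alternative.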
Similar resources
Example-Based Optimization of Surface-Generation Tables
A method is given that "inverts" a logic grammar and displays it from the point of view of the logical form, rather than from that of the word string. LR-compiling techniques are used to allow a recursive-descent generation algorithm to perform "functor merging" much in the same way as an LR parser performs prefix merging. This is an improvement on the semantic-head-driven generator that results...
An Efficient Algorithm for Surface Generation
A method is given that "inverts" a logic grammar and displays it from the point of view of the logical form, rather than from that of the word string. LR-compiling techniques are used to allow a recursive-descent generation algorithm to perform "functor merging" much in the same way as an LR parser performs prefix merging. This is an improvement on the semantic-head-driven generator that results...
Parameterized LR Parsing
Common LR parser generators lack abstraction facilities for defining recurring patterns of productions. Although there are generators capable of supporting regular expressions on the right-hand side of productions, no generator supports user-defined patterns in grammars. Parameterized LR parsing extends standard LR parsing technology by admitting grammars with parameterized non-terminal symbols...
The Grammar Tool Box: A Case Study Comparing GLR Parsing Algorithms
The Grammar Tool Box is a toolset for manipulating context-free grammars and objects associated with them, such as parsers, languages and derivations. GTB has three main roles: as a pedagogic tool; as an experimental platform for novel algorithms and representations; and as a production tool for translator front-end generation. In this paper we give an overview of GTB and its companion Java-based...