Parallel Parsing: The Earley and Packrat Algorithms
Abstract
Parsing plays a critical role in modern computing infrastructure: scripting languages such as Python and JavaScript, layout languages such as HTML, CSS, and PostScript/PDF, and data exchange languages such as XML and JSON are all interpreted, and so require parsing. Moreover, by some estimates, parsing accounts for as much as 40% of the time spent producing a rendered page from HTML, CSS, and JavaScript. Motivated by the importance of parsing, and by the widespread adoption of chip multiprocessors, we investigate the potential for parallel parsing algorithms. We begin with one classical and one more modern parsing algorithm, the Earley and Packrat algorithms respectively, and consider methods for parallelizing both. The general strategy is the same in both cases: break the string to be parsed into contiguous blocks, process each block in parallel, and finally re-join the blocks. Using this general framework, we obtain a meaningful speedup relative to serial implementations of each algorithm: a speedup of 5.4 for the parallel Earley implementation on 8 processors, and a speedup of 2.5 for the parallel Packrat algorithm on 5 processors. We also consider work efficiency, a measure of the amount of extra work done by the parallel algorithm relative to the serial algorithm. Poor work efficiency indicates that the parallel algorithm does considerably more work than is required, and therefore uses more energy. As energy becomes relatively more important, even a parallel algorithm that achieves significant speedup may not be worthwhile if it exhibits poor work efficiency. Though it is difficult to formulate a precise definition of work efficiency in terms of program operations, we demonstrate qualitatively that our algorithms, with some caveats, exhibit reasonable work efficiency.
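The block strategy named in the abstract (split the input into contiguous blocks, process each block in parallel, then re-join) can be illustrated on a much simpler problem than Earley or Packrat parsing. The sketch below is not the authors' method: it only checks whether parentheses are balanced, and every name in it (`summarize_block`, `split_into_blocks`, `is_balanced_parallel`, `num_blocks`) is hypothetical. Each block is scanned independently and reduced to a small summary, and the re-join step folds the summaries from left to right. A thread pool is assumed here purely to show the structure; in CPython it would not be expected to give a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_block(block):
    """Scan one block serially.

    Returns (net, lowest): the net change in nesting depth over the block,
    and the lowest depth reached relative to the block's start.
    """
    depth = 0
    lowest = 0
    for ch in block:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            lowest = min(lowest, depth)
    return depth, lowest


def split_into_blocks(text, num_blocks):
    """Cut text into at most num_blocks contiguous, non-overlapping pieces."""
    size = max(1, -(-len(text) // num_blocks))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def is_balanced_parallel(text, num_blocks=4):
    """Illustrative only: balanced-parenthesis check via split/process/re-join."""
    blocks = split_into_blocks(text, num_blocks)

    # Process step: each block is summarized independently of the others.
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        summaries = list(pool.map(summarize_block, blocks))

    # Re-join step: combine the per-block summaries in input order.
    depth = 0
    for net, lowest in summaries:
        if depth + lowest < 0:  # a ")" appeared with no matching "("
            return False
        depth += net
    return depth == 0
```

In this toy case the re-join step is cheap because each block's summary is just two integers. For a real parser the per-block results are far richer, which is one place where the extra work discussed under work efficiency can arise.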
Related resources
A Parallel Extension of Earley’s Parsing Algorithm
Parsing is the process of deriving structure from a string, and can be used to describe the meaning of the string, and the relationships between its elements. This paper describes two popular parsing algorithms, CKY and Earley. This paper also discusses attempts others have made to distribute the processing workload of the CKY algorithm in a parallel environment. The paper then describes how I ...
Packrat Parsing: Simple, Powerful, Lazy, Linear Time
Packrat parsing is a novel technique for implementing parsers in a lazy functional programming language. A packrat parser provides the power and flexibility of top-down parsing with backtracking and unlimited lookahead, but nevertheless guarantees linear parse time. Any language defined by an LL(k) or LR(k) grammar can be recognized by a packrat parser, in addition to many languages that conven...
Parallelizing the CKY and Earley Parsing Algorithms
Context-free parsing algorithms are one of the oldest and most well-understood aspects of natural language processing. Efforts to reduce the time complexity of these algorithms have produced two particularly popular algorithms: the Cocke-Kasami-Younger (CKY) bottom-up parsing algorithm [5, 9], and the Earley top-down parsing algorithm [2, 3]. However, despite these efforts, parsing remains a tim...
Packrat Parsing: A Literature Review
Packrat parsing is a recently introduced technique based on expression grammars. It uses memoization to avoid redundant function calls, which guarantees linear parse time. This paper studies the progress made in packrat parsing to date and discusses approaches to handling this parsing process efficiently. In addition to this, other issues s...
A Survey of Packrat Parser
Two recent developments in the field of formal languages are Parsing Expression Grammar (PEG) and packrat parsing. The PEG formalism is similar to BNF, but defines syntax in terms of recognizing strings, rather than constructing them. It is, in fact, a precise specification of a backtracking recursive-descent parser. Packrat parsing is a general method to handle backtracking in recursive descent ...