Error Mining for Wide-Coverage Grammar Engineering
نویسنده
چکیده
Parsing systems which rely on hand-coded linguistic descriptions can only perform adequately in as far as these descriptions are correct and complete. The paper describes an error mining technique to discover problems in hand-coded linguistic descriptions for parsing such as grammars and lexicons. By analysing parse results for very large unannotated corpora, the technique discovers missing, incorrect or incomplete linguistic descriptions. The technique uses the frequency of n-grams of words for arbitrary values of n. It is shown how a new combination of suffix arrays and perfect hash finite automata allows an efficient implementation.
منابع مشابه
Correcting Syntactic Annotation Errors Based on Tree Mining
This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage ...
متن کاملAutomated Multiword Expression Prediction For Grammar Engineering
However large a hand-crafted widecoverage grammar is, there are always going to be words and constructions that are not included in it and are going to cause parse failure. Due to their heterogeneous and flexible nature, Multiword Expressions (MWEs) provide an endless source of parse failures. As the number of such expressions in a speaker’s lexicon is equiparable to the number of single word u...
متن کاملEngineering a Wide-Coverage Lexicalized Grammar
We discuss a number of practical issues that have arisen in the development of a wide-coverage lexicalized grammar for English. In particular, we consider the way in which the design of the grammar and of its encoding was influenced by issues relating to the size of the grammar.
متن کاملApplication of Satellite Data and Data Mining Algorithms in Estimating Coverage Percent (Case study: Nadoushan Rangelands, Ardakan Plain, Yazd, Iran)
Assessing and monitoring rangelands in arid regions are important and essential tasks in order to manage the desired regions. Nowadays, satellite images are used as an approximately economical and fast way to study the vegetation in a variety of scales. This research aims to estimate the coverage percent using the digital data given by ETM+ Landsat satellite. In late May and early Ju...
متن کاملRobust Parsing, Error Mining, Automated Lexical Acquisition, And Evaluation
In our attempts to construct a wide coverage HPSG parser for Dutch, techniques to improve the overall robustness of the parser are required at various steps in the parsing process. Straightforward but important aspects include the treatment of unknown words, and the treatment of input for which no full parse is available. Another important means to improve the parser's performance on unexpected...
متن کامل