Multiple TreeBanks Integration for Chinese Phrase Structure Grammar Parsing Using Bagging
Authors
Abstract
We describe our method for traditional Phrase Structure Grammar (PSG) parsing in CIPS-Bakeoff2012 Task3. First, we propose bagging to improve the baseline PSG parser. We then exploit an additional TreeBank (CTB7.0) to improve performance further. On the development data set, bagging raises the baseline F1 score from 81.33% to 84.41%, and adding the CTB7.0 data raises it to 85.03%. On the official test data set, our baseline closed system with bagging achieves an F1 score of 80.17%, outperforming the best closed system that uses a single model by nearly 4%. After exploiting the CTB7.0 data, the F1 score reaches 81.16%, a further gain of about 1 percentage point.
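The abstract does not say how the outputs of the bagged parsers are combined. The following is a minimal sketch, not the authors' implementation, that assumes two things: each parse tree is simplified to a set of labeled brackets `(label, start, end)`, and the ensemble combines its members by constituent-level majority voting, a common parser-ensemble technique. The names `bootstrap_samples`, `train_bagged`, `vote`, `bagged_parse` and `bracket_f1` are hypothetical, and the `train` argument stands in for any PSG parser trainer.

```python
import random
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple, TypeVar

# Simplification (assumption): a parse tree is represented only by its set of
# labeled brackets (label, start, end), using token positions as boundaries.
Bracket = Tuple[str, int, int]
Tree = FrozenSet[Bracket]
T = TypeVar("T")  # one training example, e.g. a gold (sentence, tree) pair


def bootstrap_samples(treebank: Sequence[T], k: int, seed: int = 0) -> List[List[T]]:
    """Draw k bootstrap replicates of the treebank (same size, with replacement)."""
    rng = random.Random(seed)
    n = len(treebank)
    return [[treebank[rng.randrange(n)] for _ in range(n)] for _ in range(k)]


def train_bagged(
    treebank: Sequence[T],
    train: Callable[[List[T]], Callable[[str], Tree]],
    k: int = 5,
    seed: int = 0,
) -> List[Callable[[str], Tree]]:
    """Train one parser per bootstrap replicate; `train` is a hypothetical trainer."""
    return [train(sample) for sample in bootstrap_samples(treebank, k, seed)]


def vote(parses: Sequence[Tree]) -> Tree:
    """Keep every bracket proposed by a strict majority of the parses.

    Two crossing brackets never occur in the same tree, so they cannot both
    win a strict majority: the result is crossing-free, though it is not
    guaranteed to be a complete tree.
    """
    counts = Counter(b for parse in parses for b in parse)
    return frozenset(b for b, c in counts.items() if 2 * c > len(parses))


def bagged_parse(parsers: Sequence[Callable[[str], Tree]], sentence: str) -> Tree:
    """Parse with every ensemble member and combine the outputs by voting."""
    return vote([parse(sentence) for parse in parsers])


def bracket_f1(gold: Tree, pred: Tree) -> float:
    """Simplified labeled-bracket F1 (no evalb-style normalisation)."""
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy outputs from three hypothetical ensemble members for one sentence.
    p1 = frozenset({("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3)})
    p2 = frozenset({("IP", 0, 3), ("NP", 0, 2), ("VP", 2, 3)})
    combined = vote([p1, p2, p1])
    print(sorted(combined))
    print(bracket_f1(gold=p1, pred=p2))
```

With three members, a bracket survives only if at least two of them propose it, so in the toy run the two brackets that appear only in `p2` are dropped. Requiring a strict majority is what keeps the voted brackets mutually consistent. A lower threshold could recover more constituents, but it could also admit crossing brackets.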
Similar resources
Wide-Coverage Grammar Extraction from Thai Treebank
Parsing is an important step for natural language understanding, including phrase alignment for supporting statistical machine translation. The ability of a parser to analyse real text strongly depends on its grammar. Treebank could be one of the sources for grammar extraction. However, treebank construction largely relies on human annotators' intuitions. Different intuitions from multiple annotators br...
Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach ca...
Treebank-Based Probabilistic Phrase Structure Parsing
The area of probabilistic phrase structure parsing has been a central and active field in computational linguistics. Stochastic methods in natural language processing, in general, have become very popular as more and more resources become available. One of the main advantages of probabilistic parsing is in disambiguation: it is useful for a parsing system to return a ranked list of potential sy...
Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing
We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations....
Verifying context-sensitive treebanks and heuristic parses in polynomial time
A polyadic dynamic logic is introduced in which a model-theoretic version of nonlocal multicomponent tree-adjoining grammar can be formulated. It is shown to have a low polynomial time model checking procedure. This means that treebanks for nonlocal MCTAG, incl. all weaker extensions of TAG, can be efficiently corrected and queried. Our result is extended to HPSG treebanks (with some qualificat...