Parser Evaluation and the BNC: Evaluating 4 constituency parsers with 3 metrics
Authors
Abstract
We evaluate discriminative parse reranking and parser self-training on a new English test set using four versions of the Charniak parser and a variety of parser evaluation metrics. The new test set consists of 1,000 hand-corrected British National Corpus parse trees. We directly evaluate parser output using both the Parseval and the Leaf Ancestor metrics. We also convert the hand-corrected and parser output phrase structure trees to dependency trees using a state-of-the-art functional tag labeller and constituent-to-dependency conversion tool, and then calculate label accuracy, unlabelled attachment and labelled attachment scores over the dependency structures. We find that reranking leads to a performance improvement on the new test set (albeit a modest one). We find that self-training using BNC data leads to significantly better results. However, it is not clear how effective self-training is when the training material comes from the North American News Corpus.
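The dependency-based scores mentioned in the abstract can be illustrated with a minimal sketch. Assuming each token of a converted dependency tree is represented as a (head index, relation label) pair (an illustrative encoding, not the paper's actual tooling), label accuracy, unlabelled attachment score (UAS) and labelled attachment score (LAS) are simple per-token agreement rates:

```python
def score(gold, pred):
    """Compare gold and predicted dependency analyses of one sentence.

    gold, pred: lists of (head_index, relation_label) pairs, one per token.
    Returns (label accuracy, UAS, LAS) as fractions of tokens.
    """
    assert len(gold) == len(pred)
    n = len(gold)
    # Label accuracy: relation label matches, regardless of head.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # UAS: head index matches, regardless of label.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LAS: both head and label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return la, uas, las

# Hypothetical 4-token sentence: pred gets one head and one label wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (4, "obj"), (3, "nmod")]
print(score(gold, pred))  # (0.75, 0.75, 0.5)
```

In corpus-level evaluation these counts are accumulated over all tokens in the test set rather than averaged per sentence.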
Similar References
Cross-Framework Evaluation for Statistical Parsing
A serious bottleneck of comparative parser evaluation is the fact that different parsers subscribe to different formal frameworks and theoretical assumptions. Converting outputs from one framework to another is less than optimal as it easily introduces noise into the process. Here we present a principled protocol for evaluating parsing results across frameworks based on function trees, tree gen...
IJCAI-95 A Dependency-based Method for Evaluating Broad-Coverage
With the emergence of broad-coverage parsers, quantitative evaluation of parsers becomes increasingly important. We propose a dependency-based method for evaluating broad-coverage parsers. The method offers several advantages over previous methods that are based on phrase boundaries. The error count score we propose here is not only more intuitively meaningful than other scores, but also mo...
Parser Evaluation Using Elementary Dependency Matching
We present a perspective on parser evaluation in a context where the goal of parsing is to extract meaning from a sentence. Using this perspective, we show why current parser evaluation metrics are not suitable for evaluating parsers that produce logical-form semantics and present an evaluation metric that is suitable, analysing some of the characteristics of this new metric.
Parsing Any Domain English text to CoNLL dependencies
It is well known that accuracies of statistical parsers trained over Penn treebank on test sets drawn from the same corpus tend to be overestimates of their actual parsing performance. This gives rise to the need for evaluation of parsing performance on corpora from different domains. Evaluating multiple parsers on test sets from different domains can give a detailed picture about the relative ...
A Multi-Teraflop Constituency Parser using GPUs
Constituency parsing with rich grammars remains a computational challenge. Graphics Processing Units (GPUs) have previously been used to accelerate CKY chart evaluation, but gains over CPU parsers were modest. In this paper, we describe a collection of new techniques that enable chart evaluation at close to the GPU’s practical maximum speed (a Teraflop), or around a half-trillion rule evaluatio...