Influence of preprocessing on dependency syntax annotation: speed and agreement
نویسنده
چکیده
When creating a new resource, preprocessing the source texts before annotation is both ubiquitous and obvious. How the preprocessing affects the annotation effort for various tasks is for the most part an open question, however. In this paper, we study the effects of preprocessing on the annotation of dependency corpora and how annotation speed varies as a function of the quality of three different parsers and compare with the speed obtained when starting from a least-processed baseline. We also present preliminary results concerning the effects on agreement based on a small subset of sentences that have been doubly-annotated.1
منابع مشابه
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملA double-blind experiment on interannotator agreement: the case of dependency syntax and Finnish
Manually performed treebanking is an expensive effort compared with automatic annotation. In return, manual treebanking is generally believed to provide higherquality/value syntactic annotation than automatic methods. Unfortunately, there is little or no empirical evidence for or against this belief, though arguments have been voiced for the high degree of subjectivity in other levels of lingui...
متن کاملA Framework for (Under)specifying Dependency Syntax without Overloading Annotators
We introduce a framework for lightweight dependency syntax annotation. Our formalism builds upon the typical representation for unlabeled dependencies, permitting a simple notation and annotation workflow. Moreover, the formalism encourages annotators to underspecify parts of the syntax if doing so would streamline the annotation process. We demonstrate the efficacy of this annotation on three ...
متن کاملZOMBILINGO: eating heads to perform dependency syntax annotation (ZOMBILINGO : manger des têtes pour annoter en syntaxe de dépendances) [in French]
This paper presents ZOMBILINGO, a Game With A Purpose (GWAP) that allows for the dependency syntax annotation of French corpora. The created resource is freely available on the game Web site. Mots-clés : jeux ayant un but, complexité, annotation, syntaxe en dépendances.
متن کاملStatistical syntactic parsing for Latvian
Syntactic parsing is an important technique in the natural language processing, yet Latvian is still lacking an efficient general coverage syntax parser. This paper reports on the first experiments on statistical syntactic parsing for Latvian — a highly inflective Indo-European language with a relatively free word order. We have induced a statistical parser from a small, non-balanced Latvian Tr...
متن کامل