Dependency Annotation Choices: Assessing Theoretical and Practical Issues of Universal Dependencies
Abstract
This article attempts to place dependency annotation options on a solid theoretical and applied footing. By verifying the validity of some basic choices of the current dependency reference framework, Universal Dependencies (UD), from the perspective of general annotation principles, we show how some choices can lead to inconsistencies and discontinuities, partly due to UD’s alternation between syntax and semantics. For some constructions, we propose better-suited alternative structures with a clear-cut distinction between syntax and semantics. We propose a classification of conception-oriented, annotator-oriented, and, finally, treebank end-user-oriented considerations to be used in the creation of new annotation schemes.
Similar resources
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
A Dependency Treebank for Telugu
In this paper, we describe the annotation and development of a Telugu treebank following the Universal Dependencies framework. We manually annotated 1328 sentences from a Telugu grammar textbook, and the treebank is freely available from Universal Dependencies version 2.1. In this paper, we discuss some language-specific annotation issues and decisions, and report preliminary experiments with POS...
Universal Dependencies for Croatian (that work for Serbian, too)
We introduce a new dependency treebank for Croatian within the Universal Dependencies framework. We construct it on top of the SETIMES.HR corpus, augmenting the resource by additional part-of-speech and dependency-syntactic annotation layers adherent to the framework guidelines. In this contribution, we outline the treebank design choices, and we use the resource to benchmark dependency parsing...
Assessing the Annotation Consistency of the Universal Dependencies Corpora
A fundamental issue in annotation efforts is to ensure that the same phenomena within and across corpora are annotated consistently. To date, there has not been a clear and obvious way to ensure annotation consistency of dependency corpora. Here, we revisit the method of Boyd et al. (2008) to flag inconsistencies in dependency corpora, and evaluate it on three languages with varying degrees of ...
UDLex: Towards Cross-language Subcategorization Lexicons
This paper introduces UDLex, a computational framework for the automatic extraction of argument structures for several languages. By exploiting the versatility of the Universal Dependency annotation scheme, our system acquires subcategorization frames directly from a dependency parsed corpus, regardless of the input language. It thus uses a universal set of language-independent rules to detect ...