Detecting Annotation Scheme Variation in Out-of-Domain Treebanks

نویسندگان

  • Yannick Versley
  • Julius Steen
چکیده

To ensure portability of NLP systems across multiple domains, existing treebanks are often extended by adding trees from interesting domains that were not part of the initial annotation effort. In this paper, we will argue that it is both useful from an application viewpoint and enlightening from a linguistic viewpoint to detect and reduce divergence in annotation schemes between extant and new parts in a set of treebanks that is to be used in evaluation experiments. The results of our correction and harmonization efforts will be made available to the public as a test suite for the evaluation of constituent parsing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

On Detecting Errors in Dependency Treebanks

Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on depend...

متن کامل

Detecting Inconsistencies in Treebanks

Introduction Treebanks used as: • " gold standard " training and testing material for computational linguists • data for linguists to search through for theoretically relevant patterns Introduction Treebanks used as: • " gold standard " training and testing material for computational linguists • data for linguists to search through for theoretically relevant patterns Treebanks generally result ...

متن کامل

Ad Hoc Treebank Structures

We outline the problem of ad hoc rules in treebanks, rules used for specific constructions in one data set and unlikely to be used again. These include ungeneralizable rules, erroneous rules, rules for ungrammatical text, and rules which are not consistent with the rest of the annotation scheme. Based on a simple notion of rule equivalence and on the idea of finding rules unlike any others, we ...

متن کامل

From Phrase Structure to Dependencies, and Back

Transforming constituent-based annotation into dependency-based annotation has been shown to work for different treebanks and annotation schemes (e.g. Lin (1995) has transformed the Penn treebank, and Kübler and Telljohann (2002) the Tübinger Baumbank des Deutschen (TüBa-D/Z)). These ventures are usually triggered by the conflict between theory-neutral annotation, that targets most needs of a w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016