A Gold Standard Dependency Corpus for English

نویسندگان

  • Natalia Silveira
  • Timothy Dozat
  • Marie-Catherine de Marneffe
  • Samuel R. Bowman
  • Miriam Connor
  • John Bauer
  • Christopher D. Manning
چکیده

We present a gold standard annotation of syntactic dependencies in the English Web Treebank corpus using the Stanford Dependencies standard. This resource addresses the lack of a gold standard dependency treebank for English, as well as the limited availability of gold standard syntactic annotations for informal genres of English text. We also present experiments on the use of this resource, both for training dependency parsers and for evaluating dependency parsers like the one included as part of the Stanford Parser. We show that training a dependency parser on a mix of newswire and web data improves performance on that type of data without greatly hurting performance on newswire text, and therefore gold standard annotations for non-canonical text can be valuable for parsing in general. Furthermore, the systematic annotation effort has informed both the SD formalism and its implementation in the Stanford Parser’s dependency converter. In response to the challenges encountered by annotators in the EWT corpus, we revised and extended the Stanford Dependencies standard, and improved the Stanford Parser’s dependency converter.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

High-accuracy Annotation and Parsing of CHILDES Transcripts

Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To d...

متن کامل

DCU 250 Arabic Dependency Bank: An LFG Gold Standard Resource for the Arabic Penn Treebank

This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is neces...

متن کامل

Exploring self training for Hindi dependency parsing

In this paper we explore the effect of selftraining on Hindi dependency parsing. We consider a state-of-the-art Hindi dependency parser and apply self-training by using a large raw corpus. We consider two types of raw corpus, one from same domain as of training and testing data and the other from different domain. We also do an experiment, where we add small gold-standard data to the training s...

متن کامل

Morphosyntactic annotation of CHILDES transcripts.

Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of label...

متن کامل

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014