A Fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology

Authors

  • Ulrich Schäfer
  • Christian Spurk
  • Jörg Steffen

Abstract

We describe a large coreference annotation task performed on a corpus of 266 papers from the ACL Anthology, a publicly and electronically available collection of scientific papers in the domain of computational linguistics and language technology. The annotation mainly covers noun phrase coreference across the full textual content of each paper in the Anthology subset. Each paper was annotated carefully and at least twice (an initial annotation phase followed by a correction phase). The purpose of this paper is to summarize the comprehensive annotation schema and to release the corpus publicly alongside it. The corpus is far larger than the ACE coreference corpora. It can be used to train coreference resolution systems in the computational linguistics and language technology domain for semantic search, taxonomy extraction, question answering, citation analysis, scientific discourse analysis, and similar applications.

Similar resources

The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Antho...

Corpus based coreference resolution for Farsi text

"Coreference resolution", i.e., finding all expressions in a text that refer to the same entity, is an important requirement in natural language processing. Two words are coreferent when both refer to the same entity in the text or in the real world, so the main task of a coreference resolution system is to identify the terms that refer to each unique entity. A coreference resolution tool could be...

Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature

This paper describes the process of creating a corpus annotated for concepts and semantic relations in the scientific domain. A part of the ACL Anthology Corpus was selected for annotation, but the annotation process itself is not specific to the computational linguistics domain and could be applied to any scientific corpus. Concepts were identified and annotated fully automatically, based on a...

Argumentative analysis of the ACL Anthology (Analyse argumentative du corpus de l'ACL (ACL Anthology)) [in French]

This paper presents an application of Text Zoning to the ACL Anthology. Text Zoning is known to be useful for characterizing the content of papers, especially in the scientific domain. We show that recent techniques based on weakly supervised learning obtain excellent results on the ACL Anthology. Although these kinds of techniques are known in the domain, this is the first time they have been applied to the ...

Corpus for Coreference Resolution on Scientific Papers

The ever-growing number of published scientific papers prompts the need for automatic knowledge extraction to help scientists keep up with the state of the art in their respective fields. To construct a good knowledge extraction system, annotated corpora in the scientific domain are required to train machine learning models. As described in this paper, we have constructed an annotated corpus fo...

Publication date: 2012