A Fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology
نویسندگان
چکیده
We describe a large coreference annotation task performed on a corpus of 266 papers from the ACL Anthology, a publicly, electronically available collection of scientific papers in the domain of computational linguistics and language technology. The annotation comprises mainly noun phrase coreference of the full textual content of each paper in the Anthology subset. It has been performed carefully and at least twice for each paper (initial annotation and secondary correction phase). The purpose of this paper is to summarize the comprehensive annotation schema and release the corpus publicly, along with this paper. The corpus is by far larger than the ACE coreference corpora. It can be used to train coreference resolution systems in the Computational Linguistics and Language Technology domain for semantic search, taxonomy extraction, question answering, citation analysis, scientific discourse analysis, etc.
منابع مشابه
The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Antho...
متن کاملCorpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملSemantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
This paper describes the process of creating a corpus annotated for concepts and semantic relations in the scientific domain. A part of the ACL Anthology Corpus was selected for annotation, but the annotation process itself is not specific to the computational linguistics domain and could be applied to any scientific corpus. Concepts were identified and annotated fully automatically, based on a...
متن کاملArgumentative analysis of the ACL Anthology (Analyse argumentative du corpus de l'ACL (ACL Anthology)) [in French]
This paper presents an application of Text Zoning to the ACL Anthology. Text Zoning is known to be useful to characterize the content of papers, especially in the scientific domain. We show that recent techniques based on weakly supervised learning obtain excellent results on the ACL Anthology. Although these kinds of techniques is known in the domain, it is the first time it is applied to the ...
متن کاملCorpus for Coreference Resolution on Scientific Papers
The ever-growing number of published scientific papers prompts the need for automatic knowledge extraction to help scientists keep up with the state-of-the-art in their respective fields. To construct a good knowledge extraction system, annotated corpora in the scientific domain are required to train machine learning models. As described in this paper, we have constructed an annotated corpus fo...
متن کامل