Keyphrase Extraction in Scientific Publications
نویسندگان
چکیده
We present a keyphrase extraction algorithm for scientific publications. Different from previous work, we introduce features that capture the positions of phrases in document with respect to logical sections found in scientific discourse. We also introduce features that capture salient morphological phenomena found in scientific keyphrases, such as whether a candidate keyphrase is an acronyms or uses specific terminologically productive suffixes. We have implemented these features on top of a baseline feature set used by Kea [1]. In our evaluation using a corpus of 120 scientific publications multiply annotated for keyphrases, our system significantly outperformed Kea at the p < .05 level. As we know of no other existing multiply annotated keyphrase document collections, we have also made our evaluation corpus publicly available. We hope that this contribution will spur future comparative research.
منابع مشابه
WING-NUS at SemEval-2017 Task 10: Keyphrase Identification and Classification as Joint Sequence Labeling
We describe an end-to-end pipeline processing approach for SemEval 2017’s Task 10 to extract keyphrases and their relations from scientific publications. We jointly identify and classify keyphrases by modeling the subtasks as sequential labeling. Our system utilizes standard, surface-level features along with the adjacent word features, and performs conditional decoding on whole text to extract...
متن کاملSZTERGAK : Feature Engineering for Keyphrase Extraction
Automatically assigning keyphrases to documents has a great variety of applications. Here we focus on the keyphrase extraction of scientific publications and present a novel set of features for the supervised learning of keyphraseness. Although these features are intended for extracting keyphrases from scientific papers, because of their generality and robustness, they should have uses in other...
متن کاملPKU_ICL at SemEval-2017 Task 10: Keyphrase Extraction with Model Ensemble and External Knowledge
This paper presents a system that participated in SemEval 2017 Task 10 (subtask A and subtask B): Extracting Keyphrases and Relations from Scientific Publications (Augenstein et al., 2017). Our proposed approach utilizes external knowledge to enrich feature representation of candidate keyphrase, includingWikipedia, IEEE taxonomy and pre-trained word embeddings etc. Ensemble of unsupervised mode...
متن کاملHow Document Pre-processing affects Keyphrase Extraction Performance
The SemEval-2010 benchmark dataset has brought renewed attention to the task of automatic keyphrase extraction. This dataset is made up of scientific articles that were automatically converted from PDF format to plain text and thus require careful preprocessing so that irrevelant spans of text do not negatively affect keyphrase extraction performance. In previous work, a wide range of document ...
متن کاملAccurate Keyphrase Extraction from Scientific Papers by Mining Linguistic Information
In this paper we investigate the impact of candidate terms filtering using linguistic information on the accuracy of automatic keyphrase extraction from scientific papers. According to linguistic knowledge, the noun phrases are most likely to be keyphrases. However the definition of a noun phrase can vary from a system to another. We have identified five POS tag sequence definitions of a noun p...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007