Recognizing Biographical Sections in Wikipedia
Authors
Abstract
Wikipedia is the largest collection of encyclopedic data ever written in the history of humanity. Thanks to its coverage and its availability in machine-readable format, it has become a primary resource for large-scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections in them: given a person's page, we identify the list of sections where information about her/his life is present. We model this as a sequence classification problem and propose a supervised setting in which the training data are acquired automatically. We also show that six simple features extracted only from the section titles are very informative and yield results well above a strong baseline.
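As a rough illustration of the framing described in the abstract, the sketch below turns the ordered section titles of one page into a sequence of feature dictionaries, one per section, computed from the titles alone. The abstract does not list the six features, so every feature here, the `BIO_CUES` word list, and the function names are hypothetical stand-ins rather than the authors' method.

```python
from typing import Dict, List

# Hypothetical cue words (an assumption for illustration only;
# the abstract does not say which title features the paper uses).
BIO_CUES = {"life", "career", "biography", "education", "death", "family"}


def title_features(titles: List[str], i: int) -> Dict[str, object]:
    """Illustrative features for section i, derived only from section titles."""
    words = titles[i].lower().split()
    return {
        "position": i,
        # 0.0 for the first section, 1.0 for the last one
        "relative_position": i / max(len(titles) - 1, 1),
        "n_words": len(words),
        "has_bio_cue": any(w in BIO_CUES for w in words),
        "has_digit": any(ch.isdigit() for ch in titles[i]),
        # Context from the preceding section, which is what makes
        # the task a sequence problem rather than a per-title one
        "prev_title": titles[i - 1].lower() if i > 0 else "<START>",
    }


def page_to_sequence(titles: List[str]) -> List[Dict[str, object]]:
    """Map one page (an ordered list of section titles) to a feature sequence."""
    return [title_features(titles, i) for i in range(len(titles))]


if __name__ == "__main__":
    page = ["Early life", "Career", "Filmography", "Awards", "References"]
    for title, feats in zip(page, page_to_sequence(page)):
        print(title, feats)
```

A supervised sequence labeller would then be trained on such sequences paired with one biographical / non-biographical label per section; the choice of labeller is left open here because the abstract does not name one.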
Similar resources
Unsupervised Biographical Event Extraction Using Wikipedia Traffic
Biographical summarisation can provide succinct and meaningful answers to the question “Who is X?”. Current supervised summarisation approaches extract sentences from documents using features from textual context. In this paper, we explore a novel approach to biographical summarisation, by extracting important sentences from an entity’s Wikipedia page based on internet traffic to the page over ...
An Unsupervised Approach to Biography Production Using Wikipedia
We describe an unsupervised approach to multi-document sentence-extraction based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from ...
Hidden revolution of human priorities: An analysis of biographical data from Wikipedia
An innovative study of Wikipedia biographical pages is presented. It is shown that the dates of some historical cataclysms may be reproduced from peculiarities of lifespan changes over time. Time dependence of number of biographical pages related to a year has a broken linear trend in logarithmic scale. It shows a sudden change of the slope from 0.0006 to 0.008 per year near 1700 AC. Presumably...
Biographical Data Exploration as a Test-bed for a Multi-view, Multi-method Approach in the Digital Humanities
The present paper has two purposes: the main point is to report on the transfer and extension of an NLP-based biographical data exploration system that was developed for Wikipedia data and is now applied to a broader collection of traditional textual biographies from different sources and an additional set of structured biographical resources, also adding membership in political parties as a ne...
The Evolution of Genre in Wikipedia
This paper presents an overview of the ways in which genres, or structural forms, develop in a community of practice, in this case, Wikipedia. Firstly, we collected data by performing a small search task in the Wikipedia search engine (powered by Lucene) to locate articles related to global car manufacturers, for example, British Leyland, Ferrari and General Motors. We also searched for typical...