An Approach to Web-Scale Named-Entity Disambiguation
Abstract
We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the benefits that data-driven approaches can gain from processing larger datasets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
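As a rough illustration of clustering on name co-occurrence, the sketch below groups documents that mention the same ambiguous name by the overlap of the other names appearing alongside it. It is a single-pass toy under stated assumptions, not the paper's multi-pass algorithm: the Jaccard similarity measure, the `threshold` value, the function names, and the sample documents are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two sets of co-occurring names."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_mentions(mentions, threshold=0.25):
    """Group documents whose co-occurring name sets overlap enough.

    mentions: dict mapping a document id to the set of names that
    co-occur with the ambiguous name in that document (hypothetical input).
    Returns a list of sets of document ids, one set per inferred entity.
    """
    # Union-find over document ids; every document starts in its own cluster.
    parent = {doc: doc for doc in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Single pass over all pairs: link documents that share enough names.
    for a, b in combinations(mentions, 2):
        if jaccard(mentions[a], mentions[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in mentions:
        clusters.setdefault(find(doc), set()).add(doc)
    return list(clusters.values())


# Hypothetical documents that all mention the ambiguous name "Michael Jordan".
mentions = {
    "doc1": {"Chicago Bulls", "NBA", "Scottie Pippen"},
    "doc2": {"NBA", "Chicago Bulls", "Phil Jackson"},
    "doc3": {"UC Berkeley", "machine learning", "statistics"},
    "doc4": {"machine learning", "UC Berkeley", "NIPS"},
}

clusters = cluster_mentions(mentions)
```

With this toy input, the two basketball documents and the two machine-learning documents end up in separate clusters. The all-pairs comparison is quadratic in the number of documents, so it would not scale to the web as written; it only shows the co-occurrence idea.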
Similar resources
AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data
Over the last decades, several billion Web pages have been made available on the Web. The ongoing transition from the current Web of unstructured data to the Web of Data yet requires scalable and accurate approaches for the extraction of structured data in RDF (Resource Description Framework) from these websites. One of the key steps towards extracting RDF from text is the disambiguation of nam...
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
The rapidly increasing use of large-scale data on the Web has made named entity disambiguation a key research challenge in Information Extraction (IE) and development of the Semantic Web. In this paper we propose a novel disambiguation framework that utilizes background semantic information, typically in the form of Linked Data, to accurately determine the intended meaning of detected semantic ...
Large-Scale Named Entity Disambiguation Based on Wikipedia Data
This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information ex...
Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web
Named entity recognition and disambiguation are of primary importance for extracting information and for populating knowledge bases. Detecting and classifying named entities has traditionally been taken on by the natural language processing community, whilst linking of entities to external resources, such as those in DBpedia, has been tackled by the Semantic Web community. As these tasks are tr...
Trading accuracy for faster entity linking
Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state of the art disambiguation techniques are currently too slow for web-scale applications because of a high complexity with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparabilit...