Cross-document person name disambiguation using entity profiles
Abstract
Consolidating entity information spread across multiple documents is a critical problem given the growing use of large open-domain document sources. Associating every entity in a corpus with a unique entry in a growing knowledge base serves the dual purpose of consolidating (disambiguating) entities and building a rich, growing knowledge source that accumulates information about each entity from several documents. Because of ambiguous names, nominals, and aliases, the task of hyper-tagging an entity mentioned in a document to a node in a knowledge base requires context in addition to name-matching rules. In this paper, we present an approach that computes the similarity between entities identified in a document and those in the knowledge base using a Vector Space Model over document-level entity profiles, i.e., the information accumulated for each entity from the entire document. The technique achieved a TAC evaluation score of 71.9 in the TAC 2009 KBP track. The same technique also achieved a state-of-the-art F-measure (93.95) in disambiguating person names by applying hierarchical agglomerative clustering to the resulting similarity values.
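The two uses of profile similarity described above (linking a document entity to a knowledge-base node, and clustering same-name mentions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify term weighting, the linkage criterion, or any thresholds, so this sketch assumes raw term-frequency profiles, cosine similarity, average-link merging, and arbitrary cut-offs. All function names and parameters (`build_profile`, `link_to_kb`, `nil_threshold`, `hac`, `stop_threshold`) are hypothetical.

```python
import math
from collections import Counter


def build_profile(tokens):
    """Hypothetical entity profile: term frequencies of the context words
    gathered for one entity across all of its mentions in a document."""
    return Counter(tokens)


def cosine(p, q):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def link_to_kb(profile, kb_profiles, nil_threshold=0.1):
    """Return the id of the most similar KB profile, or None (a NIL link)
    when no candidate exceeds the assumed threshold."""
    best_id, best_sim = None, nil_threshold
    for kb_id, kb_profile in kb_profiles.items():
        sim = cosine(profile, kb_profile)
        if sim > best_sim:
            best_id, best_sim = kb_id, sim
    return best_id


def hac(profiles, stop_threshold=0.2):
    """Average-link agglomerative clustering (linkage choice is an assumption).
    Repeatedly merges the two most similar clusters until the best average
    pairwise similarity drops below stop_threshold."""
    n = len(profiles)
    sim = [[cosine(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best_sim, best_a, best_b = -1.0, None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs of mentions.
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s = total / (len(clusters[a]) * len(clusters[b]))
                if s > best_sim:
                    best_sim, best_a, best_b = s, a, b
        if best_sim < stop_threshold:
            break
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]  # best_b > best_a, so best_a's index is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy data: four documents mentioning two different people with the same name.
    docs = [
        "paper tagger parser linguistics stanford".split(),
        "parser linguistics treebank stanford".split(),
        "basketball nba playoffs points".split(),
        "nba playoffs rebounds points".split(),
    ]
    profiles = [build_profile(d) for d in docs]
    print(hac(profiles))  # [[0, 1], [2, 3]]

    kb = {
        "E1": build_profile("parser linguistics stanford professor".split()),
        "E2": build_profile("nba basketball player".split()),
    }
    print(link_to_kb(profiles[0], kb))  # E1
```

Average linkage is used here only because it is a common default that is less prone to chaining than single linkage; the reported F-measure depends on whichever linkage, weighting, and stopping criterion the original system actually used.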
Similar resources
Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction
It is fairly common for different people to share the same name. When tracking person entities in a large document pool, it is important to determine whether multiple mentions of the same name across documents refer to the same entity. Previous approaches to this problem measure context similarity based only on co-occurring words. This paper presents a new algorithm us...
Who is Who and What is What: Experiments in Cross-Document Co-Reference
This paper describes a language-independent, scalable system for both challenges of cross-document co-reference: name variation and entity disambiguation. We provide system results from the ACE 2008 evaluation in both English and Arabic. Our English system's accuracy is 8.4% better, in relative terms, than an exact-match baseline (and 14.2% better, in relative terms, on entities mentioned in more than one document)...
Person Name Disambiguation in Web Pages Using Social Network, Compound Words and Latent Topics
The World Wide Web (WWW) provides a great deal of information about people, and in recent years WWW search engines have commonly been used to learn about people. However, many people share the same name, and this ambiguity typically causes the search results for one person name to include Web pages about several different people. We propose a novel framework for person name disambiguation that has th...
Automatic Annotation of Ambiguous Personal Names on the Web
Personal name disambiguation is an important task in social network extraction, evaluation and integration of ontologies, information retrieval, cross-document co-reference resolution and word sense disambiguation. We propose an unsupervised method to automatically annotate people with ambiguous names on the web using automatically extracted keywords. Given an ambiguous personal name, first, we...
Person Name Disambiguation on the Web Using Query Expansion
The more important web search becomes, the bigger the same-name problem in web search becomes. One proposed solution is to form clusters of people from the search results. In this paper, we report our algorithms that disambiguate person names in web search results. Our clustering algorithm is based on hierarchical agglomerative clustering using named entities, compound key words and URLs as features fo...