Cross document person name disambiguation using entity profiles

Authors

  • Harish Srinivasan
  • John Chen
  • Rohini K. Srihari
Abstract

Consolidating entity information spread across multiple documents has become a critical problem with the growing use of large open-domain document sources. Associating every entity in a corpus with a unique entry in a growing knowledge base serves a dual purpose: it consolidates (disambiguates) entities, and it builds a rich knowledge source containing information about each entity accumulated from several documents. Because of ambiguous names, nominals, and aliases, hyper-tagging an entity mentioned in a document to a node in a knowledge base requires context in addition to name-matching rules. In this paper, we present an approach that computes the similarity between entities identified in a document and those in the knowledge base using a Vector Space Model over document-level entity profiles, that is, the information accumulated for each entity from the entire document. The technique achieved a TAC evaluation score of 71.9 in the TAC 2009 KBP track. The same technique also obtained state-of-the-art F-measures (93.95) in disambiguating person names by applying hierarchical agglomerative clustering to the resulting similarity values.
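
To make the two steps named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an entity profile is a bag of terms, that vectors use raw term frequencies, that clusters are merged by average linkage, and that merging stops at a fixed similarity threshold. None of these details are given in the abstract, and the profiles and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def profile_vector(terms):
    """Turn a bag of profile terms into a sparse term-frequency vector.
    Raw term frequency is an assumption; the abstract names no weighting."""
    return Counter(terms)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, threshold):
    """Average-link agglomerative clustering over cosine similarity.
    Merging stops once the most similar pair of clusters falls below
    `threshold` (linkage and stopping rule are assumptions)."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical profiles for four mentions of one ambiguous person name.
profiles = [
    profile_vector(["senator", "ohio", "senate", "vote"]),
    profile_vector(["senator", "senate", "bill", "ohio"]),
    profile_vector(["guitarist", "band", "album", "tour"]),
    profile_vector(["album", "band", "tour", "concert"]),
]
clusters = agglomerative_clusters(profiles, threshold=0.3)
print(clusters)
```

Each resulting cluster groups the mentions that the sketch treats as the same person. In a knowledge-base linking setting, the same `cosine` function could instead score a document-level profile against each candidate knowledge-base entry.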

Similar resources

Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction

It is fairly common that different people are associated with the same name. In tracking person entities in a large document pool, it is important to determine whether multiple mentions of the same name across documents refer to the same entity or not. Previous approaches to this problem measure context similarity based only on co-occurring words. This paper presents a new algorithm us...

Who is Who and What is What: Experiments in Cross-Document Co-Reference

This paper describes a language-independent, scalable system for both challenges of cross-document co-reference: name variation and entity disambiguation. We provide system results from the ACE 2008 evaluation in both English and Arabic. Our English system’s accuracy is 8.4% relative better than an exact match baseline (and 14.2% relative better over entities mentioned in more than one document)...

Person Name Disambiguation in Web Pages Using Social Network, Compound Words and Latent Topics

The World Wide Web (WWW) provides much information about persons, and in recent years WWW search engines have been commonly used for learning about persons. However, many persons have the same name and that ambiguity typically causes the search results of one person name to include Web pages about several different persons. We propose a novel framework for person name disambiguation that has th...

Automatic Annotation of Ambiguous Personal Names on the Web

Personal name disambiguation is an important task in social network extraction, evaluation and integration of ontologies, information retrieval, cross-document co-reference resolution and word sense disambiguation. We propose an unsupervised method to automatically annotate people with ambiguous names on the web using automatically extracted keywords. Given an ambiguous personal name, first, we...

Person Name Disambiguation on the Web Using Query Expansion

The more important web search becomes, the bigger the same-name problem in web search. The proposed solution is to form clusters of people from search results. In this paper, we report our algorithm that disambiguates person names in web search results. Our clustering algorithm is based on hierarchical agglomerative clustering using named entities, compound key words and URLs as features fo...

Publication date: 2009