Person Name Disambiguation on the Web by Two-Stage Clustering

Authors

  • Masaki Ikeda
  • Shingo Ono
  • Issei Sato
  • Minoru Yoshida
  • Hiroshi Nakagawa
Abstract

As web search grows more important, so does the “same name” problem: the results for a single name query often mix pages about different people. In this paper, we report our algorithm for disambiguating person names in web search results. It is a document clustering algorithm based on hierarchical agglomerative clustering that uses named entities, compound keywords, and URLs as features for computing document similarity. To address low recall, we propose a two-stage clustering algorithm in which the results of the first-stage clustering are used to extract features for the second-stage clustering. We participated in the WePS-2 evaluation with this algorithm. We explain those results and describe additional experiments performed on the WePS-1 data sets.
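The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each document has already been reduced to a set of feature strings (named entities, compound keywords, URLs). It uses Jaccard overlap and average-link merging as the similarity and linkage, neither of which the abstract specifies. It also reads "extracting features from first-stage clusters" as pooling the features of each first-stage cluster. The function names, thresholds, and feature strings are hypothetical.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets; 0.0 when both are empty (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hac(features: List[Set[str]], threshold: float) -> List[List[int]]:
    """Average-link agglomerative clustering over feature sets (assumed linkage).

    Starts with one cluster per item and repeatedly merges the most similar
    pair of clusters until no pair reaches `threshold`.
    """
    clusters = [[i] for i in range(len(features))]

    def link(c1: List[int], c2: List[int]) -> float:
        # Mean pairwise similarity between members of the two clusters.
        total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def two_stage(features: List[Set[str]], t1: float, t2: float) -> List[List[int]]:
    """Hypothetical two-stage scheme: strict first pass, looser second pass."""
    # Stage 1: a strict threshold yields small, high-precision clusters.
    stage1 = hac(features, t1)
    # Feature extraction (assumption): pool the features of each stage-1 cluster.
    pooled = [set().union(*(features[i] for i in c)) for c in stage1]
    # Stage 2: cluster the pooled clusters with a looser threshold.
    stage2 = hac(pooled, t2)
    # Map stage-2 groups back to original document indices.
    return [sorted(i for c in group for i in stage1[c]) for group in stage2]


# Hypothetical feature sets for four pages returned for one ambiguous name.
docs = [
    {"org:univ_tokyo", "url:cs.example.jp", "kw:information_retrieval"},
    {"org:univ_tokyo", "url:cs.example.jp"},
    {"url:cs.example.jp", "kw:web_search"},
    {"org:city_fc", "kw:football"},
]

print(hac(docs, 0.5))              # [[0, 1], [2], [3]]
print(two_stage(docs, 0.5, 0.2))   # [[0, 1, 2], [3]]
```

In this toy run the first stage groups only pages 0 and 1. The second stage then attaches page 2 through the shared URL feature, while the unrelated page 3 stays separate. The naive pairwise search is quadratic per merge, which is acceptable for the few hundred pages a single name query typically returns but would need a priority queue or a library routine for larger inputs.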

Similar resources

Adaptive Resonance Theory Based Two-Stage Chinese Name Disambiguation

It is common for different individuals to share the same name, which makes searching the web for information about a particular individual time-consuming. Name disambiguation research is necessary to help users find the person of interest more readily. In this paper, we propose an Adaptive Resonance Theory (ART) based two-stage strategy for this problem. We get a first-stage clustering result with ART...

Person Name Disambiguation on the Web Using Query Expansion

As web search becomes more important, the same name problem in web search grows. One proposed solution is to form clusters of people from search results. In this paper, we report our algorithms that disambiguate person names in web search results. Our clustering algorithm is based on hierarchical agglomerative clustering using named entities, compound key words and URLs as features fo...

Using Web Graph Structure for Person Name Disambiguation

In the third edition of the WePS campaign we have undertaken the person name disambiguation problem, framed as a clustering task. Our aim was to make use of intrinsic link relationships among Web pages for name resolution in Web search results. To date, link structure has not been used for this purpose. However, the Web graph can be a rich source of information about latent semantic similarity betw...

Clustering web people search results using fuzzy ants

Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. T...

Which Who are They? People Attribute Extraction and Disambiguation in Web Search Results

People name search often returns a lot of Web pages containing the strings of personal names. Because of namesakes, extracting target person attributes (such as birthday, occupation, affiliation, nationality, contact information, etc.) is expected to help differentiate documents related to different people and thus group documents related to the same person. This paper presents the methodol...

Publication date: 2009