Extraction of Hierarchies Based on Inclusion of Co-occurring Words with Frequency Information

نویسندگان

  • Eiko Yamamoto
  • Kyoko Kanzaki
  • Hitoshi Isahara
چکیده

In this paper, we propose a method of automatically extracting word hierarchies based on the inclusion relations of word appearance patterns in corpora. We applied the complementary similarity measure (CSM) to determine a hierarchical structure of word meanings. The CSM is a similarity measure developed for recognizing degraded machine-printed text. There are CSMs for both binary and gray-scale images. The CSM for binary images has been applied to estimate one-to-many relations, such as superordinate-subordinate relations, and to extract word hierarchies. However, the CSM for gray-scale images has not been applied to natural language processing. Here, we apply the latter to extract word hierarchies from corpora. To do this, we used frequency information for co-occurring words, which is not considered when using the CSM for binary images. We compared our hierarchies with those obtained using the CSM for binary images, and evaluated them by measuring their degree of agreement with the EDR electronic dictionary.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction

It is fairly common that different people are associated with the same name. In tracking person entities in a large document pool, it is important to determine whether multiple mentions of the same name across documents refer to the same entity or not. Previous approach to this problem involves measuring context similarity only based on co-occurring words. This paper presents a new algorithm us...

متن کامل

A review on EEG based brain computer interface systems feature extraction methods

The brain – computer interface (BCI) provides a communicational channel between human and machine. Most of these systems are based on brain activities. Brain Computer-Interfacing is a methodology that provides a way for communication with the outside environment using the brain thoughts. The success of this methodology depends on the selection of methods to process the brain signals in each pha...

متن کامل

A review on EEG based brain computer interface systems feature extraction methods

The brain – computer interface (BCI) provides a communicational channel between human and machine. Most of these systems are based on brain activities. Brain Computer-Interfacing is a methodology that provides a way for communication with the outside environment using the brain thoughts. The success of this methodology depends on the selection of methods to process the brain signals in each pha...

متن کامل

The Intellectual Structure of Knowledge in the Field of Distance Education Using the Co-Word analyses

Background: Co- word analysis is one of the content analysis methods used in scientometric studies and mapping the scientific structure of various fields. The purpose of the present research is to map the structure of distance education using the co-word analysis. Methods: The research method is content analysis using co- word analysis. The research population are 31607 documents indexed in the...

متن کامل

A Systematic Review of Scientific Products Indexed at the Scopus Database in the Field of Post-disaster Housing with a Focus on Architecture

 Post disaster temporary housing is one of the challenges of disaster preparedness in any country; because there is a basic need for sustainable, affordable and efficient temporary housing. This study aimed to evaluate the scientific products in post-disaster temporary housing, focusing on the field of architecture and using scientometric methods and content analysis, systematic review and co-o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005