Unsupervised Learning of an IS-A Taxonomy from a Limited Domain-Specific Corpus

Authors

  • Daniele Alfarone
  • Jesse Davis
Abstract

This report addresses the problem of learning a taxonomy from a given domain-specific text corpus. We propose a novel unsupervised algorithm for this problem. Its key contributions include a clustering-based inference approach that increases recall over surface patterns and a graph-based algorithm for detecting incorrect edges that improves precision. Our system induces the taxonomy simply by analyzing the provided corpus. Thus, the learned taxonomy is focused on the concepts that are relevant for the specific corpus. An empirical evaluation on five corpora demonstrates the utility of the system.

CR Subject Classification: I.2.6, I.2.7

Department of Computer Science, KU Leuven, Celestijnenlaan 200A box 2402, 3001 Leuven, Belgium
{daniele.alfarone,jesse.davis}@cs.kuleuven.be


Similar resources

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a large amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...


An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

The RoboCup competition, as a great test-bed, has become a popular domain worldwide in recent years. The main objective of such competitions is to deal with the complex behavior of systems which consist of multiple autonomous agents. The rich experience of human soccer players can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...


Text-Based Ontology Enrichment Using Hierarchical Self-organizing Maps

The success of the Semantic Web research is dependent upon the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an...


Unsupervised Ontology Enrichment with Hierarchical Self-Organizing Maps

The paper describes an unsupervised approach to domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific knowledge. The approach and the corresponding framework are based on hierarchical self-organizing maps. As being founded on an unsupervised neural network architectur...


Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...




Publication date: 2015