Extraction of Folksonomies from Noisy Texts

Authors

  • Wim De Smet
  • Marie-Francine Moens
Abstract

We built a system that automatically creates a text-based topic hierarchy for use in a geographically defined community. This poses two main problems. First, the texts mix standard language with a community-specific dialect, so dialect words should be corrected to their standard forms wherever possible. Second, the texts must be clustered hierarchically by topic without manual intervention. We address dialect correction with a nearest neighbor search over a dynamic set of known words, using transition rules from dialect to standard words that are learned from a pair-wise lexicon. We tackle the clustering problem with a hierarchical co-clustering algorithm that automatically generates a topic hierarchy of the collection while simultaneously grouping documents and words into clusters.
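
The dialect-correction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule extraction via `difflib`, the cost values, the `max_cost` threshold, the function names, and the toy word pairs are all assumptions made for illustration. Rules are learned as substring rewrites from a pair-wise lexicon, and an unknown word is mapped to the known word that is cheapest to reach, where rewrites covered by a learned rule cost less than arbitrary edits.

```python
from collections import Counter
from difflib import SequenceMatcher


def diff_ops(source, target):
    """Yield (source_substring, target_substring) for every non-matching span."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield source[i1:i2], target[j1:j2]


def learn_rules(lexicon):
    """Count dialect -> standard substring rewrites seen in a pair-wise lexicon."""
    rules = Counter()
    for dialect, standard in lexicon:
        rules.update(diff_ops(dialect, standard))
    return rules


def rewrite_cost(word, target, rules, rule_cost=0.1):
    """Cost of turning word into target (assumed scheme, not from the paper):
    a rewrite covered by a learned rule costs rule_cost, any other rewrite
    costs the length of the longer side of the changed span."""
    cost = 0.0
    for src, dst in diff_ops(word, target):
        cost += rule_cost if (src, dst) in rules else max(len(src), len(dst))
    return cost


def correct(word, known_words, rules, max_cost=1.0):
    """Return the nearest known word, or the input if nothing is close enough."""
    if word in known_words or not known_words:
        return word
    best = min(known_words, key=lambda w: rewrite_cost(word, w, rules))
    return best if rewrite_cost(word, best, rules) < max_cost else word


# Invented toy lexicon of (dialect, standard) pairs, for illustration only.
rules = learn_rules([("hoes", "huis"), ("moes", "muis")])
known = {"huis", "muis", "luis", "los"}  # the set may grow as new words are seen
print(correct("loes", known, rules))
```

Because `known` is an ordinary mutable set, newly confirmed standard words can be added between calls, which loosely mirrors the "dynamic set of known words" mentioned in the abstract. A linear scan is used here for clarity; a larger vocabulary would need an indexed nearest neighbor search.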

Similar resources

Folksonomies versus Automatic Keyword Extraction: an Empirical Study

Semantic Metadata, which describes the meaning of documents, can be produced either manually or else semi-automatically using information extraction techniques. Manual techniques are expensive if they rely on skilled cataloguers, but a possible alternative is to make use of community produced annotations such as those collected in folksonomies. This paper reports on an experiment that we carrie...

Enabling Folksonomies for Knowledge Extraction: A Semantic Grounding Approach

Folksonomies emerge as the result of the free tagging activity of a large number of users over a variety of resources. They can be considered as valuable sources from which it is possible to obtain emerging vocabularies that can be leveraged in knowledge extraction tasks. However, when it comes to understanding the meaning of tags in folksonomies, several problems mainly related to the appearanc...

Towards Ontological Structures Extraction from Folksonomies: An Efficient Fuzzy Clustering Approach

Folksonomies are one of the technologies of Web 2.0 that permit users to annotate resources on the Web. In this paper, the authors propose an integrated approach to extract ontological structures from unstructured and semi-structured resources. Our proposal overcomes limitations of existing approaches. It gives a formal, simple, and efficient solution to the tag clustering and disambiguation pr...

Exploring the Value of Folksonomies for Creating Semantic Metadata

Finding good keywords to describe resources is an on-going problem. Typically, we select such words manually from a thesaurus of terms, or they are created using automatic keyword extraction techniques. Folksonomies are an increasingly well-populated source of unstructured tags describing Web resources. This article explores the value of the folksonomy tags as a potential source of keyword meta...

Definition Extraction Using a Sequential Combination of Baseline Grammars and Machine Learning Classifiers

The paper deals with the task of definition extraction from a small and noisy corpus of instructive texts. Three approaches are presented: Partial Parsing, Machine Learning, and a sequential combination of both. We show that applying ML methods with the support of a trivial grammar gives better results than a relatively complicated partial grammar, and much better results than a pure ML approach.

Publication date: 2007