Wikipedia as the Premiere Source for Targeted Hypernym Discovery
Authors
Abstract
Targeted Hypernym Discovery (THD) applies lexico-syntactic (Hearst) patterns to a suitable corpus with the intent of extracting one hypernym at a time. Using Wikipedia as the corpus in THD has recently yielded promising results in a number of tasks. We investigate the reasons that make Wikipedia articles such an easy target for lexico-syntactic patterns, and suggest that it is primarily the adherence of its contributors to Wikipedia’s Manual of Style. We propose the hypothesis that extractable patterns are more likely to appear in articles covering popular topics, since these receive more attention, including adherence to the rules from the manual. However, two preliminary experiments carried out with 131 and 100 Wikipedia articles do not support this hypothesis.
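The extraction step the abstract describes can be illustrated with a minimal sketch. The patterns below are plain regular expressions, not the POS-tag-based grammars actual THD systems use, and the example sentence is invented; the copula pattern reflects the definitional first sentence typical of Wikipedia articles, alongside one classic Hearst pattern.

```python
import re

# Illustrative lexico-syntactic patterns (a simplified sketch of
# Hearst-style matching; real systems operate on POS-tagged text).
PATTERNS = [
    # "X is/was a(n) Y" -- the copula pattern of Wikipedia first sentences
    re.compile(r"^(?P<hyponym>[A-Z][\w ]*?) (?:is|was) an? (?P<hypernym>[a-z][\w-]*)"),
    # "Y such as X" -- the classic Hearst pattern
    re.compile(r"(?P<hypernym>[a-z][\w-]*) such as (?P<hyponym>[A-Z][\w ]*)"),
]

def extract_hypernym(sentence):
    """Return the first (hyponym, hypernym) pair matched, or None."""
    for pattern in PATTERNS:
        m = pattern.search(sentence)
        if m:
            return m.group("hyponym"), m.group("hypernym")
    return None

print(extract_hypernym("Aristotle was a philosopher and polymath."))
```

Targeting one entity at a time, as in THD, amounts to running such a matcher only on the sentences describing that entity rather than over the whole corpus.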
Similar Articles
Entityclassifier.eu: Real-Time Classification of Entities in Text with Wikipedia
Targeted Hypernym Discovery (THD) performs unsupervised classification of entities appearing in text. A hypernym mined from the free-text of the Wikipedia article describing the entity is used as a class. The type as well as the entity are cross-linked with their representation in DBpedia, and enriched with additional types from DBpedia and YAGO knowledge bases providing a semantic web interope...
Unsupervised Entity Classification with Wikipedia and Wordnet
The task of classifying entities appearing in textual annotations to an arbitrary set of classes has not been extensively researched, yet it is useful in multimedia retrieval. We proposed an unsupervised algorithm, which expresses entities and classes as Wordnet synsets and uses Lin measure to classify them. Real-time hypernym discovery from Wikipedia is used to map uncommon entities to Wordnet...
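The Lin measure mentioned above scores two concepts by the information content (IC) of their lowest common subsumer: lin(c1, c2) = 2·IC(lcs(c1, c2)) / (IC(c1) + IC(c2)), with IC(c) = −log p(c). A toy sketch, with an invented taxonomy and corpus probabilities standing in for WordNet:

```python
import math

# Hypothetical taxonomy and concept probabilities, for illustration only;
# the cited work computes these over WordNet synsets.
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity", "car": "entity"}
PROB = {"entity": 1.0, "animal": 0.25, "dog": 0.05, "cat": 0.05, "car": 0.1}

def ic(concept):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(PROB[concept])

def ancestors(concept):
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: nearest ancestor of c1 that also subsumes c2."""
    a2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in a2)

def lin(c1, c2):
    return 2 * ic(lcs(c1, c2)) / (ic(c1) + ic(c2))

print(lin("dog", "cat"))  # subsumed by "animal": moderately similar
print(lin("dog", "car"))  # subsumed only by the root: similarity 0
```

Classification then reduces to assigning an entity the class whose synset maximizes this score.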
Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery
The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types in the DBpedia namespace. The types are extracted from the first sentences of Wikipedia articles using Hearst pattern matching over part-of-speech annotated text and disambiguated to DBpedia concepts. The dataset covers 1.3 million RDF type triples from English Wikipedia, ou...
A Java Framework for Multilingual Definition and Hypernym Extraction
In this paper we present a demonstration of a multilingual generalization of Word-Class Lattices (WCLs), a supervised lattice-based model used to identify textual definitions and extract hypernyms from them. Lattices are learned from a dataset of automatically-annotated definitions from Wikipedia. We release a Java API for the programmatic use of multilingual WCLs in three languages (English, F...
Extracting Hypernym Relations from Wikipedia Disambiguation Pages: Comparing Symbolic and Machine Learning Approaches
Extracting hypernym relations from text is one of the key steps in the construction and enrichment of semantic resources. Several methods have been exploited in a variety of proposals in the literature. However, the strengths of each approach on the same corpus remain poorly identified, making it hard to exploit their complementarity. In this paper, we study how complementary two ...