Compact Encodings for All Local Path Information in Web Taxonomies with Application to WordNet
Abstract
We consider the problem of finding a compact labelling for large, rooted web taxonomies that can be used to encode all local path information for each taxonomy element. This research is motivated by the problem of developing standards for taxonomic data, and addresses the data-intensive problem of evaluating semantic similarities between taxonomic elements. Evaluating such similarities often requires processing large sets of common ancestors shared by the elements. We propose a new class of compact labelling schemes, designed for directed acyclic graphs and tailored to large web taxonomies. Our labelling schemes significantly reduce the complexity of evaluating similarities among taxonomy elements by allowing inferences to be drawn from the labels alone, without searching the data structure. We provide an analysis of the label lengths for the proposed schemes based on structural properties of the taxonomy. Finally, we provide supporting empirical evidence for the quality of these schemes by evaluating their performance on the WordNet taxonomy.
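To make the idea of answering ancestor queries "from the labels alone" concrete, here is a minimal sketch. It is not the compact scheme proposed in the paper: it uses a deliberately naive label (the full ancestor set of each node), and the function names and the toy taxonomy are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter


def ancestor_labels(parents):
    """Label every node of a DAG with the set of its ancestors (itself included).

    `parents` maps each node to its direct parents. This naive label is an
    illustrative assumption, not the paper's compact encoding.
    """
    labels = {}
    # static_order() yields each node only after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        label = {node}
        for parent in parents.get(node, ()):
            label |= labels[parent]
        labels[node] = frozenset(label)
    return labels


def common_ancestors(labels, a, b):
    """Answer the query from the two labels only, with no graph traversal."""
    return labels[a] & labels[b]


# Hypothetical toy taxonomy with multiple inheritance (a DAG, not a tree).
toy_parents = {
    "animal": ["entity"],
    "pet": ["entity"],
    "dog": ["animal", "pet"],
    "cat": ["animal", "pet"],
    "wolf": ["animal"],
}

toy_labels = ancestor_labels(toy_parents)
shared = common_ancestors(toy_labels, "dog", "wolf")
```

Once the labels are built, each query is a single set intersection. The cost is label size, which grows with the number of ancestors; reducing that size is the problem a compact labelling scheme addresses.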
Similar resources
TRIPPER: Rule Learning Using Taxonomies
In many application domains, there is a need for learning algorithms that generate accurate as well as comprehensible classifiers. In this paper, we present TRIPPER, a rule induction algorithm that extends RIPPER, a widely used rule-learning algorithm. TRIPPER exploits knowledge in the form of taxonomies over the values of features used to describe data. We compare the performance of TRIPPER wit...
Comparing the UCREL Semantic Annotation Scheme with Lexicographical Taxonomies
Annotation schemes for semantic field analysis use abstract concepts to classify words and phrases in a given text. The use of such schemes within lexicography is increasing. Indeed, our own UCREL semantic annotation system (USAS) is to form part of a web-based ‘intelligent’ dictionary (Herpiö 2002). As USAS was originally designed to enable automatic content analysis (Wilson and Rayson 1993), ...
Semantic disambiguation of taxonomies
Polysemy is one of the most difficult problems when dealing with natural language resources. Consequently, automated ontology learning from textual sources (such as web resources) is hampered by the inherent ambiguity of human language. In order to tackle this problem, this paper presents an automatic and unsupervised method for disambiguating taxonomies (the key component of a final ontology)....
A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web
Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root concept, a basic level concept, and recursive surface patterns to learn automatically from the Web hyponym-hypernym pairs subordinated to the root; (2) a Web based concept positioning procedure to validate the learne...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...