Domain-Specific Knowledge Acquisition Using WordNet
Abstract
This paper presents a method that acquires new concepts and connections associated with user-selected seed concepts and adds them to the WordNet linguistic knowledge structure. New domain knowledge is acquired around seed concepts that a user considers important. The knowledge we seek to acquire relates to one or more of these concepts, and consists of new concepts not defined in WordNet and new relations that link these concepts with other concepts. The approach consists of forming a corpus of sentences containing seed concepts and then identifying in this corpus lexico-syntactic patterns that reflect semantic relations. The algorithm has four procedures, outlined below.
Procedure 1: Concept extraction.
Input: Noun phrases that contain a seed concept.
Output: New concepts constructed around the seed concept.
After the sentences in the corpus are parsed, new concepts are sought in the noun phrases where seed concepts reside. This procedure searches WordNet and other electronic dictionaries for possible concepts in the noun phrases. The final acceptance of the concepts rests with the user. The next step is to create a taxonomy for the newly acquired concepts that is consistent with WordNet.
Procedure 2: Classification by subsumption.
Input: A list of NPs containing the seed as head noun.
Output: An ontology of concepts under the seed.
The classification algorithm is based on the simple idea that a compound concept [word, seed] is ontologically subsumed by the concept [seed]. Similarly, for a relative classification of any two concepts [word1, seed] and [word2, seed], the ontological relation between word1 and word2, if it exists, is extended to the two concepts. If word1 subsumes word2, then the same relation is formed between the two concepts, so that [word1, seed] subsumes [word2, seed].
Texts are a rich source of information from which, in addition to concepts, we can also learn relations between concepts. We are interested here in finding semantic relations that may link the concepts extracted above with other concepts. The approach is to search for lexico-syntactic patterns comprising the concepts of interest. These new re-
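The subsumption idea in Procedure 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the seed "shelter", and the toy hypernym table (a single-parent stand-in for WordNet, which in reality allows several hypernyms per sense) are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List

# Hypothetical toy hypernym table (child -> parent) standing in for WordNet.
TOY_HYPERNYMS: Dict[str, str] = {
    "dog": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
}


def ancestors(word: str, hypernyms: Dict[str, str]) -> Iterator[str]:
    """Yield the ancestors of `word`, nearest first, guarding against cycles."""
    seen = set()
    current = hypernyms.get(word)
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = hypernyms.get(current)


def classify_by_subsumption(
    seed: str, modifiers: List[str], hypernyms: Dict[str, str]
) -> Dict[str, str]:
    """Map each compound concept "<word> <seed>" to its parent concept.

    Every compound is subsumed by the seed. If another modifier is an
    ancestor of `word`, the compound is placed under that modifier's
    compound instead (nearest such ancestor wins).
    """
    modifier_set = set(modifiers)
    taxonomy: Dict[str, str] = {}
    for word in modifiers:
        parent = seed  # default: [word, seed] is subsumed by [seed]
        for ancestor in ancestors(word, hypernyms):
            if ancestor in modifier_set:
                parent = f"{ancestor} {seed}"
                break
        taxonomy[f"{word} {seed}"] = parent
    return taxonomy


# Usage with hypothetical noun phrases headed by the seed "shelter".
taxonomy = classify_by_subsumption(
    "shelter", ["animal", "dog", "cat", "bus"], TOY_HYPERNYMS
)
for concept, parent in taxonomy.items():
    print(f"{concept} -> {parent}")
```

In this sketch a modifier with no related modifier in the list (here "bus") simply attaches its compound directly under the seed, which matches the default case described in the abstract.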
Similar resources
Domain-Specific Knowledge Acquisition and Classification Using WordNet
For many knowledge-intensive applications, it is necessary to have extensive domain-specific knowledge in addition to general-purpose knowledge bases usually built around Machine Readable Dictionaries. This paper presents a methodology for acquiring domain-specific knowledge from text and classifying the concepts learned into an ontology that extends WordNet. The method was tested for three see...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
A Graph Model for Unsupervised Lexical Acquisition
This paper presents an unsupervised method for assembling semantic knowledge from a part-of-speech tagged corpus using graph algorithms. The graph model is built by linking pairs of words which participate in particular syntactic relationships. We focus on the symmetric relationship between pairs of nouns which occur together in lists. An incremental cluster-building algorithm using this part of...
Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems
One of the main obstacles to high-performance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguati...
Minimal training based semantic categorization in a voice activated question answering (VAQA) system
In this paper, we develop a knowledge based methodology that maps Automatic Speech Recognizer (ASR) transcriptions to predefined semantic categories in a Voice Activated Question Answering (VAQA) system. The proposed semantic categorization methodology, SemCat, uses a novel lexical chains/ontology based algorithm and relies heavily on customized but domain independent Natural Language Processin...
Automatic Discovery of Linguistic Patterns for Information Extraction
Information Extraction (IE) systems typically rely on extraction patterns encoding domain-specific knowledge. When matched against natural language texts, these patterns recognize with high accuracy information relevant to the extraction task. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns, a time-consuming and expensive process. To ov...