Analyzing and Accessing Wikipedia as a Lexical Semantic Resource
نویسندگان
چکیده
We analyze Wikipedia as a lexical semantic resource and compare it with conventional resources, such as dictionaries, thesauri, semantic wordnets, etc. Di erent parts of Wikipedia re ect di erent aspects of these resources. We show that Wikipedia contains a vast amount of knowledge about, e.g., named entities, domain speci c terms, and rare word senses. If Wikipedia is to be used as a lexical semantic resource in large-scale NLP tasks, e cient programmatic access to the knowledge therein is required. We review existing access mechanisms and show that they are limited with respect to performance and the provided access functions. Therefore, we introduce a general purpose, high performance Java-based Wikipedia API that overcomes these limitations. It is available for research purposes at http://www.ukp.tu-darmstadt. de/software/WikipediaAPI.
منابع مشابه
Evaluating a Semantic Network Automatically Constructed from Lexical Co-occurrence on a Word Sense Disambiguation Task
We describe the extension and objective evaluation of a network of semantically related noun senses (or concepts) that has been automatically acquired by analyzing lexical cooccurrence in Wikipedia. The acquisition process makes no use of the metadata or links that have been manually built into the encyclopedia, and nouns in the network are automatically disambiguated to their corresponding nou...
متن کاملUsing Wiktionary for Computing Semantic Relatedness
We introduce Wiktionary as an emerging lexical semantic resource that can be used as a substitute for expert-made resources in AI applications. We evaluate Wiktionary on the pervasive task of computing semantic relatedness for English and German by means of correlation with human rankings and solving word choice problems. For the first time, we apply a concept vector based measure to a set of d...
متن کاملDBnary: Wiktionary as a Lemon-based multilingual lexical resource in RDF
Contributive resources, such as Wikipedia, have proved to be valuable to Natural Language Processing or multilingual Information Retrieval applications. This work focusses on Wiktionary, the dictionary part of the resources sponsored by the Wikimedia foundation. In this article, we present our extraction of multilingual lexical data from Wiktionary data and to provide it to the community as a M...
متن کاملBabelNet: Building a Very Large Multilingual Semantic Network
In this paper we present BabelNet – a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition Machine Translation is also applied to enrich the resource with lexical information for all languages. We conduct experiments on new and ...
متن کاملBabelplagiarism: What can BabelNet do for Cross-language Plagiarism Detection?
In the first part of the talk, I will present BabelNet, a very large, wide-coverage multilingual semantic network. The resource is automatically constructed by means of a methodology that integrates lexicographic and encyclopedic knowledge from WordNet and Wikipedia. In addition Machine Translation is also applied to enrich the knowledge resource with lexical information for all languages. We p...
متن کامل