Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation
نویسندگان
چکیده
Wikipedia pages typically contain inter-language links to the corresponding pages in other languages. These links, however, are often incomplete. This paper describes a set of experiments in which the viability of discovering such missing inter-language links for ambiguous nouns by means of a cross-lingual Word Sense Disambiguation approach is investigated. The input for the inter-language link detection system is a set of Dutch pages for a given ambiguous noun and the output of the system is a set of links to the corresponding pages in three target languages (viz. French, Spanish and Italian). The experimental results show that although it is a very challenging task, the system succeeds to detect missing inter-language links between Wikipedia documents for a manually labeled test set. The final goal of the system is to provide a human editor with a list of possible missing links that should be manually verified.
منابع مشابه
Cross-Language Retrieval with Wikipedia
We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia based technique with a method based on bigram statistics of pairs formed by tran...
متن کاملUBA: Using Automatic Translation and Wikipedia for Cross-Lingual Lexical Substitution
This paper presents the participation of the University of Bari (UBA) at the SemEval2010 Cross-Lingual Lexical Substitution Task. The goal of the task is to substitute a word in a language Ls, which occurs in a particular context, by providing the best synonyms in a different language Lt which fit in that context. This task has a strict relation with the task of automatic machine translation, b...
متن کاملCross-Lingual Word Sense Disambiguation
Word Sense Disambiguation using Cross-Lingual approach has been used successfully for languages like Farsi and Hindi. However, a comparable corpus in the form of Wikipedia articles available in English and Hindi has been used for such a task. This motivated us to further the approach and test the results when a parallel corpus is used. In this project, we specifically wanted to observe if the a...
متن کاملMapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources
In this paper we present the mapping between WordNet domains and WordNet topics, and the emergent Wikipedia categories. This mapping leads to a coarse alignment between WordNet and Wikipedia, useful for producing domain-specific and multilingual corpora. Multilinguality is achieved through the cross-language links between Wikipedia categories. Research in word-sense disambiguation has shown tha...
متن کاملCross-Lingual Word Sense Disambiguation for Languages with Scarce Resources
Word Sense Disambiguation has long been a central problem in computational linguistics. Word Sense Disambiguation is the ability to identify the meaning of words in context in a computational manner. Statistical and supervised approaches require a large amount of labeled resources as training datasets. In contradistinction to English, the Persian language has neither any semantically tagged cor...
متن کامل