Towards Cross-Language Word Sense Disambiguation for Quechua
Abstract
In this paper we present initial work on cross-language word sense disambiguation (CLWSD) for translating adjectives from Spanish to Quechua, and we situate CLWSD as part of the translation task. While many resources are available for training Spanish-language NLP systems, linguistic resources for Quechua, especially Spanish-Quechua bitext, are quite limited, so some ingenuity is required in developing Spanish-Quechua systems. This work uses only freely available resources and compares several techniques for CLWSD: classifiers with simple word context features, features from a Spanish-language dependency parser, a multilingual version of the Lesk algorithm, and a distance metric based on the Spanish wordnet.
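As a rough illustration of the Lesk-style idea mentioned in the abstract, the sketch below picks the candidate translation whose gloss shares the most words with the source sentence. It is a simplified, monolingual-gloss variant and not necessarily the multilingual version used in the paper. The function name `simplified_lesk`, the labels `translation_A` and `translation_B`, and the Spanish glosses for the adjective "fresco" are all hypothetical toy data invented for this example.

```python
def tokenize(text):
    """Lowercase and split on whitespace; returns a set of word types."""
    return set(text.lower().split())


def simplified_lesk(context, candidate_glosses):
    """Return the candidate label whose gloss overlaps most with the context.

    context: source-language sentence containing the ambiguous word.
    candidate_glosses: dict mapping a candidate label to a gloss string.
    Ties are resolved in favour of the first candidate encountered.
    """
    context_words = tokenize(context)
    best_label, best_overlap = None, -1
    for label, gloss in candidate_glosses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


# Hypothetical glosses for two senses of the Spanish adjective "fresco".
# The labels stand in for target-language translations; they are placeholders.
GLOSSES = {
    "translation_A": "temperatura fría del aire o del agua",
    "translation_B": "alimento recién hecho o cosechado",
}

choice = simplified_lesk("un viento fresco baja la temperatura del aire", GLOSSES)
print(choice)
```

This sketch does no stopword removal or lemmatization, so function words such as "del" count toward the overlap; a practical system would likely filter or weight them.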
Related resources
Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
We built a pipeline to normalize Quechua texts through morphological analysis and disambiguation. Word forms are analyzed by a set of cascaded finite state transducers which split the words and rewrite the morphemes to a normalized form. However, some of these morphemes, or rather morpheme combinations, are ambiguous, which may affect the normalization. For this reason, we disambiguate the morp...
Word Sense Disambiguation for Cross-Language Information Retrieval
We have developed a word sense disambiguation algorithm, following Cheng and Wilensky (1997), to disambiguate among WordNet synsets. This algorithm is to be used in a cross-language information retrieval system, CINDOR, which indexes queries and documents in a language-neutral concept representation based on WordNet synsets. Our goal is to improve retrieval precision through word sense disambig...
Tamil to English Cross Lingual Information Retrieval System for Agricultural Domain Using VSM
Language processing is a prompt research area across the country. Within it, query translation has been one of the major areas of research for the past ten decades. Tamil is a morphologically rich and complex language. Suitable morphological processing is very important for Cross Lingual Information Retrieval (CLIR). The contributions towards Tamil to English query translation and transliteration are l...
LWA 2006 Proceedings
In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to wor...