Cross-Lingual Word Sense Disambiguation for Languages with Scarce Resources
نویسندگان
چکیده
Word Sense Disambiguation has long been a central problem in computational linguistics. Word Sense Disambiguation is the ability to identify the meaning of words in context in a computational manner. Statistical and supervised approaches require a large amount of labeled resources as training datasets. In contradistinction to English, the Persian language has neither any semantically tagged corpus to aid machine learning approaches for Persian texts, nor any suitable parallel corpora. Yet due to the ever-increasing development of Persian pages in Wikipedia, this resource can act as a comparable corpus for English-
منابع مشابه
Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian
We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art u...
متن کاملTowards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
متن کاملLet Sense Bags Do Talking: Cross Lingual Word Semantic Similarity for English and Hindi
Cross Lingual Word Semantic (CLWS) similarity is defined as a task to find the semantic similarity between two words across languages. Semantic similarity has been very popular in computing the similarity between two words in same language. CLWS similarity will prove to be very effective in the area of Cross Lingual Information Retrieval, Machine Translation, Cross Lingual Word Sense Disambigua...
متن کاملLIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering
We describe the LIMSI system for the SemEval-2013 Cross-lingual Word Sense Disambiguation (CLWSD) task. Word senses are represented by means of translation clusters in different languages built by a cross-lingual Word Sense Induction (WSI) method. Our CLWSD classifier exploits the WSI output for selecting appropriate translations for target words in context. We present the design of the system ...
متن کاملCross-Lingual Word Sense Disambiguation
Word Sense Disambiguation using Cross-Lingual approach has been used successfully for languages like Farsi and Hindi. However, a comparable corpus in the form of Wikipedia articles available in English and Hindi has been used for such a task. This motivated us to further the approach and test the results when a parallel corpus is used. In this project, we specifically wanted to observe if the a...
متن کامل