Ambiguous Arabic Words Disambiguation: The Results
Abstract
In this paper we propose a hybrid system for disambiguating Arabic words. To this end, we use methods from the field of information retrieval, namely latent semantic analysis (LSA), Harman, Croft, and Okapi, combined with the Lesk algorithm. These methods estimate the most relevant sense of the ambiguous word. The estimate is based on the proximity between the current context (the context of the ambiguous word) and the contexts of use of each meaning of the word. The Lesk algorithm is then used to select the correct sense from those proposed by LSA, Harman, Croft, and Okapi. The results of the proposed system are satisfactory: we obtained a disambiguation rate of 73%.
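The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it shows only one of the four measures (Okapi, in its common BM25 form with the usual defaults k1 = 1.5 and b = 0.75 and a smoothed IDF), and the way Lesk picks among the proposed senses (gloss overlap over the top-ranked candidates) is an assumption, since the abstract does not specify it. The sense inventory, the English toy data, and all function names are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the tokenized query (Okapi BM25)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF variant (assumption): always positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def lesk_overlap(context, gloss):
    """Simplified Lesk: number of distinct words shared by context and gloss."""
    return len(set(context) & set(gloss))


def disambiguate(context, senses, top_k=2):
    """Rank senses by BM25 over usage contexts, then let Lesk pick among the top_k."""
    names = list(senses)
    scores = bm25_scores(context, [senses[s]["usage"] for s in names])
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    candidates = [name for name, _ in ranked[:top_k]]
    return max(candidates, key=lambda s: lesk_overlap(context, senses[s]["gloss"]))


# Hypothetical toy sense inventory (English stand-in for an Arabic resource).
SENSES = {
    "bank/finance": {
        "usage": "she deposited money in the bank account".split(),
        "gloss": "financial institution that holds deposits".split(),
    },
    "bank/river": {
        "usage": "they fished from the bank of the river".split(),
        "gloss": "sloping land beside a river".split(),
    },
    "bank/tilt": {
        "usage": "the plane went into a steep bank".split(),
        "gloss": "sideways tilt made by an aircraft".split(),
    },
}
CONTEXT = "he sat on the bank of the river and watched the water".split()

if __name__ == "__main__":
    print(disambiguate(CONTEXT, SENSES))
```

In a fuller system, each similarity measure would propose its own candidate sense, and the texts would be Arabic tokens after stemming or rooting; the sketch keeps a single measure so the control flow stays visible.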
Similar resources
A Hybrid Approach for Arabic Word Sense Disambiguation
In this paper, we present a hybrid approach for Word Sense Disambiguation of Arabic Language (called WSD-AL), that combines unsupervised and knowledge-based methods. Some pre-processing steps are applied to texts containing the ambiguous words in the corpus (1500 texts extracted from the web), and the salient words that affect the meaning of these words are extracted. After that a Context Match...
Naïve Bayes Classifier for Arabic Word Sense Disambiguation
Word Sense Disambiguation (WSD) is the process of selecting a sense of an ambiguous word in a given context from a set of predefined senses. The sense inventory usually comes from a dictionary or thesaurus. In Arabic, the main cause of word ambiguity is the lack of diacritics in most digital documents, so the same word can occur with different senses. In this paper, we use the rooting algorithm...
Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation
Arabic words are often ambiguous between name and non-name interpretations, frequently leading to incorrect name translations. We present a technique to disambiguate and transliterate names even if name interpretations do not exist or have relatively low probability distributions in the parallel training corpus. The key idea comprises named entity classing at the preprocessing step, decoding of...
Statistical Corpus-Based Word Sense Disambiguation: Pseudowords vs. Real Ambiguous Words
In this paper we investigate whether the task of disambiguating pseudowords (artificial ambiguous words) is comparable to the disambiguation of real ambiguous words. Since the two methods are inherently different, a direct comparison is not possible. An indirect approach is taken where the setup for both systems is as similar as possible, i.e. using the same corpus and settings. The results obt...
A Semi-Supervised Method for Arabic Word Sense Disambiguation Using a Weighted Directed Graph
In this paper, we propose a new semi-supervised approach for Arabic word sense disambiguation. Using the corpus and Arabic WordNet, we define a method to cluster the sentences containing ambiguous words. For each sense, we generate a cluster that we use to construct a semantic tree. Furthermore, we construct a weighted directed graph by matching the tree of the original sentence with semantic tr...