Evaluating n-grams Models for the Bilingual Word Sense Disambiguation Task

Authors

  • David Pinto
  • Darnes Vilariño Ayala
  • Carlos Balderas Posada
  • Mireya Tovar
  • Beatríz Beltrán
Abstract

Word Sense Disambiguation (WSD) is the problem of selecting the correct sense of an ambiguous word in a given context. WSD is difficult in itself, and its bilingual version is considerably more complex: it is necessary not only to find a correct translation, but the translation must also take into account the contextual senses of the original sentence (in the source language) in order to find the correct sense (in the target language) of the source word. In this paper we present a probabilistic model for bilingual WSD based on n-grams (2-grams, 3-grams, 5-grams and k-grams, for a sentence S of length k). The aim is to analyze the behavior of the system under different representations of a given sentence containing an ambiguous word. We use a Naïve Bayes classifier to determine the probability of the target sense (in the target language) given a sentence that contains an ambiguous word (in the source language). For this purpose, we use a bilingual statistical dictionary, computed with Giza++ from the EUROPARL parallel corpus. On average, the representation model based on 5-grams with mutual information showed the best performance.
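The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the translation table, the priors, the smoothing constant, and all function names below are hypothetical, and the probabilities are invented toy values rather than Giza++ output. The sketch scores each candidate target sense with a Naïve Bayes product over the words of an n-gram window around the ambiguous word, and leaves out the mutual information weighting mentioned in the abstract.

```python
import math

# Hypothetical toy table P(source_word | target_sense). The paper derives such
# probabilities from a Giza++ dictionary; these numbers are invented.
TOY_DICT = {
    "banco": {"bank": 0.6, "money": 0.3, "river": 0.01, "sat": 0.01},
    "orilla": {"bank": 0.5, "river": 0.4, "sat": 0.05, "money": 0.01},
}
# Hypothetical uniform priors over the candidate target senses.
PRIORS = {"banco": 0.5, "orilla": 0.5}
# Assumed floor probability for words missing from the table.
SMOOTHING = 1e-6


def context_window(tokens, index, n):
    """Return up to n tokens around position index (the ambiguous word)."""
    start = max(0, index - n // 2)
    return tokens[start:start + n]


def score_sense(sense, window, table, prior):
    """Naive Bayes log score: log P(sense) + sum of log P(word | sense)."""
    log_p = math.log(prior)
    for word in window:
        log_p += math.log(table[sense].get(word, SMOOTHING))
    return log_p


def disambiguate(tokens, index, n, table, priors):
    """Pick the target sense with the highest score for an n-gram window."""
    window = context_window(tokens, index, n)
    return max(table, key=lambda s: score_sense(s, window, table, priors[s]))


TOKENS = "he sat on the bank of the river".split()
AMBIGUOUS_INDEX = TOKENS.index("bank")

for n in (2, 3, 5, len(TOKENS)):  # the last value plays the role of the k-gram
    print(n, disambiguate(TOKENS, AMBIGUOUS_INDEX, n, TOY_DICT, PRIORS))
```

With these toy values, narrow windows never reach the word "river", so the chosen sense can differ from the one selected when the whole sentence is used. This is the kind of sensitivity to sentence representation that the abstract sets out to analyze.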

Similar resources

A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation

Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing. WSD is difficult in itself, and its bilingual version is much more complex. In this case, it is necessary not only to find the correct translation, but this translation must also consider the contextual senses of the original sentence (in a ...

An Evaluation of Greek-English Cross Language Retrieval within the CLEF Ad-Hoc Bilingual Task

This article describes an experimental investigation into the use of web resources for a common Natural Language Processing (NLP) problem, that of Word Sense Disambiguation (WSD). In particular, we use our disambiguation experiments with statistical query translation on a Greek-English cross language retrieval system using Google’s n-grams. Results from our participation on the Ad-Hoc TEL trac...

Word Sense Disambiguation of Ambiguous Persian Words with the LDA Topic Model

Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...

Syntax, Semantics and Structure in Statistical Translation

While automatic metrics of translation quality are invaluable for machine translation research, a deeper understanding of translation errors requires more focused evaluations designed to target specific aspects of translation quality. We show that Word Sense Disambiguation (WSD) can be used to evaluate the quality of machine translation lexical choice, by applying a standard phrase-based SMT syste...

A Semantic Evaluation of Machine Translation Lexical Choice

While automatic metrics of translation quality are invaluable for machine translation research, a deeper understanding of translation errors requires more focused evaluations designed to target specific aspects of translation quality. We show that Word Sense Disambiguation (WSD) can be used to evaluate the quality of machine translation lexical choice, by applying a standard phrase-based SMT syste...

Journal:
  • Computación y Sistemas

Volume 15  Issue

Pages  -

Publication date: 2011