It Takes Two to Tango: A Bilingual Unsupervised Approach for Estimating Sense Distributions using Expectation Maximization
Authors
Abstract
Several bilingual WSD algorithms that exploit translation correspondences in parallel corpora have been proposed. However, obtaining such parallel corpora is itself a tall order for some of the world's resource-constrained languages. We propose an unsupervised bilingual EM-based algorithm that relies on counts of translations to estimate sense distributions. No parallel or sense-annotated corpora are needed. The algorithm relies on a synset-aligned bilingual dictionary and in-domain corpora from the two languages. A symmetric generalized Expectation Maximization formulation is used, wherein the sense distributions of words in one language are estimated from the raw counts of the words in the aligned synset in the target language. When tested on four language-domain pairs, the overall performance of our algorithm is better than that of current state-of-the-art knowledge-based and bilingual unsupervised approaches.
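The symmetric estimation idea can be illustrated with a small EM-style loop. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes synset ids are shared across the aligned dictionary, that P(s | u) for a word u is re-estimated by summing the raw counts of the target-language words in synset s, each weighted by that word's current probability of s, and then normalizing over the senses of u, and that both languages are updated from the previous iteration's estimates. The names `bilingual_em`, `update`, `members`, `uniform`, the stopping parameters `max_iter` and `tol`, and the toy "bank"/"banque"/"rive" dictionary are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption, not the paper's format):
Senses = Dict[str, List[str]]        # word -> ids of the aligned synsets it belongs to
Counts = Dict[str, int]              # word -> raw count in its in-domain corpus
Dist = Dict[str, Dict[str, float]]   # word -> {synset id: estimated P(synset | word)}


def members(senses: Senses) -> Dict[str, List[str]]:
    """Invert word -> synsets into synset -> words for one language."""
    out: Dict[str, List[str]] = {}
    for word, syns in senses.items():
        for s in syns:
            out.setdefault(s, []).append(word)
    return out


def uniform(senses: Senses) -> Dist:
    """Initial estimate: every sense of a word is equally likely (assumes >= 1 sense per word)."""
    return {w: {s: 1.0 / len(syns) for s in syns} for w, syns in senses.items()}


def update(src: Senses, tgt: Senses, tgt_counts: Counts, tgt_dist: Dist) -> Dist:
    """Re-estimate P(s | u) for every source word u from target-language evidence."""
    tgt_members = members(tgt)
    new: Dist = {}
    for u, syns in src.items():
        # Assumed update: raw count of each target word v in the aligned synset s,
        # weighted by the current estimate P(s | v).
        scores = {
            s: sum(tgt_counts.get(v, 0) * tgt_dist[v][s] for v in tgt_members.get(s, []))
            for s in syns
        }
        total = sum(scores.values())
        if total > 0:
            new[u] = {s: x / total for s, x in scores.items()}
        else:
            new[u] = {s: 1.0 / len(syns) for s in syns}  # no evidence: keep it uniform
    return new


def bilingual_em(senses1: Senses, counts1: Counts, senses2: Senses, counts2: Counts,
                 max_iter: int = 50, tol: float = 1e-9) -> Tuple[Dist, Dist]:
    """Symmetric iteration: each language is updated from the other's previous estimates."""
    d1, d2 = uniform(senses1), uniform(senses2)
    for _ in range(max_iter):
        n1 = update(senses1, senses2, counts2, d2)
        n2 = update(senses2, senses1, counts1, d1)
        # Largest change in any probability; stop once the estimates stabilize.
        delta = max(
            (abs(n[w][s] - d[w][s]) for n, d in ((n1, d1), (n2, d2)) for w in n for s in n[w]),
            default=0.0,
        )
        d1, d2 = n1, n2
        if delta < tol:
            break
    return d1, d2


if __name__ == "__main__":
    # Toy synset-aligned dictionary: L1 "bank" is ambiguous, its L2 translations are not.
    d1, _ = bilingual_em(
        {"bank": ["FIN", "RIV"]}, {"bank": 100},
        {"banque": ["FIN"], "rive": ["RIV"]}, {"banque": 90, "rive": 10},
    )
    print(d1["bank"])  # prints {'FIN': 0.9, 'RIV': 0.1}
```

On the toy input, "bank" ends up with P(FIN) = 0.9 because its FIN-aligned translation is nine times as frequent as its RIV-aligned one. When words on both sides are polysemous, each language's estimates reweight the other's counts across iterations, which is what makes the formulation symmetric.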
Similar resources
Neighbors Help: Bilingual Unsupervised WSD Using Context
Word Sense Disambiguation (WSD) is one of the toughest problems in NLP, and within WSD, verb disambiguation has proved extremely difficult because of the high degree of polysemy, overly fine-grained senses, the absence of a deep verb hierarchy, and low inter-annotator agreement in verb sense annotation. Unsupervised WSD has received widespread attention but has performed poorly, especially on verbs. Recen...
Segmentation of colour images using variational expectation-maximization algorithm
The approach proposed in this paper takes into account the uncertainty in colour modelling by employing variational Bayesian estimation. Mixtures of Gaussians are considered for modelling colour images. Distributions of parameters characterising colour regions are inferred from data statistics. The Variational Expectation-Maximization (VEM) algorithm is used for estimating the hyperparameters c...
Word Sense Disambiguation Using IndoWordNet
Word Sense Disambiguation (WSD) is considered one of the toughest problems in the field of Natural Language Processing. IndoWordNet is a linked structure of WordNets of major Indian languages. Recently, several IndoWordNet-based WSD approaches have been proposed and implemented for Indian languages. In this chapter, we present the usage of various other features of IndoWordNet in performing W...
Trainable Coarse Bilingual Grammars for Parallel Text Bracketing
We describe two new strategies for automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first appro...
Unsupervised learning of word sense disambiguation rules by estimating an optimum iteration number in the EM algorithm
In this paper, we improve the unsupervised learning method based on the Expectation-Maximization (EM) algorithm that Nigam et al. proposed for text classification problems, so that it can be applied to word sense disambiguation (WSD) problems. The improved method stops the EM algorithm at the optimum iteration number. To estimate that number, we propose two methods. In experiments, we solved 50 noun WSD pro...