Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study
Abstract
A central problem of word sense disambiguation (WSD) is the lack of manually sense-tagged data required for supervised learning. In this paper, we evaluate an approach that automatically acquires sense-tagged training data from English-Chinese parallel corpora; the acquired data are then used to disambiguate the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method of acquiring sense-tagged data is promising. On a subset of the most difficult SENSEVAL-2 nouns, the accuracy difference between training on the automatically acquired data and training on manually sense-tagged data is only 14.0%, and the difference could narrow further to 6.5% if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of domain dependence in evaluating WSD programs.
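The general idea of harvesting sense-tagged examples from parallel text can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes word alignments are already available, and the translation-to-sense table, the sense labels, and the function name `harvest_examples` are all hypothetical, chosen only for illustration.

```python
# Hypothetical table: for each English noun, map a Chinese translation
# to a sense label. These entries are illustrative, not from the paper.
TRANSLATION_TO_SENSE = {
    "channel": {"频道": "channel%tv", "海峡": "channel%water"},
}


def harvest_examples(aligned_pairs, target, translation_to_sense):
    """Collect (sentence, sense) training examples for the noun `target`.

    aligned_pairs: iterable of (english_tokens, aligned_translations),
    where aligned_translations[i] is the Chinese word aligned to
    english_tokens[i], or None if the token is unaligned (assumed input).
    """
    sense_map = translation_to_sense.get(target, {})
    examples = []
    for tokens, aligned in aligned_pairs:
        for token, translation in zip(tokens, aligned):
            if token.lower() != target:
                continue
            # Keep the occurrence only if its translation maps to a known sense.
            sense = sense_map.get(translation)
            if sense is not None:
                examples.append((" ".join(tokens), sense))
    return examples
```

Occurrences whose translation is missing from the table are skipped rather than guessed. This is one simple way to see how automatically acquired data can end up with narrower sense coverage than manually tagged data.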
Related papers
Experiments In Word Domain Disambiguation For Parallel Texts
This paper describes some preliminary results about Word Domain Disambiguation, a variant of Word Sense Disambiguation where words in a text are tagged with a domain label in place of a sense label. The English WordNet and its aligned Italian version, MultiWordNet, both augmented with domain labels, are used as the main information repositories. A baseline algorithm for Word Domain Disambiguation i...
Exploiting Parallel Texts to Produce a Multilingual Sense Tagged Corpus for Word Sense Disambiguation
We describe an approach to the automatic creation of a sense tagged corpus intended to train a word sense disambiguation (WSD) system for English-Portuguese machine translation. The approach uses parallel corpora, translation dictionaries and a set of straightforward heuristics. In an evaluation with nine corpora containing 10 ambiguous verbs, the approach achieved an average precision of 94%, ...
Word Translation Disambiguation without Parallel Texts
Word Translation Disambiguation means to select the best translation(s) given a source word in context and a set of target candidates. Two approaches to determining similarity between input and sample context are presented, using n-gram and vector space models with huge annotated monolingual corpora as main knowledge source, rather than relying on large parallel corpora. Experiments on SemEval’...
Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
We present a system for verbal Word Sense Disambiguation (WSD) that is able to exploit additional information from parallel texts and lexicons. It is an extension of our previous WSD method (Dušek et al., 2014), which gave promising results but used only monolingual features. In the follow-up work described here, we have explored two additional ideas: using English-Czech bilingual resources (as...
Word Sense Disambiguation for All Words without Hard Labor
While the most accurate word sense disambiguation systems are built using supervised learning from sense-tagged data, scaling them up to all words of a language has proved elusive, since preparing a sense-tagged corpus for all words of a language is time-consuming and human labor intensive. In this paper, we propose and implement a completely automatic approach to scale up word sense disambigua...