Utilizing Contextually Relevant Terms in Bilingual Lexicon Extraction
نویسندگان
چکیده
This paper demonstrates one efficient technique in extracting bilingual word pairs from non-parallel but comparable corpora. Instead of using the common approach of taking high frequency words to build up the initial bilingual lexicon, we show contextually relevant terms that co-occur with cognate pairs can be efficiently utilized to build a bilingual dictionary. The result shows that our models using this technique have significant improvement over baseline models especially when highestranked translation candidate per word is considered.
منابع مشابه
Bilingual lexicon extraction from comparable corpora using in-domain terms
Many existing methods for bilingual lexicon learning from comparable corpora are based on similarity of context vectors. These methods suffer from noisy vectors that greatly affect their accuracy. We introduce a method for filtering this noise allowing highly accurate learning of bilingual lexicons. Our method is based on the notion of in-domain terms which can be thought of as the most importa...
متن کاملMinimally supervised techniques for bilingual lexicon extraction
Normally, word translations are extracted from non-parallel, bilingual corpora, and initial bilingual lexicon, i.e., a list of known translations, is typically used to aid the learning process. This thesis highlights the study of a series of novel techniques that utilized scarce resources. To make the study even more challenging, only minimal use of resources was allowed and important major lin...
متن کاملA Comparative Study on Translation Units for Bilingual Lexicon Extraction
This paper presents on-going research on automatic extraction of bilingual lexicon from English-Japanese parallel corpora. The main objective of this paper is to examine various Ngram models of generating translation units for bilingual lexicon extraction. Three N-gram models, a baseline model (Bound-length N-gram) and two new models (Chunk-bound Ngram and Dependency-linked N-gram) are compared...
متن کاملIntegrating Cross-Lingually Relevant News Articles and Monolingual Web Documents in Bilingual Lexicon Acquisition
In the framework of bilingual lexicon acquisition from cross-lingually relevant news articles on the Web, it is relatively harder to reliably estimate bilingual term correspondences for low frequency terms. Considering such a situation, this paper proposes to complementarily use much larger monolingual Web documents collected by search engines, as a resource for reliably re-estimating bilingual...
متن کاملEffect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora
Within the framework of translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally show that it is quite effective to reduce the candidate bilingual term pairs against which bilingual term correspondences are estimated, in terms of both co...
متن کامل