Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System

نویسندگان

  • Nasredine Semmar
  • Meriama Laïb
چکیده

We describe in this paper a hybrid approach to build automatically bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. We more specifically investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts: in-domain texts from Europarl (European Parliament proceedings) and out-of-domain texts from Emea (European Medicines Agency documents) and Ecb (European Central Bank corpus). The obtained results indicate that integrating domain-specific bilingual lexicons of MWEs improves translation quality of the EBMT system when texts to translate are related to the specific domain and induces a relatively slight deterioration of translation quality when translating general-

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions

Multiword expressions (MWEs) have been proved useful for many natural language processing tasks. However, how to use them to improve performance of statistical machine translation (SMT) is not well studied. This paper presents a simple yet effective strategy to extract domain bilingual multiword expressions. In addition, we implement three methods to integrate bilingual MWEs to Moses, the state...

متن کامل

Building Compact Lexicons for Cross-Domain SMT by Mining Near-Optimal Pattern Sets

Statistical machine translation models are known to benefit from the availability of a domain bilingual lexicon. Bilingual lexicons are traditionally comprised of multiword expressions, either extracted from parallel corpora or manually curated. We claim that “patterns”, comprised of words and higher order categories, generalize better in capturing the syntax and semantics of the domain. In thi...

متن کامل

LIHLA: A lexical aligner based on language-independent heuristics

Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of too...

متن کامل

Evaluating the LIHLA lexical aligner on Spanish, Brazilian Portuguese and Basque parallel texts

Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of too...

متن کامل

Translation of Multiword Expressions Using Parallel Suffix Arrays

Accurately translating multiword expressions is important to obtain good performance in machine translation, crosslanguage information retrieval, and other multilingual tasks in human language technology. Existing approaches to inducing translation equivalents of multiword units have focused on agglomerating individual words or on aligning words in a statistical machine translation system. We p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017