Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text

Abstract

A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.

1 Introduction

A novel approach to automatically building bilingual lexicons is presented here. The term bilingual lexicon denotes a collection of complex equivalences as used in Machine Translation (MT) transfer lexicons, not just word equivalences. In addition to words, such lexicons involve syntactic and semantic descriptions and means to perform a correct transfer between the two sides of a bilingual lexical entry.

A symbolic, rule-based approach of the parse-parse-match kind is proposed. The core idea is to use the resources of bidirectional transfer MT systems for this purpose, taking advantage of their features to convert them to a novel use. In addition to having them use their bilingual lexicons to produce translations, it is proposed to have them use translations to produce bilingual lexicons. Although other uses might be conceived, the most appropriate use is to have an MT system automatically augment its own bilingual lexicon from a small initial sample.

The core of the described approach consists of using a set of bilingual lexical templates in matching the parses of two aligned sentences and in turning the lexical equivalences thus established into new bilingual lexical entries.

2 Theoretical framework

The basic requirement that an MT system should meet for the present purpose is to be bidirectional. Bidirectionality is required in order to ensure that both source and target grammars can be used for parsing and that transfer can be done in both directions.
More precisely, what is relevant is that the input and output to transfer be the same kind of structure.

Moreover, the proposed method is most productive with a lexicalist MT system (Whitelock, 1994). The proposed application is concerned with producing bilingual lexical knowledge and this sort of knowledge is the only type of bilingual knowledge required by lexicalist systems. Nevertheless, it is also conceivable that the present approach can be used with a non-lexicalist transfer system, as long as the system is bidirectional. In this case, only the lexical portion of the bilingual knowledge can be automatically produced, assuming that the structural transfer portion is already in place. In the rest of this paper, a lexicalist MT system will be assumed and referred to. For the specific implementation described here and all the examples, we will refer to an existing lexicalist English-Spanish MT system (Popowich et al., 1997).

The main feature of a lexicalist MT system is that it performs no structural transfer. Transfer is a mapping between a bag of lexical items used in parsing (the source bag) and a corresponding bag of target lexical items (the target bag), to be used in generation. The source bag actually contains more information than the corresponding bag of lexical items before parsing. Its elements get enriched with additional information instantiated during the parsing process. Information of fundamental importance included therein is a system of indices that express de-
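
As an illustration only, the bag-to-bag view of lexicalist transfer described above can be sketched in Python. This is a minimal toy under stated assumptions, not the system of Popowich et al. (1997): the lexicon entries, the function names `transfer` and `invert`, and the greedy matching strategy are all hypothetical, and the sketch leaves out the indices and the syntactic and semantic information that real bag elements carry after parsing.

```python
from collections import Counter

# Hypothetical toy lexicon: each entry pairs a bag of English lexical items
# with a bag of Spanish ones. The entries are illustrative assumptions.
BILINGUAL_LEXICON = [
    (Counter(["the"]), Counter(["el"])),
    (Counter(["dog"]), Counter(["perro"])),
    (Counter(["sleeps"]), Counter(["duerme"])),
]


def transfer(source_bag, lexicon):
    """Map a source bag to a target bag with no structural transfer.

    Entries are applied greedily in lexicon order; a real system would
    also check indices and feature constraints, which are omitted here.
    """
    remaining = Counter(source_bag)
    target = Counter()
    for src_side, tgt_side in lexicon:
        # Apply the entry while its whole source side is still in the bag.
        while src_side and not (src_side - remaining):
            remaining -= src_side
            target += tgt_side
    if remaining:
        raise ValueError(f"no lexicon coverage for: {sorted(remaining)}")
    return target


def invert(lexicon):
    """Swap the two sides of every entry, giving the reverse direction."""
    return [(tgt, src) for src, tgt in lexicon]


english = Counter(["the", "dog", "sleeps"])
spanish = transfer(english, BILINGUAL_LEXICON)
round_trip = transfer(spanish, invert(BILINGUAL_LEXICON))
```

Because input and output are the same kind of structure (a bag), the same `transfer` function serves both directions once the lexicon is inverted, which is the bidirectionality property the method relies on.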

Similar resources

Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System

We describe in this paper a hybrid approach to build automatically bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. We more specifically investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...

Creating Multilingual Translation Lexicons with Regional Variations Using Web Corpora

The purpose of this paper is to automatically create multilingual translation lexicons with regional variations. We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues of geographic information obtained from Web search engines. The experimental resu...

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons between language pairs Lf–Lp and Lp–Le, we assume these lexicons as parallel corpora. Then, we merge the extracted two phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

Publication date: 2002