Syntax in machine translation
Abstract
The general problem area of machine translation has been outlined and related to other fields of language-data processing in Garvin's “A Linguist's View of Language-data Processing.” It has been discussed in some detail in the two immediately preceding sections by Hays and Harper. This discussion deals more specifically with the question of syntax: both with the place of syntax in the over-all design of a machine-translation program and with the detailed problems of syntactic resolution.
Machine translation is a process for translating text automatically from one language to another. This process is embodied in a program, that is, a set of instructions for a computer. With such a program, any large-scale computing machine can be transformed into a translating machine.
A genuine translation is more than just the replacement of individual foreign words by individual English words. It must transmit the essential information contained in the foreign text to an English reader who is unfamiliar with the source language. The meaning of each sentence is more than the sum of the meanings of its words; it also depends on the structure of the sentence and the function of each word in that structure. To obtain a satisfactory translation, the program must therefore contain not only a dictionary lookup for finding individual word meanings, but also a translation algorithm for making correct translation choices and for detecting the elements of meaning implicit in the sentence.
The core of such a translation algorithm is a syntax routine. The major purpose of a syntax routine in machine translation is to recognize and appropriately record the boundaries and functions of the various components of the sentence. This syntactic information is essential for the efficient solution of the problem of word order for the output and is equally indispensable for the proper recognition of the determiners for multiple-meaning choices.
It is furthermore becoming increasingly apparent that it is the design of the syntax routine which governs the over-all layout of a good machine-
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated: a hyphen separates the different parts. Persian also has multi-part words. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use the space character instead. This common incorrect use of the space character leads to some s...
Phrase-Boundary Translation Model Using Shallow Syntactic Labels
Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
Relabeling Syntax Trees to Improve Syntax-Based Machine Translation Quality
We identify problems with the Penn Treebank that render it imperfect for syntax-based machine translation and propose methods of relabeling the syntax trees to improve translation quality. We develop a system incorporating a handful of relabeling strategies that yields a statistically significant improvement of 2.3 BLEU points over a baseline syntax-based system.
The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation
We describe the CMU-UKA Syntax Augmented Machine Translation system ‘SAMT’ used for the shared task “Machine Translation for European Languages” at the ACL 2007 Workshop on Statistical Machine Translation. Following an overview of syntax augmented machine translation, we describe parameters for components in our open-source SAMT toolkit that were used to generate translation results for the Spa...
I²R's Machine Translation System for IWSLT 2009
In this paper, we describe the system and approach used by the Institute for Infocomm Research (I²R) for the IWSLT 2009 spoken language translation evaluation campaign. Two kinds of machine translation systems are applied: a phrase-based machine translation system and a syntax-based machine translation system. To test the syntax-based machine translation system on spoken language translation, va...
The Syntax Augmented MT (SAMT) System for the Shared Task in the 2007 ACL Workshop on Statistical Machine Translation
We describe the CMU-UKA Syntax Augmented Machine Translation system ‘SAMT’ used for the shared task “Machine Translation for European Languages” at the ACL 2007 Workshop on Statistical Machine Translation. Following an overview of syntax augmented machine translation, we describe parameters for components in our open-source SAMT toolkit that were used to generate translation results for the Spa...