INVITED TALK
Head Automata and Bilingual Tiling: Translation with Minimal Representations

Abstract

We present a language model consisting of a collection of costed bidirectional finite state automata associated with the head words of phrases. The model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations. We also present a model and algorithm for machine translation involving optimal "tiling" of a dependency tree with entries of a costed bilingual lexicon. Experimental results are reported comparing methods for assigning cost functions to these models. We conclude with a discussion of the adequacy of annotated linguistic strings as representations for machine translation.

1 Introduction

Until the advent of statistical methods in the mainstream of natural language processing, syntactic and semantic representations were becoming progressively more complex. This trend is now reversing itself, in part because statistical methods reduce the burden of detailed modeling required by constraint-based grammars, and in part because statistical models for converting natural language into complex syntactic or semantic representations are not well understood at present. At the same time, lexically centered views of language have continued to increase in popularity. We can see this in lexicalized grammatical theories, head-driven parsing and generation, and statistical disambiguation based on lexical associations.

These themes (simple representations, statistical modeling, and lexicalism) form the basis for the models and algorithms described in the bulk of this paper. The primary purpose is to build effective mechanisms for machine translation, the oldest and still the most commonplace application of non-superficial natural language processing. A secondary motivation is to test the extent to which a non-trivial language processing task can be carried out without complex semantic representations.
In Section 2 we present reversible monolingual models consisting of collections of simple automata associated with the heads of phrases. These head automata are applied by an algorithm with admissible incremental pruning based on semantic association costs, providing a practical solution to the problem of combinatoric disambiguation (Church and Patil 1982). The model is intended to combine the lexical sensitivity of N-gram models (Jelinek et al. 1992) and the structural properties of statistical context-free grammars (Booth 1969) without the computational overhead of statistical lexicalized tree-adjoining grammars (Schabes 1992, Resnik 1992).

For translation, we use a model for mapping dependency graphs written by the source language head automata. This model is coded entirely as a bilingual lexicon, with associated cost parameters. The transfer algorithm described in Section 4 searches for the lowest-cost "tiling" of the target dependency graph with entries from the bilingual lexicon. Dynamic programming is again used to make exhaustive search tractable, avoiding the combinatoric explosion of shake-and-bake translation (Whitelock 1992, Brew 1992).

In Section 5 we present a general framework for associating costs with the solutions of search processes, pointing out some benefits of cost functions other than log likelihood, including an error-minimization cost function for unsupervised training of the parameters in our translation application. Section 6 briefly describes an English-Chinese translator employing the models and algorithms. We also present experimental results comparing the performance of different cost assignment methods.

Finally, we return to the more general discussion of representations for machine translation and other natural language processing tasks, arguing the case for simple representations close to natural language itself.
2 Head Automata Language Models

2.1 Lexical and Dependency Parameters

Head automata monolingual language models consist of a lexicon, in which each entry is a pair (w, m) of a word w from a vocabulary V and a head automaton m (defined below), and a parameter table giving an assignment of costs to events in a generative process involving the automata.
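
The two components named above, a lexicon of (w, m) pairs and a parameter table of event costs, can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions and not the paper's implementation: the class names, the `entries_for` and `derivation_cost` helpers, and the tuple-shaped event keys are all hypothetical, the head automaton is left as an opaque placeholder because its definition is not given in this excerpt, and additive combination of event costs is assumed rather than taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Iterable, List, Tuple


@dataclass(frozen=True)
class HeadAutomaton:
    """Opaque placeholder; the automaton itself is not defined in this excerpt."""
    name: str  # hypothetical identifier, used only for illustration


@dataclass
class HeadAutomataModel:
    # Lexicon: pairs (w, m) of a word w from the vocabulary V and a head automaton m.
    lexicon: List[Tuple[str, HeadAutomaton]] = field(default_factory=list)
    # Parameter table: assigns a cost to each event of the generative process.
    # The shape of the event keys is an assumption made for this sketch.
    costs: Dict[Hashable, float] = field(default_factory=dict)

    def entries_for(self, word: str) -> List[HeadAutomaton]:
        """Return every automaton paired with `word` in the lexicon."""
        return [m for w, m in self.lexicon if w == word]

    def derivation_cost(self, events: Iterable[Hashable]) -> float:
        """Total cost of a sequence of events (assumed here to be additive)."""
        return sum(self.costs[e] for e in events)
```

A word may appear in several entries, so a lookup returns a list of automata; a search procedure would then compare candidate derivations by their total cost.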





Publication date: 2002