The Parsing Algorithm of Translation Corresponding Tree (TCT) Grammar
Abstract
In machine translation (MT), parsing is a core step: it analyzes the input sentence and extracts its syntactic information so that the corresponding translation can be produced in the target language according to the syntactic relationships between the source and target sentences. The parsing process is guided by a language formalism, and the design of the algorithm depends heavily on how that formalism represents language information. This is especially true in example-based machine translation (EBMT) systems, where the fundamental language information is the set of translation examples. In this paper, a parsing algorithm based on an augmented GLR algorithm is designed to parse the Translation Corresponding Tree (TCT) structure. The TCT, as the example representation schema, has been used to annotate bilingual text in EBMT systems. To parse this tree language with the GLR algorithm, a synchronous formalism equivalent to TCT, based on the notation of context-free grammar (CFG), is proposed. The feature properties of a TCT structure can be fully expressed in the proposed formalism, which as a result can be parsed by any CFG parser. The GLR algorithm is extended and adapted to this parsing task in the development of an EBMT system for Portuguese-to-Chinese machine translation.
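To make the idea of a CFG-style synchronous formalism concrete, here is a minimal sketch. It is not the paper's TCT formalism and not GLR: the rule format, the toy Portuguese-Chinese lexicon, and the names `RULES`, `parse`, and `translate` are all assumptions made for illustration, and the parser is a naive top-down one that only works for grammars without left recursion. Each rule pairs a source right-hand side with a target right-hand side whose integers point back at source positions, so a source parse yields the reordered target string directly.

```python
from typing import Iterator, List, Tuple

# Hypothetical rule format: (lhs, source_rhs, target_rhs).
# In target_rhs, an int i is a link to the i-th symbol of source_rhs,
# and a str is a target-language terminal.
Rule = Tuple[str, Tuple[str, ...], Tuple[object, ...]]

RULES: List[Rule] = [
    ("NP", ("N", "ADJ"), (1, 0)),   # source order N ADJ, target order ADJ N
    ("N", ("gato",), ("猫",)),       # toy lexical entries, for illustration only
    ("ADJ", ("preto",), ("黑",)),
]

NONTERMINALS = {lhs for lhs, _, _ in RULES}


def parse(symbol: str, tokens: List[str], start: int) -> Iterator[Tuple[int, List[str]]]:
    """Yield (end, translation) for each way `symbol` derives tokens[start:end].

    Naive top-down search; it would loop forever on left-recursive rules.
    """
    for lhs, src, tgt in RULES:
        if lhs != symbol:
            continue
        # Each state is (next input position, translations of symbols matched so far).
        states: List[Tuple[int, List[List[str]]]] = [(start, [])]
        for sym in src:
            next_states = []
            for pos, kids in states:
                if sym in NONTERMINALS:
                    for end, tr in parse(sym, tokens, pos):
                        next_states.append((end, kids + [tr]))
                elif pos < len(tokens) and tokens[pos] == sym:
                    next_states.append((pos + 1, kids + [[sym]]))
            states = next_states
        for end, kids in states:
            out: List[str] = []
            for item in tgt:
                if isinstance(item, int):
                    out.extend(kids[item])   # follow the link to a source child
                else:
                    out.append(item)         # emit a target terminal
            yield end, out


def translate(tokens: List[str], start_symbol: str = "NP") -> List[List[str]]:
    """Return the target strings for all parses that cover the whole input."""
    return [tr for end, tr in parse(start_symbol, tokens, 0) if end == len(tokens)]


print(translate(["gato", "preto"]))
```

Because the source side of every rule is an ordinary CFG production, any CFG parser could replace the naive search here; a GLR parser would handle ambiguity and left recursion that this sketch does not.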
Similar resources
Application of Translation Corresponding Tree (TCT) Annotation Schema for Chinese to Portuguese Machine Translation
In Example Based Machine Translation (EBMT) research, there are three main approaches: Surface Based, Pattern Based and Structure Based. A Structure Based EBMT system, such as the SSTC approach [1], has the problem that it relies on two syntax parsers to analyze the translation examples, but robust syntax parsers are not always available. On the other hand, Chinese and Portuguese belong ...
Lexicalized Tree Automata-Based Grammars For Translating Conversational Texts
We propose a new lexicalized grammar formalism called Lexicalized Tree Automata-based Grammar, which lexicalizes tree acceptors instead of trees themselves. We discuss the properties of the grammar and present a chart parsing algorithm. We have implemented a translation module for conversational texts using this formalism, and applied it to an experimental automatic interpretation system (speec...
A Generalized View on Parsing and Translation
We present a formal framework that generalizes a variety of monolingual and synchronous grammar formalisms for parsing and translation. Our framework is based on regular tree grammars that describe derivation trees, which are interpreted in arbitrary algebras. We obtain generic parsing algorithms by exploiting closure properties of regular tree languages.
Tree parsing for tree-adjoining machine translation
Tree parsing is an important problem in statistical machine translation. In this context, one is given (a) a synchronous grammar that describes the translation from one language into another and (b) a recognizable set of trees; the aim is to construct a finite representation of the set of those derivations that derive elements from the given set, either on the source side (input restriction) or...
Myanmar-English Bidirectional Machine Translation System with Numerical Particles Identification
In this paper, a Myanmar-English bidirectional machine translation system is developed using a rule-based machine translation approach. The Stanford and ML2KR parsers are used in the preprocessing step, where they generate the corresponding parse tree structures. The parsers also generate the corresponding CFG rules, which are collected and created as synchronous context free grammar SC...