statistical machine translation

The Effect of Translationese on Tuning for Statistical Machine Translation

2017

Sara Stymne

We explore how the translation direction in the tuning set used for statistical machine translation affects the translation results. We explore this issue for three language pairs. While the results on different metrics are somewhat conflicting, using tuning data translated in the same direction as the translation systems tends to give the best length ratio and Meteor scores for all language pa...

Fast decoding for statistical machine translation

1998

Ye-Yi Wang Alexander H. Waibel

We investigated an efficient decoding algorithm for statistical machine translation. Compared to the other algorithms, this new algorithm is applicable to different translation models, and it is much faster. Experiments showed that the algorithm achieved an overall performance comparable to the state-of-the-art decoding algorithms.

Influence of Part-of-Speech and Phrasal Category Universal Tag-set in Tree-to-Tree Translation Models

2013

Francisco Oliveira Derek F. Wong Lidia S. Chao Liang Tian Liangye He

Tree-to-tree Statistical Machine Translation models require the use of syntactic tree structures of both the source and target sides in learning rules to guide the translation process. In order to accomplish the task, available treebanks for different languages are used as the main resources to collect necessary information to handle the translation task. However, since each treebank has its own...

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch; Wordsyoudontknow: Evaluation of Lexicon-based Decompounding with Unknown Handling; Distinguishing Degrees of Compositionality in Compound Splitting for Statistical Machine Translation; Modelling Regular Subcategorization Changes in German Particle Verbs

2014

Ben Verhoeven Walter Daelemans Menno van Zaanen Gerhard van Huyssteen Elizaveta Clouet

German particle verbs are a type of multiword expression which is often compositional with respect to a base verb. If they are compositional, they tend to express the same types of semantic arguments, but they do not necessarily express them in the same syntactic subcategorization frame: some arguments may be expressed by differing syntactic subcategorization slots and other arguments may be on...

Phrasetable Smoothing for Statistical Machine Translation

2006

George F. Foster Roland Kuhn Howard Johnson

We discuss different strategies for smoothing the phrasetable in Statistical MT, and give results over a range of translation settings. We show that any type of smoothing is a better idea than the relative-frequency estimates that are often used. The best smoothing techniques yield consistent gains of approximately 1% (absolute) according to the BLEU metric.

I2R's machine translation system for IWSLT 2010

2010

Xiangyu Duan Rafael E. Banchs Jun Lang Deyi Xiong AiTi Aw Min Zhang Haizhou Li

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2010 spoken language translation evaluation campaign. We apply system combination on top of two kinds of statistical machine translation systems, namely a phrase-based system and a syntax-based system. Experimental results show consistent improvements on the DIALOG task.

Effects of Integrating Multiple Bilingually-Trained Segmentation Schemes for Japanese-English SMT

2010

Michael Paul Eiichiro Sumita

This paper proposes a method to integrate multiple segmentation schemes into a single statistical machine translation (SMT) system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating Japanese into English revealed that the proposed method of integrating multiple segmentation schemes outperforms ...

Examining the Relationship between Preordering and Word Order Freedom in Machine Translation

2016

Joachim Daiber Milos Stanojevic Wilker Aziz Khalil Sima'an

We study the relationship between word order freedom and preordering in statistical machine translation. To assess word order freedom, we first introduce a novel entropy measure which quantifies how difficult it is to predict word order given a source sentence and its syntactic analysis. We then address preordering for two target languages at the far ends of the word order freedom spectrum, Ger...

A Simple and Effective Hierarchical Phrase Reordering Model

2008

Michel Galley Christopher D. Manning

While phrase-based statistical machine translation systems currently deliver state-of-the-art performance, they remain weak on word order changes. Current phrase reordering models can properly handle swaps between adjacent phrases, but they typically lack the ability to perform the kind of long-distance reorderings possible with syntax-based systems. In this paper, we present a novel hierarchica...

Partitioning Parallel Documents Using Binary Segmentation

2006

Jia Xu Richard Zens Hermann Ney

In statistical machine translation, large numbers of parallel sentences are required to train the model parameters. However, many of the bilingual language resources available on the web are aligned only at the document level. To exploit this data, we have to extract the bilingual sentences from these documents. The common method is to break the documents into segments using predefined anchor wor...
