Phrase-based Machine Transliteration

نویسندگان

  • Andrew M. Finch
  • Eiichiro Sumita
چکیده

This paper presents a technique for transliteration based directly on techniques developed for phrase-based statistical machine translation. The focus of our work is in providing a transliteration system that could be used to translate unknown words in a speech-to-speech machine translation system. Therefore the system must be able to generate arbitrary sequence of characters in the target language, rather than words chosen from a predetermined vocabulary. We evalauted our method automatically relative to a set of human-annotated reference transliterations as well as by assessing it for correctness using human evaluators. Our experimental results demonstrate that for both transliteration and back-transliteration the system is able to produce correct, or phonetically e q u i v a l e n t t o c o r r e c t o u t p u t i n approximately 80% of cases.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrase-Based Transliteration with Simple Heuristics

This paper presents modeling of transliteration as a phrase-based machine translation system. We used a popular phrasebased machine translation system for English-Hindi machine transliteration. We have achieved an accuracy of 38.1% on the test set. We used some basic rules to modulate the existing phrased-based transliteration system. Our experiments show that phrase-based machine translation s...

متن کامل

Syllable-based Machine Transliteration with Extra Phrase Features

This paper describes our syllable-based phrase transliteration system for the NEWS 2012 shared task on English-Chinese track and its back. Grapheme-based Transliteration maps the character(s) in the source side to the target character(s) directly. However, character-based segmentation on English side will cause ambiguity in alignment step. In this paper we utilize Phrase-based model to solve ma...

متن کامل

A Noisy Channel Model for Grapheme-based Machine Transliteration

Machine transliteration is an important Natural Language Processing task. This paper proposes a Noisy Channel Model for Grapheme-based machine transliteration. Moses, a phrase-based Statistical Machine Translation tool, is employed for the implementation of the system. Experiments are carried out on the NEWS 2009 Machine Transliteration Shared Task English-Chinese track. EnglishChinese back tra...

متن کامل

A Bayesian model of bilingual segmentation for transliteration

In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...

متن کامل

Bilingual Dictionary Construction with Transliteration Filtering

In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine tra...

متن کامل

Transliteration of Name Entity via Improved Statistical Translation on Character Sequences

Transliteration of given parallel name entities can be formulated as a phrase-based statistical machine translation (SMT) process, via its routine procedure comprising training, optimization and decoding. In this paper, we present our approach to transliterating name entities using the loglinear phrase-based SMT on character sequences. Our proposed work improves the translation by using bidirec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008