Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration
نویسندگان
چکیده
In this paper, we first carry out an investigation on two existing pivot strategies for statistical machine transliteration, namely system-based and model-based strategies, to figure out the reason why the previous model-based strategy performs much worse than the system-based one. We then propose a joint alignment algorithm to optimize transliteration alignments jointly across source, pivot and target languages to improve the performance of the modelbased strategy. In addition, we further propose a novel synthetic data-based strategy, which artificially generates source-target data using pivot language. Experimental results on benchmarking data show that the proposed joint alignment optimization algorithm significantly improves the accuracy of model-based strategy and the proposed synthetic data-based strategy is very effective for pivot-based machine transliteration.
منابع مشابه
Transliteration normalization for Information Extraction and Machine Translation
Arabic; Named Entity Recognition; Transliteration; Name normalization; Information Extraction; Machine Translation Abstract Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural ...
متن کاملMachine Transliteration: Leveraging on Third Languages
This paper presents two pivot strategies for statistical machine transliteration, namely system-based pivot strategy and model-based pivot strategy. Given two independent source-pivot and pivot-target name pair corpora, the model-based strategy learns a direct sourcetarget transliteration model while the system-based strategy learns a sourcepivot model and a pivot-target model, respectively. Ex...
متن کاملThe Application of Bayesian Alignment Techniques to Transliteration Generation and Mining
Bayesian techniques have recently been applied to many areas of natural language processing, and have proven themselves particularly useful in areas involving segmentation and alignment. This paper looks at the direct application of these techniques to the co-segmentation/alignment of grapheme sequences. We detail a novel Bayesian model for unsupervised bilingual character sequence alignment of...
متن کاملBubble Pressure Prediction of Reservoir Fluids using Artificial Neural Network and Support Vector Machine
Bubble point pressure is an important parameter in equilibrium calculations of reservoir fluids and having other applications in reservoir engineering. In this work, an artificial neural network (ANN) and a least square support vector machine (LS-SVM) have been used to predict the bubble point pressure of reservoir fluids. Also, the accuracy of the models have been compared to two-equation stat...
متن کاملA Bayesian model of bilingual segmentation for transliteration
In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...
متن کامل