Sinhala-Tamil Machine Translation: Towards Better Translation Quality
Abstract
Statistical Machine Translation (SMT) is a well-known and well-established data-driven approach to language translation. The focus of this work is to develop a statistical machine translation system for the Sinhala-Tamil language pair, two languages of Sri Lanka. This paper presents a systematic investigation of how Sinhala-Tamil SMT performance varies with the amount of parallel training data used, in order to find the minimum amount needed to develop a machine translation system with acceptable performance.
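The experimental design described above, training on progressively larger portions of a parallel corpus, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `nested_subsets` and `minimum_adequate_size`, the use of nested random subsets, and the idea of a single score threshold standing in for "acceptable performance" are all hypothetical. The per-size scores (for example BLEU) would come from training and evaluating an SMT system on each subset, which lies outside this sketch.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: one (Sinhala sentence, Tamil sentence) pair.
SentencePair = Tuple[str, str]


def nested_subsets(
    corpus: Sequence[SentencePair], sizes: Sequence[int], seed: int = 0
) -> Dict[int, List[SentencePair]]:
    """Shuffle the corpus once, then take prefixes of increasing length,
    so every smaller training set is contained in every larger one."""
    pairs = list(corpus)
    random.Random(seed).shuffle(pairs)  # fixed seed keeps runs reproducible
    subsets: Dict[int, List[SentencePair]] = {}
    for size in sorted(sizes):
        if size > len(pairs):
            raise ValueError(f"requested {size} pairs but corpus has {len(pairs)}")
        subsets[size] = pairs[:size]
    return subsets


def minimum_adequate_size(scores: Dict[int, float], threshold: float) -> Optional[int]:
    """Return the smallest training-set size whose evaluation score reaches
    `threshold` (an assumed acceptability cut-off), or None if none does."""
    for size in sorted(scores):
        if scores[size] >= threshold:
            return size
    return None
```

Nesting the subsets means each larger system sees all of the data used by the smaller ones, so differences along the resulting curve reflect added training data rather than a different random sample.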
Similar Resources
A Statistical Machine Translation Approach to Sinhala-Tamil Language Translation
Data-driven approaches to Machine Translation have come to the fore of Language Processing research over the past decade. The relative success of Example-Based and Statistical approaches, in terms of robustness, has given rise to a new optimism and to the exploration of other data-driven approaches such as Maximum Entropy language modeling. Much of the work in the literature, however, largely reports ...
Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus
A sentence-aligned parallel corpus is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time-consuming and requires experts fluent in both languages. Automatic creation of a sentence-aligned parallel corpus from parallel text offers a solution to this problem. In this paper, we present the first ever empirical evaluation carried ...
The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
Machine translation is one of the major and most active areas of natural language processing. Machine translation (MT) is the automatic translation of one natural language into another using computer-generated instructions. The utility and power of Statistical Machine Translation (SMT) seem destined to change our technological society in profound and fundamental ways. The current state-of-t...
Morphological Processing for English-Tamil Statistical Machine Translation
Various experiments from the literature suggest that in statistical machine translation (SMT), applying either preprocessing or postprocessing to morphologically rich languages leads to better translation quality. In this work, we focus on the English-Tamil language pair. We implement suffix-separation rules for both languages and evaluate the impact of this preprocessing on translation qu...
A Computational Grammar of Sinhala for English-Sinhala Machine Translation
Communication is fundamental to the evolution and development of all kinds of living beings. Indisputably, languages should be recognized as the most amazing artifacts ever developed by mankind to enable communication. The computer has also become a unique machine, due to its capacity to communicate with humans through languages. It is worth mentioning that the languages understood by comp...