Tackling Close Cousins: Experiences In Developing Statistical Machine Translation Systems For Marathi And Hindi

نویسندگان

  • Raj Dabre
  • Jyotesh Choudhari
  • Pushpak Bhattacharyya
چکیده

In this paper we present our experiences in building Statistical Machine Translation (SMT) systems for the Indian Language pair Marathi and Hindi, which are close cousins. We briefly point out the similarities and differences between the two languages, stressing on the phenomenon of Krudantas (Verb Groups) translation, which is something Rule based systems are not able to do well. Marathi, being a language with agglutinative suffixes, poses a challenge due to lack of coverage of all word forms in the corpus; to remedy which, we explored Factored SMT, that incorporate linguistic analyses in a variety of ways. We evaluate our systems and through error analyses, show that even with small size corpora we can get substantial improvement of approximately 10-15% in translation quality, over the baseline, just by incorporating morphological analysis. We also indirectly evaluate our SMT systems by analysing and reporting the improvement in the quality of translations of a Marathi to Hindi Rule Based system (Sampark) by injecting SMT translations of Krudantas. We believe that our work will help researchers working with limited corpora on similar morphologically rich language pairs and relatable phenomena to develop quality MT systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of SMT and RBMT, The Requirement of Hybridization for Marathi – Hindi MT

We present in this paper our work on comparison between Statistical Machine Translation (SMT) and Rule-based machine translation for translation from Marathi to Hindi. Rule Based systems although robust take lots of time to build. On the other hand statistical machine translation systems are easier to create, maintain and improve upon. We describe the development of a basic Marathi-Hindi SMT sy...

متن کامل

Comparison of SMT and RBMT; The Requirement of Hybridization for Marathi-Hindi MT

We present in this paper our work on comparison between Statistical Machine Translation (SMT) and Rule-based machine translation for translation from Marathi to Hindi. Rule Based systems although robust take lots of time to build. On the other hand statistical machine translation systems are easier to create, maintain and improve upon. We describe the development of a basic Marathi-Hindi SMT sy...

متن کامل

PanchBhoota: Hierarchical Phrase Based Machine Translation Systems for Five Indian Languages

We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPBSMT) systems for five Indian language pairs namely Bengali-Hindi, EnglishHindi, Marathi-Hindi, Tamil-Hindi, and Telugu-Hindi, in three domains each, HEALTH, TOURISM and GENERAL. We named them PanchBhoota, as these systems are elemental in nature. We used a very simple approach to train, tune,...

متن کامل

SMT from Agglutinative Languages: Use of Suffix Separation and Word Splitting

Marathi and Hindi both being Indo-Aryan family members and using Devanagari script are similar to a great extent. Both follow SOV sentence structure and are equally liberal in word order. The translation for this language pair appears to be easy. But experiments show this to be a significantly difficult task, primarily due to the fact that Marathi is morphologically richer compared to Hindi. We...

متن کامل

Bilingual Words and Phrase Mappings for Marathi and Hindi SMT

Lack of proper linguistic resources is the major challenges faced by the Machine Translation system developments when dealing with the resource poor languages. In this paper, we describe effective ways to utilize the lexical resources to improve the quality of statistical machine translation. Our research on the usage of lexical resources mainly focused on two ways, such as; augmenting the para...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014