Amharic-English Speech Translation in Tourism Domain
Abstract
This paper describes Amharic-to-English speech translation, in particular Automatic Speech Recognition (ASR) with a post-editing feature and Amharic-English Statistical Machine Translation (SMT). The ASR experiment is conducted using a morpheme-based language model (LM) and a phoneme-based acoustic model (AM). Likewise, SMT is conducted using words and morphemes as translation units. Morpheme-based translation achieves a BLEU score of 6.29 at 76.4% recognition accuracy, while word-based translation achieves a BLEU score of 12.83 at 77.4% word recognition accuracy. Further, after post-editing the Amharic ASR output using a corpus-based n-gram, word recognition accuracy increased by 1.42%. Because post-editing reduces error propagation, the word-based translation score improved by 0.25 BLEU (1.95%). We are now working on further reducing propagated errors by applying different algorithms at each component of the speech translation cascade.
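As a quick check of the figures reported in the abstract, the 1.95% appears to be the 0.25 BLEU gain expressed relative to the 12.83 word-based baseline. The sketch below reproduces that arithmetic. The function name `relative_gain` is illustrative and not from the paper, and the post-editing BLEU value it prints is derived here on the assumption that 0.25 is an absolute gain over 12.83.

```python
def relative_gain(baseline: float, absolute_gain: float) -> float:
    """Return the absolute gain as a percentage of the baseline score."""
    return 100.0 * absolute_gain / baseline


# Figures taken from the abstract.
baseline_bleu = 12.83  # word-based BLEU before post-editing
bleu_gain = 0.25       # reported BLEU improvement after post-editing

# Assumption: the gain is absolute, so the post-editing score is the sum.
post_edit_bleu = baseline_bleu + bleu_gain

print(f"BLEU after post-editing: {post_edit_bleu:.2f}")   # 13.08
print(f"Relative gain: {relative_gain(baseline_bleu, bleu_gain):.2f}%")  # 1.95%
```

The abstract does not say whether the 1.42% increase in word recognition accuracy is absolute or relative, so it is left out of the calculation.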
Similar resources
Amharic-English Information Retrieval
We describe Amharic-English cross-lingual information retrieval experiments in the ad hoc bilingual tracks of CLEF 2006. The query analysis is supported by morphological analysis and part-of-speech tagging, while we used different machine-readable dictionaries for term lookup in the translation process. Out-of-dictionary terms were handled using fuzzy matching and Lucene[4] was used for indexi...
Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions
This research extends our earlier work on using machine translation (MT) and word-based recurrent neural networks to augment language model training data for keyword search in conversational Cantonese speech. MT-based data augmentation is applied to two language pairs: English-Lithuanian and English-Amharic. Using filtered N-best MT hypotheses for language modeling is found to perform better th...
Preliminary experiments on English-Amharic statistical machine translation
This paper discusses a preliminary experiment on translating from English to Amharic using the Statistical Machine Translation (EASMT) approach. The experiment on the EASMT system is being conducted on a training corpus of both languages based on expressions that are found in parallel documents. The experiment involves collecting a total of 632 Parliamentary corpora of which 115 have...
Web Mining for an Amharic-English Bilingual Corpus
We present recent work aimed at constructing a bilingual corpus consisting of comparable Amharic and English news texts. The Amharic and English texts were collected from an Ethiopian news agency that publishes daily news in Amharic and English through its web page. The Amharic texts are represented using Ethiopic script and archived according to the Ethiopian calendar. The overlap between th...
Data-driven Amharic-English Bilingual Lexicon Acquisition
This paper describes a simple statistical language modelling approach for bilingual lexicon acquisition from Amharic-English parallel corpora. The goal is to induce a seed translation lexicon from sentence-aligned corpora. The seed translation lexicon contains matches of Amharic lexemes to weakly inflected English words. Purely statistical measures of term distribution are used as the basis ...