Morphological decomposition in Arabic ASR systems
Authors
F. Diehl, M.J.F. Gales, M. Tomalin, P.C. Woodland
Abstract
In recent years, morphological decomposition strategies for Arabic Automatic Speech Recognition (ASR) have become increasingly popular. Systems trained on morphologically decomposed data are often used in combination with standard word-based systems, and they have been found to yield consistent performance improvements. The present article contributes to this ongoing research endeavour by exploring the use of the 'Morphological Analysis and Disambiguation for Arabic' (MADA) tools for this purpose. System integration issues concerning language modelling, dictionary construction and the estimation of pronunciation probabilities are discussed. In particular, a novel morpheme-to-word conversion method is presented that uses an N-gram Statistical Machine Translation (SMT) approach. System performance is investigated within a multi-pass adaptation/combination framework. All the systems described in this paper are evaluated on an Arabic large vocabulary speech recognition task which includes both Broadcast News and Broadcast Conversation test data. It is shown that MADA-based systems, in combination with word-based systems, can reduce the word error rate (WER) by up to 8.1% relative.
Corresponding author: F. Diehl. Email: {fd257, mjfg, mt126, pcw}@eng.cam.ac.uk. Preprint submitted to Elsevier, August 25, 2011.
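The morpheme-to-word conversion step mentioned in the abstract can be pictured with a minimal Python sketch. It is not the paper's N-gram SMT method: it swaps in a much simpler scheme, a lookup table built from hypothetical (word, decomposition) training pairs, with naive concatenation as the fallback. The affix-marker convention (prefixes end in '+', suffixes begin with '+'), the Buckwalter-style example tokens and the function names `group_morphemes`, `build_table` and `morphemes_to_words` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def group_morphemes(tokens: list[str]) -> list[list[str]]:
    """Group a decomposed token stream into one list of morphemes per word.

    Hypothetical marker convention (an assumption, not necessarily MADA's):
    prefixes end with '+' (e.g. 'w+'), suffixes start with '+' (e.g. '+hm'),
    and stems carry no marker.
    """
    groups: list[list[str]] = []
    pending: list[str] = []  # prefixes still waiting for their stem
    for tok in tokens:
        is_prefix = len(tok) > 1 and tok.endswith("+")
        is_suffix = len(tok) > 1 and tok.startswith("+")
        if is_suffix and not pending and groups:
            groups[-1].append(tok)  # suffix attaches to the preceding word
        elif is_prefix:
            pending.append(tok)
        else:
            groups.append(pending + [tok])  # a stem (or stray affix) closes a word
            pending = []
    if pending:
        groups.append(pending)  # dangling prefixes at the end of the utterance
    return groups


def build_table(pairs: list[tuple[str, list[str]]]) -> dict[tuple[str, ...], str]:
    """Map each morpheme sequence to the surface word seen most often with it."""
    counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for word, morphemes in pairs:
        counts[tuple(morphemes)][word] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def morphemes_to_words(tokens: list[str], table: dict[tuple[str, ...], str]) -> list[str]:
    """Convert morphemes back to words: table lookup first, naive concatenation as fallback."""
    words = []
    for group in group_morphemes(tokens):
        key = tuple(group)
        if key in table:
            words.append(table[key])
        else:
            words.append("".join(t.strip("+") for t in group))
    return words


if __name__ == "__main__":
    hyp = ["w+", "ktb", "+hm", "l+", "Al+", "ktAb"]
    print(morphemes_to_words(hyp, {}))     # ['wktbhm', 'lAlktAb']
    table = build_table([("llktAb", ["l+", "Al+", "ktAb"])])
    print(morphemes_to_words(hyp, table))  # ['wktbhm', 'llktAb']
```

The second call shows why plain concatenation is not enough: `l+ Al+ ktAb` (Buckwalter transliteration of ل + ال + كتاب) is written `llktAb`, because the alif of the article is dropped after the prefix l-. A mapping learned from data recovers a form that simple gluing cannot. A learned model such as the SMT approach described in the abstract can additionally exploit word context, which this context-free table ignores.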
Similar resources
Handling OOV words in Arabic ASR via flexible morphological constraints
We propose a novel framework to detect and recognize out-of-vocabulary (OOV) words in automatic speech recognition (ASR). In the proposed framework, a hybrid language model combining words and sub-word units is incorporated during ASR decoding; three different OOV word recognition methods are then applied to generate OOV word hypotheses. Specifically, dictionary lookup, morphological composition,...
On the use of morphological analysis for dialectal Arabic speech recognition
Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR), this leads to a high out-of-vocabulary (OOV) rate for typical lexicon sizes, and hence a potential increase in WER. This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem we ...
Morphological analysis and decomposition for Arabic speech-to-text systems
Language modelling for a morphologically complex language such as Arabic is a challenging task. Its agglutinative structure results in data sparsity problems and high out-of-vocabulary rates. In this work these problems are tackled by applying the MADA tools to the Arabic text. In addition to morphological decomposition, MADA performs context-dependent stem-normalisation. Thus, if word-level sy...
Critique of Arabic Texts / Critical Review of Al-Adab al-Arabi Min-Asr al-Jahili, Ali Ghahramani and Masoud Bavanpour
Morphological Decomposition for ASR in German
In this contribution we report on our ongoing work in lexical decomposition for automatic speech recognition (ASR). Lexical decomposition is investigated with a twofold goal: lexical coverage optimization and improved automatic letter-to-sound conversion. Whereas morphological decomposition is a widely studied domain in linguistics, our interest is limited here to identifying and processing the ...
Journal:
- Computer Speech & Language
Volume 26, Issue
Pages: -
Publication date: 2012