Cross-lingual acoustic modeling for dialectal Arabic speech recognition

Authors

  • Mohamed Elmahdy
  • Rainer Gruhn
  • Wolfgang Minker
  • Slim Abdennadher
Abstract

A major problem with dialectal Arabic acoustic modeling is the sparsity of available speech resources. In this paper, we have chosen Egyptian Colloquial Arabic (ECA) as a typical dialect. To benefit from existing Modern Standard Arabic (MSA) resources, we propose a cross-lingual acoustic modeling approach based on supervised model adaptation. MSA acoustic models were adapted using MLLR and MAP with an in-house collected ECA corpus. Both phoneme-based and grapheme-based acoustic modeling were investigated. To make phoneme-based adaptation feasible, we normalized the phoneme sets of MSA and ECA. Because dialectal Arabic is mainly spoken, its graphemic form usually does not match the actual spelling as it does in MSA; a graphemic MSA acoustic model was therefore used to force-align the data and to choose the correct ECA spelling from an automatically generated lexicon of spelling variants. Results show that the adapted MSA acoustic models outperformed acoustic models trained on ECA data alone.
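The abstract names MAP adaptation but does not give its equations. As a rough illustration only, the sketch below applies the standard textbook MAP update to a single Gaussian mean: the prior mean (standing in for an MSA-trained model) is interpolated with the occupancy-weighted average of the adaptation frames (standing in for ECA data). The function name, the prior weight `tau`, and the toy values are assumptions made for this example and are not taken from the paper.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Standard MAP update of one Gaussian mean (illustrative sketch).

    prior_mean : mean of the source-model Gaussian (hypothetical MSA model)
    frames     : adaptation feature vectors (hypothetical ECA data)
    posteriors : occupation probability of this Gaussian for each frame
    tau        : prior weight; larger values keep the result closer to the prior
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)  # total soft count of frames for this Gaussian
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the adaptation data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy usage: two adaptation frames pull the mean halfway toward the data.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
print(adapted)
```

With no adaptation frames the function returns the prior mean unchanged, which is the behavior that makes MAP attractive when dialectal data is sparse: Gaussians that see little target data stay close to the source model.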

Similar resources

Automatic Diacritization Of Arabic For Acoustic Modeling In Speech Recognition

Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only a few dialectal resources are available to date; moreover, most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper we investigate v...

A Transfer Learning Approach for Under-Resourced Arabic Dialects Speech Recognition

A major problem with dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, we propose a transfer learning framework to jointly use a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. We have chosen the Qatari Arabic (QA) dialect as a typical example for an under-resourced...

Development of a TV Broadcasts Speech Recognition System for Qatari Arabic

A major problem with dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, a transfer learning framework is proposed to jointly use a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. The Qatari Arabic (QA) dialect has been chosen as a typical example for an under-resou...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Cross-Dialectal Data Transferring for Gaussian Mixture Model Training in Arabic Speech Recognition

Dialectal Arabic speech recognition is a difficult problem and is relatively less studied. In this paper, we propose a cross-dialectal Gaussian mixture model training criteria to transfer knowledge from one domain to the other by data sharing. Specifically, phone classification experiments on West Point Modern Standard Arabic Speech corpus and Babylon Levantine Arabic Speech corpus demonstrate ...

Publication date: 2010