Discriminative pronunciation modeling for dialectal speech recognition
Abstract
Speech recognizers are typically trained on data from a standard dialect and do not generalize to non-standard dialects. The mismatch occurs mainly in the acoustic realization of words, which is represented by the acoustic models and the pronunciation lexicon. Standard techniques for addressing this mismatch are generative in nature. They include acoustic model adaptation and expansion of the lexicon with pronunciation variants, both of which have limited effectiveness. We present a discriminative pronunciation model whose parameters are learned jointly with the parameters of the language model. We tease apart the gains from modeling the transitions of canonical phones, the transduction from surface to canonical phones, and the language model. We report experiments on African American Vernacular English (AAVE) using NPR's StoryCorps corpus. Our models improve performance over the baseline by about 2.1% on AAVE, of which 0.6% can be attributed to the pronunciation model. The model learns the phonetic transformations most relevant to AAVE speech.
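The abstract does not spell out the model's form or learner. The following is a minimal sketch, assuming a linear model that reranks n-best hypotheses and is trained with the structured perceptron. That learner is our assumption, not necessarily the authors'. The sketch uses three feature groups matching the three components the abstract separates: language-model features, canonical phone-transition features, and surface-to-canonical transduction features. The `Hypothesis` fields, the feature-name prefixes, and the toy th-fronting data (canonical /th/ realized as [f]) are hypothetical illustrations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One n-best entry (hypothetical representation)."""
    words: tuple[str, ...]
    canonical: tuple[str, ...]  # canonical phones from the lexicon
    surface: tuple[str, ...]    # recognized surface phones, assumed 1:1 aligned
    lm_logprob: float           # baseline language-model log-probability


def features(h: Hypothesis) -> Counter:
    f = Counter()
    # Language model: baseline score plus discriminative word-bigram features.
    f["lm"] = h.lm_logprob
    ws = ("<s>",) + h.words
    for a, b in zip(ws, ws[1:]):
        f[f"wb:{a}_{b}"] += 1
    # Transitions between adjacent canonical phones.
    for a, b in zip(h.canonical, h.canonical[1:]):
        f[f"canon:{a}_{b}"] += 1
    # Surface-to-canonical transductions; identity pairs are skipped.
    for s, c in zip(h.surface, h.canonical):
        if s != c:
            f[f"s2c:{s}>{c}"] += 1
    return f


def score(w: Counter, f: Counter) -> float:
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def rerank(w: Counter, nbest: list[Hypothesis]) -> int:
    """Index of the highest-scoring hypothesis."""
    return max(range(len(nbest)), key=lambda i: score(w, features(nbest[i])))


def train(data: list[tuple[list[Hypothesis], int]], epochs: int = 5) -> Counter:
    """Structured perceptron over (n-best list, oracle index) pairs."""
    w = Counter({"lm": 1.0})  # start from the LM-only baseline ranking
    for _ in range(epochs):
        for nbest, gold in data:
            pred = rerank(w, nbest)
            if pred != gold:
                w.update(features(nbest[gold]))    # reward the oracle
                w.subtract(features(nbest[pred]))  # penalize the error
    return w


# Toy th-fronting data: canonical /th/ surfaces as [f].
nbest_train = [
    Hypothesis(("whiff",), ("w", "ih", "f"), ("w", "ih", "f"), -1.0),
    Hypothesis(("with",), ("w", "ih", "th"), ("w", "ih", "f"), -1.5),
]
nbest_test = [
    Hypothesis(("roof",), ("r", "uw", "f"), ("r", "uw", "f"), -1.0),
    Hypothesis(("ruth",), ("r", "uw", "th"), ("r", "uw", "f"), -1.4),
]
weights = train([(nbest_train, 1)])
print(rerank(weights, nbest_train))  # 1 ("with")
print(rerank(weights, nbest_test))   # 1 ("ruth"), via the shared s2c:f>th weight
```

The f>th transduction weight is shared across words. As a result, the held-out "ruth" hypothesis benefits from what was learned on "with", whereas the LM-only baseline would pick "roof".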
Similar resources
Using a small development set to build a robust dialectal Chinese speech recognizer
To make full use of a small development data set to build a robust dialectal Chinese speech recognizer from a standard Chinese speech recognizer (based on Chinese Initial/Final, IF), a novel, simple but effective acoustic modeling method, named state-dependent phoneme-based model merging (SDPBMM), is proposed and evaluated, where a shared-state of standard tri-IF is merged with a state of diale...
Discriminative Pronunciation Modeling Using the MPE Criterion
Introducing pronunciation models into decoding has been proven to benefit LVCSR. In this paper, a discriminative pronunciation modeling method is presented within the framework of Minimum Phone Error (MPE) training for HMM/GMM. In order to bring the pronunciation models into MPE training, the auxiliary function is rewritten at the word level and decomposed into two parts. One is for ...
Discriminative pronunciation modeling based on minimum phone error training
Introducing pronunciation models into decoding has proven beneficial for LVCSR. As Minimum Phone Error (MPE) training has almost become a standard scheme for acoustic modeling, a discriminative pronunciation modeling method is investigated under the framework of MPE training. In order to bring the pronunciation models into MPE training, the auxiliary function of MPE training is rewritten at wor...
Automatic Diacritization Of Arabic For Acoustic Modeling In Speech Recognition
Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only a few dialectal resources are available to date; moreover, most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper we investigate v...
Discriminative Pronunciation Learning Using Phonetic Decoder and Minimum-classification-error Criterion
In this paper, we report our recent research aimed at improving the pronunciation-modeling component of a speech recognition system designed for mobile voice search. Our new discriminative learning technique overcomes a limitation of traditional ways of introducing alternative pronunciations, which often increase confusability across different lexical items. Instead, we make use of a phoneti...