On the Adequacy of Baseform Pronunciations and Pronunciation Variants

Authors

  • Mathew Magimai-Doss
  • Hervé Bourlard
Abstract

This paper presents an approach to automatically extract pronunciation variants and evaluate their “stability” (i.e., the adequacy of the model to accommodate this variability), based on multiple pronunciations of each lexicon word and the knowledge of a reference baseform pronunciation. Most approaches to modelling pronunciation variability in speech recognition are based on the inference (through an ergodic HMM) of a pronunciation graph that includes all pronunciation variants, usually followed by a smoothing (e.g., Bayesian) of the resulting graph. The approach presented here differs from these in (1) the way the models are inferred and (2) the way the smoothing (i.e., keeping the best variants) is done. In our case, the pronunciation variants are inferred by slowly “relaxing” a (usually left-to-right) baseform model towards a fully ergodic model. The more stable the baseform model is, the less the inferred model will diverge from it. Hence, for each pronunciation model so generated, we evaluate its adequacy by calculating the Levenshtein distance of the new model with respect to the baseform, as well as its confidence measure (based on a posterior estimate), and the models with the lowest Levenshtein distance and highest confidence are preserved. On a large telephone speech database (Phonebook), we show the relationship between this “stability” measure and recognition performance, and we finally show that automatically adding a few pronunciation variants to the less stable words is enough to significantly improve recognition rates.
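As a rough illustration of the selection step described in the abstract, the sketch below computes the Levenshtein distance between a baseform and candidate variants, each represented as a list of phone symbols, and then orders the candidates. The function names, the phone symbols, the confidence values, and the rule for combining the two criteria (sort by distance first, then by descending confidence) are all assumptions made for illustration; the abstract does not specify how the authors combine distance and confidence.

```python
from typing import List, Sequence, Tuple


def levenshtein(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Edit distance between two phone sequences (unit cost for
    substitution, insertion, and deletion)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def rank_variants(
    baseform: Sequence[str],
    variants: List[Tuple[List[str], float]],
) -> List[Tuple[int, float, List[str]]]:
    """Order (phones, confidence) candidates by increasing distance to the
    baseform, breaking ties by decreasing confidence.
    ASSUMPTION: this lexicographic ordering is illustrative only."""
    scored = [(levenshtein(baseform, phones), conf, phones)
              for phones, conf in variants]
    return sorted(scored, key=lambda t: (t[0], -t[1]))


# Hypothetical baseform and inferred variants with made-up confidences.
baseform = ["t", "ah", "m", "ey", "t", "ow"]
variants = [
    (["d", "ah", "m", "aa", "d", "ah"], 0.9),
    (["t", "m", "ey", "t", "ow"], 0.6),
    (["t", "ah", "m", "aa", "t", "ow"], 0.8),
]
ranked = rank_variants(baseform, variants)
for dist, conf, phones in ranked:
    print(dist, conf, " ".join(phones))
```

Under this ordering, a variant that stays close to the baseform is preferred over a distant one even when the distant one has a higher confidence; a different weighting of the two criteria would be equally compatible with the abstract.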

Similar resources

Building multiple pronunciation models for novel words using exploratory computational phonology

In this paper we describe a completely automatic algorithm that builds multiple pronunciation word models by expanding baseform pronunciations with a set of candidate phonological rules. We show how to train the probabilities of these phonological rules, and how to use these probabilities to assign pronunciation probabilities to words not seen in the training corpus. The algorithm we propose is...

Pronunciation ambiguity vs. pronunciation variability in speech recognition

It is widely acknowledged that pronunciations in spontaneous speech differ significantly from citation form. For this reason, pronunciation modeling has received considerable attention in recent automatic speech recognition literature. Most of the attention however has focussed on describing an alternate pronunciation as a different sequence of phonetic units using the same inventory of phones whi...

Dynamic and static improvements to lexical baseforms

One limitation of many speaker independent recognition systems is their dependence on a single baseform dictionary to model word pronunciations. These dictionaries typically contain only a single (or 'ideal') pronunciation for each word. Previous work on improving dictionary models to include multiple pronunciations has met with mixed success: the alternatives may increase ambiguity in some case...

The roles of reconstruction and lexical storage in the comprehension of regular pronunciation variants

This paper investigates how listeners process regular pronunciation variants, resulting from simple general reduction processes. Study 1 shows that when listeners are presented with new words, they store the pronunciation variants presented to them, whether these are unreduced or reduced. Listeners thus store information on word-specific pronunciation variation. Study 2 suggests that if partici...

Properties of Pronunciation Change in Conversational Speech Recognition

It is widely acknowledged that pronunciations in spontaneous speech differ significantly from citation form. For this reason, pronunciation modeling has received considerable attention in recent automatic speech recognition literature. Most of the attention however has focussed on describing an alternate pronunciation as a different sequence of phonetic units using the same inventory of phones ...

Publication date: 2004