Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation
Authors
Abstract
This paper describes several experiments aimed at the long term goal of enabling a spoken conversational system to automatically improve its pronunciation lexicon over time through direct interactions with end users and from available Web sources. We selected a set of 200 rare words from the OGI corpus of spoken names, and performed several experiments combining spelling and pronunciation information to hypothesize phonemic baseforms for these words. We evaluated the quality of the resulting baseforms through a series of recognition experiments, using the 200 words in an isolated word recognition task. We also report here on a modification to our letter-to-sound system, utilizing a letter-phoneme n-gram language model, either alone or in combination with our original “column-bigram” model, for additional linguistic constraint and robustness. Our experiments confirm our expectation that acoustic information drawn from spoken examples of the words can greatly improve the quality of the baseforms, as measured by the recognition error rate. Our ultimate goal is to allow a spoken dialogue system to automatically expand and improve its baseforms over time as users introduce new words or supply spoken pronunciations of existing words.
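The abstract does not spell out how the letter-phoneme n-gram model works or how acoustic evidence is combined with spelling. As a rough illustration only, the sketch below makes several assumptions. Letters and phonemes are aligned 1:1, with "-" marking a silent letter. Each (letter, phoneme) pair is treated as a single token in an add-k smoothed bigram. A candidate baseform is chosen by a weighted sum of that spelling-based log-probability and an acoustic log-likelihood. The function names (`train_bigram`, `lts_logprob`, `best_baseform`), the smoothing constant `k`, the interpolation `weight`, and the toy data are all hypothetical. This is not the authors' column-bigram model or their actual scoring method.

```python
import math
from collections import Counter

# A unit is a (letter, phoneme) pair; "-" would mark a silent letter (assumed 1:1 alignment).
BOS = ("<s>", "<s>")
EOS = ("</s>", "</s>")


def train_bigram(aligned_words):
    """Count bigrams over (letter, phoneme) units from 1:1-aligned training words."""
    unigrams, bigrams = Counter(), Counter()
    for word in aligned_words:
        units = [BOS] + list(word) + [EOS]
        for prev, cur in zip(units, units[1:]):
            unigrams[prev] += 1          # how often each unit serves as a history
            bigrams[(prev, cur)] += 1    # how often each history -> unit transition occurs
    vocab = {u for word in aligned_words for u in word} | {EOS}
    return unigrams, bigrams, vocab


def lts_logprob(model, word, k=0.5):
    """Log-probability of an aligned baseform under an add-k smoothed bigram."""
    unigrams, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserves probability mass for unseen units
    units = [BOS] + list(word) + [EOS]
    return sum(
        math.log((bigrams[(prev, cur)] + k) / (unigrams[prev] + k * v))
        for prev, cur in zip(units, units[1:])
    )


def best_baseform(model, candidates, acoustic_logliks, weight=0.5):
    """Pick the candidate maximizing a weighted sum of LTS and acoustic log-scores.

    `weight` is a hypothetical interpolation parameter; in practice it would be tuned.
    """
    def combined(i):
        return weight * lts_logprob(model, candidates[i]) + (1 - weight) * acoustic_logliks[i]
    return candidates[max(range(len(candidates)), key=combined)]


# Toy usage with made-up training data and made-up acoustic scores.
train = [
    [("c", "k"), ("a", "ae"), ("t", "t")],
    [("c", "k"), ("o", "aa"), ("t", "t")],
    [("c", "k"), ("u", "ah"), ("t", "t")],
]
model = train_bigram(train)
hyp_k = [("c", "k"), ("a", "ae"), ("t", "t")]
hyp_s = [("c", "s"), ("a", "ae"), ("t", "t")]

# With no acoustic preference, the spelling-based model alone favors /k/.
print(best_baseform(model, [hyp_k, hyp_s], [0.0, 0.0]) == hyp_k)  # True
# If spoken examples strongly favor /s/, the acoustic evidence overrides the spelling prior.
print(best_baseform(model, [hyp_k, hyp_s], [-50.0, 0.0]) == hyp_s)  # True
```

The sketch takes the acoustic log-likelihoods as given. A real system would presumably obtain them by aligning the spoken examples against each candidate baseform with the recognizer's acoustic models.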
Similar resources
Pronunciation Lexicon Development for Under-Resourced Languages Using Automatically Derived Subword Units: A Case Study on Scottish Gaelic
Developing a phonetic lexicon for a language requires linguistic knowledge as well as human effort, which may not be available, particularly for under-resourced languages. To avoid the need for the linguistic knowledge, acoustic information can be used to automatically obtain the subword units and the associated pronunciations. Towards that, the present paper investigates the potential of a rec...
Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling
Automatic speech recognition (ASR) systems incorporate expert linguistic knowledge through a phone pronunciation lexicon (or dictionary), in which each word is associated with a sequence of phones. Creating a phone pronunciation lexicon for a new language or domain is costly, as it requires linguistic expertise, time, and money. In this thesis, ...
PronouncUR: An Urdu Pronunciation Lexicon Generator
State-of-the-art speech recognition systems rely heavily on three basic components: an acoustic model, a pronunciation lexicon, and a language model. To build these components, a researcher needs linguistic as well as technical expertise, which is a barrier in low-resource domains. Techniques to construct these three components without expert domain knowledge are in great demand. Urdu, des...
Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition
Non-native speech differs significantly from native speech, often resulting in a degradation of the performance of automatic speech recognition (ASR). Hand-crafted pronunciation lexicons used in standard ASR systems generally fail to cover non-native pronunciations, and design of new ones by linguistic experts is time consuming and costly. In this work, we propose acoustic data-driven iterative...
Discriminative Pronunciation Modeling Using the MPE Criterion
Introducing pronunciation models into decoding has been shown to benefit LVCSR. In this paper, a discriminative pronunciation modeling method is presented within the framework of Minimum Phone Error (MPE) training for HMM/GMM systems. To bring the pronunciation models into MPE training, the auxiliary function is rewritten at the word level and decomposed into two parts. One is for ...