Automatic Learning and Optimization of Pronunciation Dictionaries
Abstract
Pronunciation dictionaries are the interface between the orthographic and phonetic representations of the speech signal and are therefore a substantial component of speech recognition systems. Many systems use simple canonical pronunciation forms in the dictionary. These represent the “correct” pronunciation as found in lexicons, and contain neither the most frequent pronunciation nor pronunciation variations. Canonical dictionaries are often manually extended with pronunciation variants to improve the performance of the dictionary. This is a time-consuming process that depends on the skill and expert knowledge of the scientist. Another popular approach uses rules to generate pronunciation variants. Unless the rule set contains a large number of specialized rules with very long contexts, the major disadvantage of this approach is its tendency to overgeneralize. Rule sets also often provide no information on the probability with which each rule is applied. A third common approach is the data-driven generation of rule sets or pronunciation models. These methods try to eliminate manual work (and human errors) and allow a scalable degree of generalization as well as the estimation of rule application or variant probabilities. In this paper we introduce a method for automatically training pronunciation dictionaries from a speech database. We give an overview of our training procedure and discuss experimental results.
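To make the idea of estimating variant probabilities from data concrete, here is a minimal sketch. It is not the training procedure of the paper, which the abstract does not specify. It assumes that a recognizer or forced alignment has already produced (word, phone sequence) observations, and it uses plain relative frequencies. The function name `estimate_variant_probabilities`, the `min_count` threshold, and the phone symbols are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def estimate_variant_probabilities(observations, min_count=1):
    """Relative-frequency estimate of pronunciation variant probabilities.

    observations: iterable of (word, phones) pairs, where phones is a
    sequence of phone symbols observed for that word in the speech data.
    min_count: hypothetical pruning threshold; variants observed fewer
    times than this are dropped before normalization.
    """
    counts = defaultdict(Counter)
    for word, phones in observations:
        counts[word][tuple(phones)] += 1

    dictionary = {}
    for word, variants in counts.items():
        kept = {v: c for v, c in variants.items() if c >= min_count}
        total = sum(kept.values())
        if total == 0:
            continue  # every variant of this word was pruned
        dictionary[word] = {v: c / total for v, c in kept.items()}
    return dictionary


# Toy observations (invented for illustration, not taken from the paper).
observations = [
    ("and", ["ae", "n", "d"]),
    ("and", ["ae", "n"]),
    ("and", ["ae", "n"]),
    ("the", ["dh", "ah"]),
]
probabilities = estimate_variant_probabilities(observations)
```

Raising `min_count` is one simple way to trade coverage against overgeneralization: rare variants, which may be recognition errors, are removed before the remaining probabilities are renormalized.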
Similar resources
Learning Pronunciation Rules for English Graphemes Using the Version Space Algorithm
We describe a technique for learning pronunciation rules based on the Version Space algorithm. In particular, we describe how to learn pronunciation rules for a representative subset of the English graphemes. We present a learning procedure called LEP-G.1 (learning to pronounce English graphemes) that learns English pronunciation rules from examples in the form of word-pronunciation pairs. With...
Automatic segmentation and clustering of speech using sparse coding
We investigate the application of sparse coding and dictionary learning to the discovery of sub-word units in speech. The ultimate goal is to generate pronunciation dictionaries that could be used for automatic speech recognition (ASR). A dictionary of sparse coding atoms is trained to code a subset of the TIMIT corpus. Some of the trained units exhibit strong correlation with specific referenc...
Efficient compression method for pronunciation dictionaries
Pronunciation dictionaries are often used with other data-driven methods to model the pronunciations in phoneme-based automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries results in minimal transmission bandwidth and less stora...
Pair Language Models for Deriving Alternative Pronunciations and Spellings from Pronunciation Dictionaries
Pronunciation dictionaries provide a readily available parallel corpus for learning to transduce between character strings and phoneme strings or vice versa. Translation models can be used to derive character-level paraphrases on either side of this transduction, allowing for the automatic derivation of alternative pronunciations or spellings. We examine finite-state and SMT-based methods for th...
Automatic Error Recovery for Pronunciation Dictionaries
In this paper, we present our latest investigations on pronunciation modeling and its impact on ASR. We propose completely automatic methods to detect, remove, and substitute inconsistent or flawed entries in pronunciation dictionaries. The experiments were conducted on different tasks, namely (1) word-pronunciation pairs from the Czech, English, French, German, Polish, and Spanish Wiktionary [...