Robust LTS rules with the Combilex speech technology lexicon
Abstract
Combilex is a high-quality pronunciation lexicon, aimed at speech technology applications, that has recently been released by CSTR. Combilex benefits from several advanced features. This paper evaluates one of these: the explicit alignment of phones to graphemes in a word. This alignment can help to rapidly develop robust and accurate letter-to-sound (LTS) rules, without needing to rely on automatic alignment methods. To evaluate this, we used Festival’s LTS module, comparing its standard automatic alignment with Combilex’s explicit alignment. Our results show that using Combilex’s alignment improves LTS accuracy: 86.50% of words correct as opposed to 84.49%, with our most general form of lexicon. In addition, building LTS models is greatly accelerated, as the need to list allowed alignments is removed. Finally, a loose comparison with other studies indicates that Combilex is a superior-quality lexicon in terms of consistency and size.
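As a rough illustration of why an explicit grapheme-to-phone alignment simplifies LTS training, the sketch below turns one aligned word into per-letter training instances and computes a words-correct score. The entry format (one phone string per letter, `_` for a silent letter, `k-s` for a letter mapped to two phones), the phone symbols, and the function names `letter_contexts` and `words_correct` are all assumptions made for this example; they are not Combilex's file format or Festival's API.

```python
from typing import List, Tuple

# Hypothetical aligned entry: each letter is paired with the phone(s) it maps to.
# "_" would mark a silent letter; "k-s" marks one letter realised as two phones.
AlignedWord = List[Tuple[str, str]]
Instance = Tuple[Tuple[str, ...], str]


def letter_contexts(aligned: AlignedWord, width: int = 3) -> List[Instance]:
    """Turn an aligned word into (letter window, phone) training instances.

    Each instance holds the target letter with `width` letters of context on
    either side, padded with "#" at the word boundaries.
    """
    letters = [grapheme for grapheme, _ in aligned]
    padding = ["#"] * width
    padded = padding + letters + padding
    instances = []
    for i, (_, phone) in enumerate(aligned):
        window = tuple(padded[i : i + 2 * width + 1])
        instances.append((window, phone))
    return instances


def words_correct(predicted: List[List[str]], reference: List[List[str]]) -> float:
    """Percentage of words whose entire predicted phone sequence is correct."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal in length")
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return 100.0 * hits / len(reference)


# Illustrative entry only; the phone symbols are placeholders.
box: AlignedWord = [("b", "b"), ("o", "aa"), ("x", "k-s")]
for window, phone in letter_contexts(box, width=1):
    print(window, "->", phone)
```

Because the alignment is given per letter, no table of allowed letter-to-phone pairings has to be written before the instances can be extracted; with automatic alignment that step would come first.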
Similar resources
On generating combilex pronunciations via morphological analysis
Combilex is a high quality lexicon that has been developed specifically for speech technology purposes and recently released by CSTR. Combilex benefits from many advanced features. This paper explores one of these: the ability to generate fully-specified transcriptions for morphologically derived words automatically. This functionality was originally implemented to encode the pronunciations of ...
Redundancy and productivity in the speech technology lexicon - can we do better?
Current lexica for speech technology typically contain much redundancy, while omitting useful information. A comparison with lexica in other media and for other purposes is instructive, as it highlights some features we may borrow for text-to-speech and speech recognition lexica. We describe some aspects of the new lexicon we are producing, Combilex, whose structure and implementation is specif...
Bootstrapping phonetic lexicons for new languages
Although phonetic lexicons are critical for many speech applications, the process of building one for a new language can take a significant amount of time and effort. We present a bootstrapping algorithm to build phonetic lexicons for new languages. Our method relies on a large amount of unlabeled text, a small set of ’seed words’ with their phonetic transcription, and the proficiency of a nati...
A Phonetic Morpheme Lexicon for German
3. PURPOSES [...] oriented areas and basic research. The most obvious technical applications are text-to-speech synthesis (TTS) and automatic speech recognition (ASR). The availability of computerized lexical data is growing. In spite of this fact, little resources are available for the minimal functional units of language: morphemes. For German several morpheme lexica provide morphemes in orthograph...
Issues in building general letter to sound rules
In general text-to-speech systems, it is not possible to guarantee that a lexicon will contain all words found in a text, therefore some system for predicting pronunciation from the word itself is necessary. Here we present a general framework for building letter to sound (LTS) rules from a word list in a language. The technique can be fully automatic, though a small amount of hand seeding can ...