Phonetic labeling and segmentation of mixed-lingual prosody databases
نویسندگان
چکیده
An automatic system for segmenting speech signals used for the training of statistical prosody models is presented. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matched phonetic transcription indicating pronunciation variants. Although the system is HMM-based, it uses only the speech signals of the prosody database which typically consists of a few hundred sentences with some 30 minutes total duration. Initial phone HMMs are generated with flat-start training using the canonical transcriptions of the sentences. Then iterative Viterbi search for best-matching pronunciation variants and HMM retraining is applied until convergence is attained.
منابع مشابه
Polyglot speech prosody control
Within a polyglot text-to-speech synthesis system, the generation of an adequate prosody for mixed-lingual texts, sentences, or even words, requires a polyglot prosody model that is able to seamlessly switch between languages and that applies the same voice for all languages. This paper presents the first polyglot prosody model that fulfills these requirements and that is constructed from indep...
متن کاملCross - Lingual Voice Conversion
CROSS-LINGUAL VOICE CONVERSION Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker can not speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conve...
متن کاملAutomatic analysis of prosody for multi - lingual speech corpora . Daniel Hirst
This chapter outlines a general approach and describes a set of tools for the automatic analysis of multilingual speech corpora. Two levels of representation can be derived automatically: a phonetic representation, which provides an extremely close copy of the original speech signal, and a surface phonological representation, which reduces the variability to a small number of discrete values wi...
متن کاملNon-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics
This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Crosslingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique to synthesize foreign language speech using a target speaker’s natural speech uttered in his/her mother tongue. Although the technique holds promise t...
متن کاملAutomatic Labeling of Corpora for Speech
One of the bottlenecks in the development of text-to-speech synthesizers based on segment concatenation is the need for large, segmented and labeled corpora. Consequently, as manual segmentation and labeling is a tedious and time consuming task, there is a strong demand for automatic labeling systems which can label speech from many languages. Several systems have been proposed already, but the...
متن کامل