Linguistic Processor Training on Speaker Data for Unit Selection Text-to-Speech
Abstract
This paper describes an approach to synthesizing personalized speech that preserves not only the speaker's voice but also the speaker's pronunciation peculiarities. Personalization is realized by means of pronunciation models trained on the speaker data contained in that speaker's speech database. Untrained models allow speech to be synthesized in a neutral, normative style. On the segmental level, a transcription model is used. On the prosodic level, models for phrasing, intonation, pauses and phoneme duration are used. These prosodic models are derived from a comparative acoustic-phonetic study of data from different speakers contained in several speech corpora and databases. The pronunciation models are personalized during off-line training of the linguistic processor, using a speech database annotation. In the on-line speech synthesis mode, the linguistic processor uses the personalized pronunciation models to generate a speaker-specific target specification for the input text.
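The split between off-line training and on-line synthesis can be illustrated with a minimal sketch for the segmental level only. Everything here is an assumption made for illustration: the class name `TranscriptionModel`, the toy lexicon, and the "most frequent variant per word" rule are hypothetical and are not taken from the paper, which does not specify how its transcription model is trained.

```python
from collections import Counter, defaultdict


class TranscriptionModel:
    """Toy segmental-level pronunciation model (hypothetical, not the paper's)."""

    def __init__(self, normative_lexicon):
        # Normative pronunciations, used while the model is untrained.
        self.normative = dict(normative_lexicon)
        # Speaker-specific pronunciations, filled in by train().
        self.personal = {}

    def train(self, annotation):
        """Off-line step: keep the speaker's most frequent variant per word.

        `annotation` is an iterable of (word, phonemes) pairs standing in
        for the speech database annotation.
        """
        counts = defaultdict(Counter)
        for word, phones in annotation:
            counts[word][tuple(phones)] += 1
        self.personal = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def transcribe(self, words):
        """On-line step: prefer the speaker's variant, else the normative one."""
        return [self.personal.get(w, self.normative[w]) for w in words]


# Assumed toy data: the speaker usually reduces the final consonant of "going".
lexicon = {"going": ("g", "ow", "ih", "ng"), "to": ("t", "uw")}
annotation = [
    ("going", ("g", "ow", "ih", "n")),
    ("going", ("g", "ow", "ih", "n")),
    ("going", ("g", "ow", "ih", "ng")),
]

model = TranscriptionModel(lexicon)
neutral = model.transcribe(["going", "to"])       # untrained: normative style
model.train(annotation)
personalized = model.transcribe(["going", "to"])  # trained: speaker-specific
```

Before training, the sketch returns the normative transcription for every word; after training, words observed in the annotation take the speaker's dominant variant while unseen words fall back to the normative form. The prosodic models named in the abstract would need their own training data and are not covered by this sketch.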
Similar resources
Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases
This paper describes an approach to speech synthesis based on using speech databases at different stages of the TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of the databases allow both segmental and prosodic information to be stored. Phonetic-prosodic annotations of speech databases are involved in off-line training ...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain concerns enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human
The absence of alternatives/variants is a dramatic limitation of text-to-speech synthesis compared to the variety of human speech. This paper introduces the use of speech alternatives/variants in order to improve text-to-speech synthesis systems. Speech alternatives denote the variety of possibilities that a speaker has to pronounce a sentence depending on linguistic constraints, specific stra...