A Novel Prosody Adaptation Method for Mandarin Concatenation-Based Text-to-Speech System
Authors
Abstract
The paper presents a prosody adaptation method that adapts the prosody model of a text-to-speech (TTS) system to a new style using a small training corpus. Unlike conventional prosody mapping between two parallel sets of prosody features, the method integrates prosody conversion into the prosody generation model of the TTS system. We use a template-based prosody model consisting of two major parts: the prosody template library and the template parameter prediction trees. With this model, prosody adaptation is realized in two steps: first, the prosody template library is converted to the target speaker's prosody using mapping methods; second, the prosody prediction trees are retrained on the small target training set. Several transformation algorithms are involved, including linear regression, Gaussian Mixture Model (GMM), and Classification and Regression Tree (CART). Experimental results show that the prosody adaptation system can generate synthesized speech that is very similar to the target speaker.
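As a rough illustration of the first adaptation step, the sketch below uses the simplest of the listed transformations, linear regression, to map source prosody templates onto a target speaker. It is not the paper's implementation: the representation of a template as a fixed-length log-F0 vector, the availability of paired source/target templates, the synthetic data, and the function names `fit_linear_mapping` and `convert_library` are all assumptions made for the example.

```python
import numpy as np

def fit_linear_mapping(source, target):
    """Fit an affine map (target ~ source @ W + b) by least squares."""
    # Append a column of ones so the bias is estimated with the weights.
    X = np.hstack([source, np.ones((source.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:-1], coef[-1]

def convert_library(library, W, b):
    """Apply the fitted map to every template in the library."""
    return library @ W + b

rng = np.random.default_rng(0)

# Hypothetical paired data: 40 templates, each a 4-point log-F0 contour.
source_pairs = rng.normal(5.0, 0.2, size=(40, 4))
true_W = np.eye(4) * 1.1          # assumed "true" speaker difference
true_b = np.full(4, 0.3)
target_pairs = source_pairs @ true_W + true_b

W, b = fit_linear_mapping(source_pairs, target_pairs)

# Convert a larger (hypothetical) source template library.
library = rng.normal(5.0, 0.2, size=(200, 4))
converted = convert_library(library, W, b)
```

In the method described above, the converted library would then be paired with prediction trees retrained on the small target corpus. A GMM or CART mapping would replace `fit_linear_mapping` where a single global linear map is too coarse.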
Similar resources
A novel hybrid approach for Mandarin speech synthesis
The paper investigates a new method for solving concatenation problems in Mandarin speech synthesis, based on a hybrid of HMM-based speech synthesis and unit selection. Unlike other works, which use only boundary F0 errors as the concatenation cost, it trains a CART-based F0 dependency model that considers rich context information to measure F0 smoothness. Instead of phoneme-siz...
PKU Mandarin Speech Synthesis System for Blizzard 2009
This paper describes the development of the PKU Mandarin speech synthesis system for Blizzard Challenge 2009, which is built in the framework of corpus-based unit concatenation synthesis. The system employs a trainable VTR model named HTM to label the VTR trajectories in the corpus and predict the target VTR features. In addition, a CART-based prosody model is built to predict the prosody parameters of...
The WISTON Text to Speech System for Blizzard 2008
The WISTON system is a large-corpus-based TTS system that uses the unit selection method. The text analysis part of this system contains text pre-processing, word segmentation, POS tagging, phonetic transcription and prosody structure prediction. The prosody information (duration, F0, energy) is predicted by the CART model from the input context information. In the unit selection model, we use the m...
Pitch Prediction for Mandarin TTS with Mutual Prosodic Constraint
Most current pitch prediction methods for Mandarin TTS try to derive pitch contours from contextual information using a set of assigned weights. Without a good prosody concatenation constraint, the predicted pitch contours are not always stable, because prosody information and text information do not fully correspond. The paper presents a new Mandarin pitch predictio...
Unsupervised prosody labeling for constructing Mandarin TTS
This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...