A Novel Prosody Adaptation Method for Mandarin Concatenation-Based Text-to-Speech System

Authors

  • Jian Yu
  • Jianhua Tao
Abstract

This paper presents a prosody adaptation method that adapts the prosody model of a text-to-speech (TTS) system to a new style using only a small training corpus. Unlike conventional prosody mapping between two parallel sets of prosody features, the method integrates prosody conversion into the prosody generation model of the TTS system. We use a template-based prosody model consisting of two major parts: a prosody template library and template parameter prediction trees. With this model, prosody adaptation is carried out in two steps: first, the prosody template library is converted to the target speaker's prosody using mapping methods; second, the prosody prediction trees are retrained on the small target training set. The transformation algorithms involved include linear regression, Gaussian mixture models (GMM), and classification and regression trees (CART). Experimental results show that the adapted system generates synthesized speech that is very similar to that of the target speaker.
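The first adaptation step described above (converting the template library) can be illustrated with a minimal sketch. It assumes that each prosody template is a short list of F0 values in Hz and that a small set of paired source/target templates is available. Of the three transformations named in the abstract, it uses only linear regression, fitted independently per contour position. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_linear_map(source_values, target_values):
    """Least-squares fit of target ~= slope * source + intercept."""
    mx, my = mean(source_values), mean(target_values)
    sxx = sum((x - mx) ** 2 for x in source_values)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(source_values, target_values))
    # Fall back to a pure offset if the source values do not vary.
    slope = sxy / sxx if sxx else 1.0
    intercept = my - slope * mx
    return slope, intercept


def fit_template_mapping(paired_templates):
    """Fit one linear map per contour position from (source, target) pairs."""
    n_points = len(paired_templates[0][0])
    maps = []
    for i in range(n_points):
        src = [s[i] for s, _ in paired_templates]
        tgt = [t[i] for _, t in paired_templates]
        maps.append(fit_linear_map(src, tgt))
    return maps


def convert_library(library, maps):
    """Apply the fitted per-position maps to every template in the library."""
    return [[a * x + b for x, (a, b) in zip(template, maps)]
            for template in library]


# Hypothetical toy data: three paired templates with three F0 points each.
pairs = [
    ([200.0, 210.0, 220.0], [250.0, 262.0, 274.0]),
    ([180.0, 190.0, 185.0], [226.0, 238.0, 232.0]),
    ([220.0, 200.0, 190.0], [274.0, 250.0, 238.0]),
]
maps = fit_template_mapping(pairs)
converted = convert_library([[100.0, 150.0, 200.0]], maps)
```

In a fuller system, the second step would retrain the template parameter prediction trees on the small target corpus so that they select from the converted library; that step is not sketched here.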


Similar resources

A novel hybrid approach for Mandarin speech synthesis

The paper investigates a new method for solving concatenation problems in Mandarin speech synthesis, based on a hybrid approach that combines HMM-based speech synthesis and unit selection. Unlike other works, which use only boundary F0 errors as the concatenation cost, it trains a CART-based F0 dependency model that takes extensive context information into account to measure the smoothness of F0. Instead of phoneme-siz...

PKU Mandarin Speech Synthesis System for Blizzard 2009

This paper describes the development of the PKU Mandarin speech synthesis system for the Blizzard Challenge 2009, built within the framework of corpus-based unit concatenation synthesis. The system employs a trainable VTR model named HTM to label the VTR trajectories in the corpus and to predict the target VTR features. In addition, a CART-based prosody model is built to predict the prosody parameters of...

The WISTON Text to Speech System for Blizzard 2008

The WISTON system is a large-corpus-based TTS system that uses unit selection. The text analysis part of the system comprises text pre-processing, word segmentation, POS tagging, phonetic transcription and prosody structure prediction. The prosody information (duration, F0, energy) is predicted by a CART model from the input context information. In the unit selection model, we use the m...

Pitch Prediction for Mandarin TTS with Mutual Prosodic Constraint

Most current pitch prediction methods for Mandarin TTS try to derive pitch contours from contextual information by assigning a group of weights. Without a good constraint on prosody concatenation, the predicted pitch contours are not always stable, because prosody information and text information do not fully correspond. The paper presents a new Mandarin pitch predictio...

Unsupervised prosody labeling for constructing Mandarin TTS

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...


Publication date: 2008