A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis

Authors

  • Oytun Türk
  • Marc Schröder
Abstract

This paper presents a comparison of methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral database in German. The output is modified using voice conversion techniques to match the target expressive styles, with the focus on spectral envelope conversion for transforming the overall voice quality. Various improvements over the state-of-the-art weighted codebook mapping and GMM-based voice conversion frameworks are employed, resulting in three algorithms. Objective evaluation results show that all three methods achieve a comparable reduction in objective distance to the target expressive TTS outputs, whereas in a listening test the weighted frame mapping and GMM-based transformations were perceived as slightly better than the weighted codebook mapping outputs at generating the target expressive style.

Similar resources

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for improving the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In the current methods, because a single conversion function is used to convert all speech units and statistical averaging causes spectral smoothing, we will observe quality ...

Designing a New Non-Parallel Training Method for Voice Conversion with Better Performance than Parallel Training

Introduction: Voice mimicking by computers has been one of the most challenging topics of speech processing in recent years. A voice conversion system has two sides. On one side is the source speaker, whose voice is changed to mimic the voice of the target speaker, who is on the other side. Two methods of p...

Comparing the Voice Handicap Index Scores in Groups with Structural and Functional Voice Disorders

Objective: The effects of voice disorders vary from person to person. Occupation, work environment, life, and family reaction are variables that affect one’s perception of his/her own voice as impaired. The Voice Handicap Index (VHI) has not yet been used to compare the degree of voice disorders. Assuming that the quality of life may be different under a variety of voice disorders and that diffe...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

On the limitations of voice conversion techniques in emotion identification tasks

The growing interest in emotional speech synthesis urges effective emotion conversion techniques to be explored. This paper estimates the relevance of three speech components (spectral envelope, residual excitation and prosody) for synthesizing identifiable emotional speech, in order to be able to customize the voice conversion techniques to the specific characteristics of each emotion. The ana...

Publication date: 2008