A system for transforming the emotion in speech: combining data-driven conversion techniques for prosody and voice quality

Authors

  • Zeynep Inanoglu
  • Steve J. Young
Abstract

This paper describes a system that combines independent transformation techniques to endow a neutral utterance with a required target emotion. The system consists of three modules, each trained on a limited amount of speech data and each acting on a different temporal layer. F0 contours are modelled and generated using context-sensitive syllable HMMs, while durations are transformed using phone-based relative decision trees. For spectral conversion, which is applied at the segmental level, two methods were investigated: a GMM-based voice conversion approach and a codebook selection approach. Converted test data were evaluated for three emotions using an independent emotion classifier as well as perceptual listening tests. The listening test results show that the perception of sadness in our system's output was comparable with the perception of human sad speech, while the perception of surprise and anger was around 5% worse than that of a human speaker.
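
As a rough illustration of the duration module, the sketch below assumes that a "relative" tree predicts a multiplicative scaling factor for each phone rather than an absolute duration. The abstract does not spell this out, so treat it as an assumption. The `Phone` fields, the questions asked in `relative_scale`, and the leaf values are all hypothetical stand-ins for a trained tree, not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    label: str                  # phone identity, e.g. "ae"
    is_vowel: bool              # hypothetical context feature
    in_stressed_syllable: bool  # hypothetical context feature
    neutral_ms: float           # duration measured in the neutral utterance


def relative_scale(phone: Phone) -> float:
    """Hand-written stand-in for a trained relative decision tree.

    Each leaf returns a scaling factor, not an absolute duration.
    The questions and leaf values here are invented for illustration.
    """
    if phone.is_vowel:
        if phone.in_stressed_syllable:
            return 1.5
        return 1.25
    return 1.0


def transform_durations(phones):
    """Scale each neutral phone duration by its predicted relative factor."""
    return [p.neutral_ms * relative_scale(p) for p in phones]


if __name__ == "__main__":
    utterance = [
        Phone("k", False, False, 60.0),
        Phone("ae", True, True, 100.0),
        Phone("t", False, False, 80.0),
    ]
    print(transform_durations(utterance))
```

Predicting a ratio keeps the converted duration tied to the neutral phone's own length, so an already long phone stays long after conversion. A real system would learn the questions and leaf values from paired neutral and emotional recordings.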

Similar resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

On the limitations of voice conversion techniques in emotion identification tasks

The growing interest in emotional speech synthesis urges effective emotion conversion techniques to be explored. This paper estimates the relevance of three speech components (spectral envelope, residual excitation and prosody) for synthesizing identifiable emotional speech, in order to be able to customize voice conversion techniques to the specific characteristics of each emotion. The analysi...

Data-driven emotion conversion in spoken English

This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods have been developed which utilise a small amount of expressive training data ( 15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system perfo...

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for optimizing the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In current methods, because a single conversion function is used to convert all speech units, and because of the spectral smoothing that arises from statistical averaging, we observe quality ...

Publication date: 2007