Hmm-based Expressive Speech Synthesis —towards Tts with Arbitrary Speaking Styles and Emotions

نویسندگان

  • Junichi Yamagishi
  • Takashi Masuko
  • Takao Kobayashi
چکیده

This paper describes recent progress in our approach to generating expressive speech. A goal of text-to-speech (TTS) synthesis is to have an ability to generate natural sounding speech with arbitrary speaker’s voice characteristics, speaking styles and emotional expressions. To change voice and speaking style and/or emotion of the synthetic speech arbitrarily with maintaining its naturalness, it is required that prosodic features as well as spectral features are controlled properly. Since prosodic features are more or less related to spectral features, it is desirable to control these features simultaneously taking account of the relationship between spectrum and prosody. To resolve this problem, we have proposed several key ideas which include speaking style interpolation and adaptation for HMM-based speech synthesis. This paper focuses on these ideas and provides an overview of our approach. Moreover we show experimental results which show the effectiveness of the approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recent Development of HMM-Based Expressive Speech Synthesis and Its Applications

This paper describes the recent development of HMM-based expressive speech synthesis. Although the expressive speech includes a wide variety of expressions such as emotions, speaking styles, intention, attitude, emphasis, focus, and so on, we mainly refer to the speech synthesis techniques for emotions and speaking styles, which would be the most primary expressions in human speech communicatio...

متن کامل

Expressive speech synthesis in MARY TTS using audiobook data and emotionML

This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices using audiobook data labelled according to voice styles. Audiobook data is labelled/split according to voice styles by principal component analysis (PCA) of acoustic features extracted from segme...

متن کامل

A Corpus-based Approach to <ahem/> Expressive Speech Synthesis

Human speech communication can be thought of as comprising two channels – the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most-advanced text-to-speech (TTS) systems such as [1],[2],[3],[4] fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we descri...

متن کامل

Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis

The generation of expressive speech is a great challenge for text-to-speech synthesis in audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion from a sentence so that we can convey it through the synthetic voice. It consists of combining a standard emotion-lexicon based technique with the polar...

متن کامل

MARY TTS HMM - based voices for the Blizzard Challenge 2012

This paper describes the first participation of MARY TTS HMM-based voices in a Blizzard challenge. An architecture for synthesis of expressive speech based on the MARY TTS system and sentiment analysis of text is proposed. The creation of several HMM-based voices in different styles using audiobook data is described. Preliminary results on perception of different voice styles and the appropriat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003