HMM-Based Expressive Speech Synthesis: Towards TTS with Arbitrary Speaking Styles and Emotions
Abstract
This paper describes recent progress in our approach to generating expressive speech. A goal of text-to-speech (TTS) synthesis is to generate natural-sounding speech with an arbitrary speaker's voice characteristics, speaking styles, and emotional expressions. To change the voice, speaking style, and/or emotion of synthetic speech arbitrarily while maintaining its naturalness, both prosodic and spectral features must be controlled properly. Because prosodic features are correlated to some degree with spectral features, it is desirable to control both simultaneously, taking into account the relationship between spectrum and prosody. To address this problem, we have proposed several key ideas, including speaking style interpolation and adaptation for HMM-based speech synthesis. This paper focuses on these ideas and provides an overview of our approach. We also present experimental results that demonstrate the effectiveness of the approach.
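As an illustration of what speaking style interpolation can mean in practice, here is a minimal sketch, assuming each style (for example, a neutral and a joyful style) is represented by its own HMM set whose states have single diagonal-covariance Gaussian output distributions, with identical state layouts across styles. The class `DiagGaussian` and the function `interpolate_styles` are hypothetical names, and simple linear interpolation of means and variances is one common choice, not necessarily the exact rule used in this paper. If a state's feature vector stacks spectral and prosodic parameters, a single set of weights shifts both together.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiagGaussian:
    """Hypothetical output distribution of one HMM state: per-dimension mean and variance."""
    mean: list[float]
    var: list[float]


def interpolate_styles(models: Sequence[DiagGaussian], weights: Sequence[float]) -> DiagGaussian:
    """Linearly interpolate corresponding state distributions from several style models.

    Assumptions (illustrative only): weights are non-negative and sum to 1, and every
    model describes the same state with the same feature dimension.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(models[0].mean)
    if any(len(m.mean) != dim or len(m.var) != dim for m in models):
        raise ValueError("all models must share the same feature dimension")

    # Weighted sum of means, dimension by dimension.
    mean = [sum(w * m.mean[d] for w, m in zip(weights, models)) for d in range(dim)]
    # Weighted sum of diagonal variances (one simple choice among several possible rules).
    var = [sum(w * m.var[d] for w, m in zip(weights, models)) for d in range(dim)]
    return DiagGaussian(mean, var)
```

For example, mixing a neutral and a joyful state with weights 0.75 and 0.25 yields a distribution lying between the two styles, closer to neutral. A complete system would also interpolate state-duration distributions and would model F0 with multi-space probability distributions; this sketch omits both.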
Similar Resources
Recent Development of HMM-Based Expressive Speech Synthesis and Its Applications
This paper describes the recent development of HMM-based expressive speech synthesis. Although the expressive speech includes a wide variety of expressions such as emotions, speaking styles, intention, attitude, emphasis, focus, and so on, we mainly refer to the speech synthesis techniques for emotions and speaking styles, which would be the most primary expressions in human speech communicatio...
Expressive speech synthesis in MARY TTS using audiobook data and EmotionML
This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices using audiobook data labelled according to voice styles. Audiobook data is labelled/split according to voice styles by principal component analysis (PCA) of acoustic features extracted from segme...
A Corpus-based Approach to <ahem/> Expressive Speech Synthesis
Human speech communication can be thought of as comprising two channels – the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most-advanced text-to-speech (TTS) systems such as [1],[2],[3],[4] fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we descri...
Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis
The generation of expressive speech is a great challenge for text-to-speech synthesis in audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion from a sentence so that we can convey it through the synthetic voice. It consists of combining a standard emotion-lexicon based technique with the polar...
MARY TTS HMM-based voices for the Blizzard Challenge 2012
This paper describes the first participation of MARY TTS HMM-based voices in a Blizzard challenge. An architecture for synthesis of expressive speech based on the MARY TTS system and sentiment analysis of text is proposed. The creation of several HMM-based voices in different styles using audiobook data is described. Preliminary results on perception of different voice styles and the appropriat...