Expressive speech synthesis in MARY TTS using audiobook data and emotionML
نویسندگان
چکیده
This paper describes a framework for synthesis of expressive speech based on MARY TTS and Emotion Markup Language (EmotionML). We describe the creation of expressive unit selection and HMM-based voices using audiobook data labelled according to voice styles. Audiobook data is labelled/split according to voice styles by principal component analysis (PCA) of acoustic features extracted from segmented sentences. We introduce the implementation of EmotionML in MARY TTS and explain how it is used to represent and control expressivity in terms of discrete emotions or emotion dimensions. Preliminary results on perception of different voice styles are presented.
منابع مشابه
MARY TTS HMM - based voices for the Blizzard Challenge 2012
This paper describes the first participation of MARY TTS HMM-based voices in a Blizzard challenge. An architecture for synthesis of expressive speech based on the MARY TTS system and sentiment analysis of text is proposed. The creation of several HMM-based voices in different styles using audiobook data is described. Preliminary results on perception of different voice styles and the appropriat...
متن کاملMARY TTS unit selection and HMM-based voices
This paper describes the implementation of a unit selection English voice and a HMM-based Hindi voice for our participation in the Blizzard Challenge 2013. The two voices have been created using the MARY TTS voice building framework. We describe how audiobook data is used to create the English voice and how a quality controlmeasure (statisticalmodel cost) is used to control the selection of uni...
متن کاملThe NITech text-to-speech system for the Blizzard Challenge 2017
This paper describes a text-to-speech (TTS) system developed at the Nagoya Institute of Technology (NITech) for the Blizzard Challenge 2017. In the challenge, about seven hours of highly expressive speech data from English children’s audiobooks were provided as training data. For this challenge, we redesigned linguistic features for statistical parametric speech synthesis based on audiobooks. F...
متن کاملExploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training
Audiobook data is a freely available source of rich expressive speech data. To accurately generate speech of this form, expressiveness must be incorporated into the synthesis system. This paper investigates two parts of this process: the representation of expressive information in a statistical parametric speech synthesis system; and whether discrete expressive state labels can sufficiently rep...
متن کاملUsing Audio Books for Training a Text-to-Speech System
Creating new voices for a TTS system often requires a costly procedure of designing and recording an audio corpus, a time consuming and effort intensive task. Using publicly available audiobooks as the raw material of a spoken corpus for such systems creates new perspectives regarding the possibility of creating new synthetic voices quickly and with limited effort. This paper addresses the issu...
متن کامل