Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis
Abstract
This paper proposes a novel approach for describing the expressive elements in text genres and modeling their acoustic correlates for expressive text-to-speech synthesis (TTS). We apply the three-dimensional PAD (pleasure-displeasure, arousal-nonarousal and dominance-submissiveness) model in describing expressivity. In particular, we define a set of principles for annotating the P and A values of prosodic words found in texts from the tourist information domain. These text passages may be categorized into the descriptive genre (e.g. describing a beautiful scenic spot), the informative genre (e.g. presenting the opening hours of a museum) and the procedural genre (e.g. offering bus routes to a landmark). We choose the prosodic word as the basic unit for analysis since it bridges textual input with (synthetic) speech output. Analysis of contrastive (neutral versus expressive) recordings uncovers the acoustic correlates of annotated P and A values. This enables us to develop a non-linear model that can transform neutral speech to resemble expressive speech, according to the P and A values of the input text. Perceptual evaluation of the speech outputs shows that over 70% of the prosodic words carry appropriate expressivity.
Related resources
Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis
This paper proposed a novel approach for describing the expressivity of dialog text and modelling their acoustic correlates for expressive text-to-speech (TTS) synthesis. We applied the Dialog Acts (DAs) in describing expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect the tourism domain corpus and annotated the DAs. A Pitch Target model which is opt...
Paralinguistic elements in speech synthesis
Corpus based text-to-speech systems currently produce very natural synthetic sentences, though limited to a neutral inexpressive speaking style. Paralinguistic elements are some of the expressive features one would most like to introduce. In this paper, we describe a new method for introducing laughter and hesitation in synthetic speech. Thanks to a small dedicated acoustic database, this metho...
Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM
This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done thanks to a joint modeling of the acoustic and lexical parameters of the speaker models by adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the trainin...
Acoustic correlates for perceived effort levels in expressive speech
Actors and other vocal performers vary their speech across the continuum of vocal effort to express ideas, emphasize thoughts, communicate emotions, and create drama. They are experts at vocal expression. To analyze this range of expression across effort levels, we curated a corpus of professional actors’ Hamlet soliloquy performances and present an acoustic feature set and classification model...
Foregrounding in the Fadakiyyah Sermon of Hazrat Zahra (AS)
Foregrounding is one of the contemporary literary theories, which from a literary perspective to texts, in prose or verse, endeavors to explain and analyze those effective features and elements in the body of the discourse which rhetorically distinguish literary texts from ordinary ones. According to the Formalists, foregrounding is achieved through diminishing or increasing the rules. In other...