Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis

Authors

Hongwu Yang

Helen M. Meng

Lianhong Cai

Abstract

This paper proposes a novel approach for describing the expressive elements in text genres and modeling their acoustic correlates for expressive text-to-speech synthesis (TTS). We apply the three-dimensional PAD (pleasure-displeasure, arousal-nonarousal and dominance-submissiveness) model in describing expressivity. In particular, we define a set of principles for annotating the P and A values of prosodic words found in texts from the tourist information domain. These text passages may be categorized into the descriptive genre (e.g. describing a beautiful scenic spot), the informative genre (e.g. presenting the opening hours of a museum) and the procedural genre (e.g. offering bus routes to a landmark). We choose the prosodic word as the basic unit for analysis since it bridges textual input with (synthetic) speech output. Analysis of contrastive (neutral versus expressive) recordings uncovers the acoustic correlates of annotated P and A values. This enables us to develop a non-linear model that can transform neutral speech to resemble expressive speech, according to the P and A values of the input text. Perceptual evaluation of the speech outputs shows that over 70% of the prosodic words carry appropriate expressivity.
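The idea of driving a neutral-to-expressive transformation from the P and A values of each prosodic word can be sketched as follows. This is only an illustration: the abstract does not give the form of the non-linear model, so the `tanh`/`exp` mapping, the `gain` parameter, the assumed [-1, 1] annotation range, the choice of acoustic features, and all names (`ProsodicWord`, `ProsodyScale`, `expressive_scale`) are hypothetical and are not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class ProsodicWord:
    text: str
    pleasure: float  # P value; assumed here to lie in [-1, 1]
    arousal: float   # A value; assumed here to lie in [-1, 1]


@dataclass
class ProsodyScale:
    f0_mean: float   # multiplier applied to the neutral mean pitch
    f0_range: float  # multiplier applied to the neutral pitch range
    duration: float  # multiplier applied to the neutral word duration


def expressive_scale(word: ProsodicWord, gain: float = 0.3) -> ProsodyScale:
    """Map (P, A) to multipliers; (0, 0) leaves neutral speech unchanged.

    The mapping is a hypothetical stand-in for the paper's non-linear model.
    """
    p = math.tanh(word.pleasure)  # squash so extreme annotations saturate
    a = math.tanh(word.arousal)
    return ProsodyScale(
        f0_mean=math.exp(gain * a),               # assumed: arousal raises pitch
        f0_range=math.exp(gain * (a + 0.5 * p)),  # assumed: both widen the range
        duration=math.exp(-gain * a),             # assumed: arousal shortens words
    )


if __name__ == "__main__":
    words = (ProsodicWord("museum", 0.0, 0.0), ProsodicWord("beautiful", 0.8, 0.6))
    for w in words:
        print(w.text, expressive_scale(w))
```

In this sketch a word annotated as neutral keeps all multipliers at 1.0, while a word with positive P and A values receives a higher and wider pitch contour and a shorter duration. A real system would instead fit the mapping to the contrastive neutral and expressive recordings.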

Similar resources

Modeling the Acoustic Correlates of Dialog Act for Expressive Chinese TTS Synthesis

This paper proposed a novel approach for describing the expressivity of dialog text and modelling their acoustic correlates for expressive text-to-speech (TTS) synthesis. We applied the Dialog Acts (DAs) in describing expressivity. In particular, we set up a Wizard-of-Oz (WoZ) data collection framework to collect the tourism domain corpus and annotated the DAs. A Pitch Target model which is opt...

Paralinguistic elements in speech synthesis

Corpus-based text-to-speech systems currently produce very natural synthetic sentences, though limited to a neutral, inexpressive speaking style. Paralinguistic elements are some of the expressive features one would most like to introduce. In this paper, we describe a new method for introducing laughter and hesitation in synthetic speech. Thanks to a small dedicated acoustic database, this metho...

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM

This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done thanks to a joint modeling of the acoustic and lexical parameters of the speaker models by adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the trainin...

Acoustic correlates for perceived effort levels in expressive speech

Actors and other vocal performers vary their speech across the continuum of vocal effort to express ideas, emphasize thoughts, communicate emotions, and create drama. They are experts at vocal expression. To analyze this range of expression across effort levels, we curated a corpus of professional actors’ Hamlet soliloquy performances and present an acoustic feature set and classification model...

Foregrounding in the Fadak Sermon of Hazrat Zahra (AS)

Foregrounding is a contemporary literary theory that approaches texts, in prose or verse, from a literary perspective and seeks to explain and analyze the features and elements of a discourse that rhetorically distinguish literary texts from ordinary ones. According to the Formalists, foregrounding is achieved by diminishing or increasing the rules. In other...

Publication date: 2006
