A Corpus-based Approach to <ahem/> Expressive Speech Synthesis
Authors
Abstract
Human speech communication can be thought of as comprising two channels – the words themselves, and the style in which they are spoken. Each of these channels carries information. Today's most advanced text-to-speech (TTS) systems, such as [1],[2],[3],[4], fall far short of human speech because they offer only a single, fixed style of delivery, independent of the message. In this paper, we describe the IBM Expressive TTS Engine, which is able to add another channel by offering five speaking styles. These are: neutral declarative, conveying good news, conveying bad news, asking a question, and showing contrastive emphasis. In addition to generating speech in these five styles, our TTS system is also able to generate paralinguistic events such as sighs, breaths, and filled pauses which further enrich the style channel. We describe our methods for generating and evaluating expressive synthetic speech and paralinguistic effects. We show significant perceptual differences between expressive and neutral synthetic speech for each of our speaking styles. In addition, we describe how users have been empowered to easily communicate the desired expression to the TTS engine through our extensions [5] of the Speech Synthesis Markup Language (SSML) [6].
Similar resources
Expressive Speech Synthesis for Czech Limited Domain Dialogue System – Basic Experiments
This paper describes a development of limited domain expressive speech synthesis for the Czech language. Our current speech synthesis system is based on unit selection methods and produces high quality speech in a neutral speaking style. This work focuses on modifications made in the synthesis algorithm to integrate expressivity into generated speech. There is also introduced a listening test, ...
Listening-Test-Based Annotation of Communicative Functions for Expressive Speech Synthesis
This paper is focused on the evaluation of listening test that was realized with a view to objectively annotate expressive speech recordings and further develop a limited domain expressive speech synthesis system. There are two main issues to face in this task. The first matter in issue to be taken into consideration is the fact that expressivity in speech has to be defined in some way. The sec...
Towards synthesising expressive speech; designing and collecting expressive speech data
Corpus-based speech synthesis needs representative corpora of human speech if it is to meet the needs of everyday spoken interaction. This paper describes methods for recording such corpora, and details some difficulties (with their solutions) found in the use of spontaneous speech data for synthesis.
Automatic exploration of corpus-specific properties for expressive text-to-speech: a case study in emphasis
In this paper we explore an approach to expressive text-to-speech synthesis in which pre-existing expression-specific corpora are complemented with automatically generated labels to augment the search space of units the engine can exploit to increase its expressiveness. We motivate this data-discovery approach as an alternative to an approach guided by data collection, in order to harness the fu...
Formal expressive indiscernibility underlying a prosodic deformation model
We are here concerned by the setting up of a model and a formalism for expressive speech synthesis under the paradigm of a corpus-based approach. Our objective is to apply prosodic expressive forms, acquired from natural human-reading recordings, on a new textual matter. We outline a general model for speech expressiveness. Then we deal with some formal aspects of expressive representation. We ...