Rule-based Emotion Synthesis Using Concatenated Speech
Abstract
Concatenative speech synthesis is increasing in popularity, as it offers higher quality output than previous formant synthesisers. However, because it is based on recorded speech units, concatenative synthesis offers a lesser degree of parametric control during resynthesis. Consequently, adding pragmatic effects such as different speaking styles and emotions at the synthesis stage is fundamentally more difficult than with formant synthesis. This paper describes the results of a preliminary attempt to add emotion to concatenative synthetic speech (using BT's Laureate synthesiser), initially using techniques already applied successfully to formant synthesis. A new intonation contour (including both pitch and duration changes) was applied to the concatenated segments during production of the final audible utterance, and some of the available synthesis parameters were systematically modified to increase the affective content. The output digital speech samples were then subjected to further manipulation with a waveform editing package to produce the final output utterance. This process yielded only a small number of manually produced utterances, but these illustrated that affective manipulations are possible on this type of synthesiser. Further work has produced rule-based implementations which allow automatic production of emotional utterances. Development of these systems will be described, and some initial results from listener studies will be presented.
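As a rough illustration of what a rule-based pitch and duration modification of concatenated segments might look like, the following minimal Python sketch scales each segment's pitch level, pitch range and duration according to a per-emotion rule. The `Segment` and `EmotionRule` types, the `apply_emotion` function and all numeric values are hypothetical and chosen for illustration only; they are not taken from the paper or from the Laureate synthesiser.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """One concatenated unit with its prosodic targets (hypothetical type)."""
    label: str
    f0_hz: float        # target pitch of the unit
    duration_ms: float  # target duration of the unit


@dataclass(frozen=True)
class EmotionRule:
    """Hypothetical rule: global scaling of pitch level, pitch range and duration."""
    pitch_scale: float     # multiplies the overall pitch level
    range_scale: float     # widens (>1) or narrows (<1) excursions around the mean
    duration_scale: float  # >1 slows speech down, <1 speeds it up


# Illustrative values only; they are assumptions, not results from the paper.
RULES: Dict[str, EmotionRule] = {
    "neutral": EmotionRule(1.0, 1.0, 1.0),
    "sad": EmotionRule(0.9, 0.5, 1.2),
    "angry": EmotionRule(1.1, 1.5, 0.9),
}


def apply_emotion(segments: List[Segment], emotion: str) -> List[Segment]:
    """Return new segments with the rule for `emotion` applied."""
    if not segments:
        return []
    rule = RULES[emotion]
    mean_f0 = sum(s.f0_hz for s in segments) / len(segments)
    modified = []
    for s in segments:
        # Rescale the excursion around the utterance mean, then shift the level.
        excursion = s.f0_hz - mean_f0
        new_f0 = rule.pitch_scale * (mean_f0 + rule.range_scale * excursion)
        new_duration = s.duration_ms * rule.duration_scale
        modified.append(Segment(s.label, new_f0, new_duration))
    return modified


utterance = [Segment("a", 100.0, 100.0), Segment("b", 140.0, 200.0)]
sad_utterance = apply_emotion(utterance, "sad")
```

In this sketch a "sad" rule lowers and flattens the pitch contour while lengthening every unit; a real system would also need to resynthesise the waveform from the modified targets, which is outside the scope of the example.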
Similar resources
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Text-to-Speech Synthesis using Phoneme Concatenation
We proposed a Text-To-Speech (TTS) synthesis system based on phonetic concatenation for unrestricted input text. The input text is first converted into a phonetic transcription using Letter-to-Sound rules. To synthesise new speech, the TTS system selects the recorded phoneme units (PUs) from a database and modifies the duration according to the rule based on spelling using Time Domain Pitch Synchron...
Development of Concatenative Syllable based Text to Speech Synthesis System for Tamil
This paper addresses the problem of improving the intelligibility of synthesized speech in a Tamil TTS synthesis system. Speech synthesis is the artificial generation of human speech, and a text-to-speech (TTS) system automatically converts normal language text into speech. This paper deals with a corpus-driven Tamil TTS system based on the concatenative synthesis approach. Conc...
Punjabi Speech Synthesis System Using HTK
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This Hidden Markov Model-based TTS can be used in mobile phones for stored phone directory or messages. Text m...
Integration of Rule-based Formant Synthesis and Waveform Concatenation: a Hybrid Approach to Text-to-speech Synthesis
This paper describes an approach to speech synthesis in which waveform fragments dynamically produced with a set of formant-based synthesis rules are concatenated with pre-stored natural speech waveform fragments to produce a synthetic utterance. While this hybrid approach was originally implemented as a tool for research into improved voice quality in formant-based synthesis, it has produced s...