Intonation Conversion from Neutral to Expressive Speech
Abstract
Intonation is one of the most important factors in speech expressivity. This paper presents a conversion method for F0 contours. F0 segments are represented by discrete cosine transform (DCT) coefficients at the syllable level. Multi-level dynamic features are added to model the temporal correlation between syllables and to constrain the F0 contour at the phrase level. Gaussian mixture models (GMMs) are used to map the prosodic features between neutral and expressive speech, and the converted F0 contour is generated under the dynamic-feature constraints. Experimental evaluation on a database of acted emotional speech shows the effectiveness of the proposed F0 model and conversion method.
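The syllable-level DCT representation and the idea of dynamic features can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dct_coeffs`, `idct_contour`, `delta_features`), the orthonormal DCT-II scaling, and the use of a plain first-order difference as the dynamic feature are all assumptions made for illustration, and the GMM mapping step is omitted.

```python
import math


def dct_coeffs(f0, n_coeffs):
    """Orthonormal DCT-II of one syllable's F0 samples, truncated to n_coeffs terms.

    Assumption: the paper does not state its DCT scaling; orthonormal is used here.
    """
    n = len(f0)
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(f0))
        coeffs.append(scale * total)
    return coeffs


def idct_contour(coeffs, n):
    """Rebuild n F0 samples from (possibly truncated) DCT coefficients."""
    contour = []
    for i in range(n):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        contour.append(value)
    return contour


def delta_features(syllable_coeffs):
    """First-order differences between consecutive syllable vectors.

    Hypothetical stand-in for the paper's multi-level dynamic features;
    the first syllable gets a zero delta.
    """
    deltas = []
    for j, cur in enumerate(syllable_coeffs):
        prev = syllable_coeffs[j - 1] if j > 0 else cur
        deltas.append([c - p for c, p in zip(cur, prev)])
    return deltas


# Usage: two made-up syllables (F0 in Hz), three coefficients each.
syllables = [[120.0, 125.0, 133.0, 140.0, 138.0, 130.0],
             [128.0, 126.0, 121.0, 115.0, 110.0]]
static = [dct_coeffs(s, 3) for s in syllables]
dynamic = delta_features(static)
smoothed = idct_contour(static[0], len(syllables[0]))
```

Keeping only the first few coefficients yields a smoothed contour per syllable, while the delta vectors tie neighbouring syllables together. In a conversion setting, a mapping such as a GMM would operate on the concatenated static and dynamic vectors.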
Similar resources
Transformation of emotion based on acoustic features of intonation patterns for Hindi speech
Changes in intonation patterns may convey not only different meanings but also different emotions, even if the sequence of speech segments in a sentence is the same. The patterns change depending on the structure and emotion of the sentence and need to be stored in a speech database. It is a difficult and time-consuming task to store all utterances of all the expressive styles, which also consumes huge mem...
Clustering of foot-based pitch contours in expressive speech
Intonation generation is still one of the weak links in the text-to-speech synthesis chain. It is a hard enough task to generate expressively neutral pitch contours, with accurate placement of accents and phrase boundaries, but to generate appropriate intonation for expressive speech is even more of a challenge. This paper is a first attempt at describing and categorizing the variation in pitch ...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...
Data-driven emotion conversion in spoken English
This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods has been developed which utilise a small amount of expressive training data (15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system perfo...
A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
This paper presents a comparison of methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral database in German. The output is modified using voice conversion techniques to match the target expressive styles, the focus...