Emphasis recreation for TTS using intonation atoms
Abstract
We are interested in emphasis for text-to-speech synthesis. In speech-to-speech translation, emphasising the correct words is important for conveying the underlying meaning of a message. In this paper, we propose to use a generalised command-response (CR) model of intonation to generate emphasis in synthetic speech. We first analyse the differences in the model parameters between emphasised words in an acted emphasis scenario and their neutral counterparts. We then investigate word-level intonation modelling, using a simple random forest as a basic framework to predict the model parameters in the specific case of emphasised words. Based on the linguistic context of the words we want to emphasise, we attempt to recover emphasis patterns in the intonation of originally neutral synthetic speech by generating word-level model parameters from similar contexts. The method is presented and initial results on synthetic speech are given.
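The word-level prediction step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestRegressor` as the random forest. The context features (syllable count, relative position, content-word flag), the two target parameters (an atom amplitude and its position), and all data values are hypothetical and synthetic. The abstract does not specify the actual features, parameters, or training setup, so this is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical word-level context features (invented for illustration):
# [number of syllables, relative position in utterance, is content word]
n_words = 200
X = np.column_stack([
    rng.integers(1, 5, n_words),
    rng.random(n_words),
    rng.integers(0, 2, n_words),
])

# Hypothetical targets: two intonation-model parameters per emphasised word,
# here called "amplitude" and "position". The values are synthetic, not measured.
amplitude = 0.2 * X[:, 0] + 0.3 * X[:, 2] + 0.05 * rng.standard_normal(n_words)
position = 0.5 * X[:, 1] + 0.05 * rng.standard_normal(n_words)
Y = np.column_stack([amplitude, position])

# One multi-output forest predicts both parameters from the word context.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, Y)


def predict_emphasis_params(context):
    """Return predicted (amplitude, position) for one word's context features."""
    features = np.asarray(context, dtype=float).reshape(1, -1)
    return forest.predict(features)[0]


# Predict parameters for a two-syllable content word at 40% of the utterance.
params = predict_emphasis_params([2, 0.4, 1])
print(params)
```

In a setup like this, the predicted parameters for a word marked for emphasis would replace or adjust the parameters of the neutral contour for that word. How the paper performs that substitution is not stated in the abstract.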
Related papers
Modeling of intonation bearing emphasis for TTS-synthesis of Greek dialogues
TTS-synthesis of neutral style Greek with good intelligibility and quality has been achieved some time ago. As a further step towards expanding the applications domain of the TTS-system developed in our laboratory, the incorporation of emphasis into speech used in man-machine dialogues according to their context has been studied recently. In this paper the method applied for the analysis of int...
High quality speech synthesis using a small speech dataset
We propose an approach to synthesizing high-quality speech under the conditions of a small dataset. A robust method for solving this problem is vital for voice restoration (recreation of lost fragments of records based on available speech material of a well-known person, e.g. an actor). The proposed TTS system is a hybrid system which includes the advantages of both HMM- and Unit Selection-based ...
Intonation Atom Based Emphasis Transfer
Speech to speech translation can benefit from translation of emphasis. We propose to use an intonation model to retrieve and transfer events associated with emphasis in the intonation. This model decomposes the F0 contour into basic intonation atoms using the matching pursuit algorithm. We investigate the role of these components in the perception of emphasis. Some of the most prominent local c...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...