Joint prosodic and segmental unit selection for expressive speech synthesis
Abstract
One problem in concatenative speech synthesis is how to incorporate prosodic factors into unit selection. Imposing a predicted prosodic contour as the target specification is error-prone and does not benefit from the natural variability contained in the database. This paper introduces a method that searches for the optimal unit sequence by maximizing a joint likelihood at both the segmental and the prosodic level. At the segmental level, the concatenation cost and target cost are reformulated in terms of conditional and a priori probabilities, which are combined with probabilistic models of fundamental frequency and duration at the syllable level and the phrase level. A generalized version of the Viterbi algorithm is used to take into account the long-term dependencies that the prosodic models introduce during the search for the optimal unit sequence. The method has been implemented in a unit selection synthesizer using an expressive speech database, and a subjective evaluation shows an improvement in prosodic quality, although the overall quality is only slightly enhanced.
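The probabilistic reformulation of the segmental costs can be illustrated with a minimal sketch. It assumes a plain first-order Viterbi search over log-probabilities, not the generalized version the abstract refers to, and it leaves out the syllable-level and phrase-level prosodic models entirely. The names `select_units`, `log_target`, and `log_concat` are hypothetical and do not come from the paper.

```python
def select_units(candidates, log_target, log_concat):
    """Pick the unit sequence with the highest segmental log-likelihood.

    candidates: one non-empty list of unit ids per target position.
    log_target(t, u): assumed log-probability of unit u given the target at t.
    log_concat(prev, u): assumed log-probability of u following prev.
    Returns (best_path, best_log_likelihood).
    """
    # best[u] holds (score, path) for the best partial sequence ending in u.
    best = {u: (log_target(0, u), [u]) for u in candidates[0]}
    for t in range(1, len(candidates)):
        new_best = {}
        for u in candidates[t]:
            # Choose the predecessor that maximizes score plus join term.
            score, path = max(
                ((s + log_concat(p, u), pth) for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[u] = (score + log_target(t, u), path + [u])
        best = new_best
    score, path = max(best.values(), key=lambda item: item[0])
    return path, score
```

Working in the log domain turns the product of target and concatenation probabilities into a sum, so the search has the same shape as a conventional cost-based unit selection with the sign flipped. Prosodic terms that span whole syllables or phrases cannot be scored from a single predecessor, which is presumably why the abstract calls for a generalized Viterbi search rather than this first-order one.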
Similar resources
Towards intonation control in unit selection speech synthesis
We propose to control intonation in unit selection speech synthesis with a mixed CART-HMM intonation model. The Finite State Machine (FSM) formulation is suited to incorporate the intonation model in the unit selection framework because it allows for combination of models with different unit types and handling competing intonative variants. Subjective experiments have been carried out to compar...
Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases
This paper describes an approach to speech synthesis based on using speech databases at different stages of the TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of databases allow storing both segmental and prosodic information. Phonetic-prosodic annotations of speech databases are involved in off-line training ...
Joint prosodic and segmental unit selection speech synthesis
We describe a unit selection technique for text-to-speech synthesis which jointly searches the space of possible diphone sequences and the space of possible prosodic unit sequences in order to produce synthetic speech with more natural prosody. We demonstrate that this search, although currently computationally expensive, can achieve improved intonation compared to a baseline in which only the...
F0 contour and segmental duration modeling using prosodic features
This paper proposes a framework of F0 contour generation and segmental duration modeling for application in a unit-selection speech synthesis system for Polish – BOSS. We describe the design of the F0 and duration modeling modules and emphasize the role of prosodic features (related to stress, pitch accent and phrase) in these two tasks.
Including pitch accent optionality in unit selection text-to-speech synthesis
A significant variability in pitch accent placement is found when comparing the patterns of prosodic prominence realized by different English speakers reading the same sentences. In this paper we describe a simple approach to incorporate this variability to synthesize prosodic prominence in unit selection text-to-speech synthesis. The main motivation of our approach is that by taking into accou...