Adjacency Analysis for Designing Unit Selection Speech Model on Micro-Prosodic Level
Authors
Abstract
One problem with speech synthesizers is that the speech they produce can sound unnatural and unintelligible. Wave modification is believed to contribute to the distortion of synthesized speech. The objective of a unit selection speech corpus is to provide a few possible instances of each unit, so that synthesized speech comes as close as possible to human speech production with little or no need for wave modification. Unlike a standard prerecorded speech database, unit selection allows multiple instances of the same unit to be present in a corpus. These instances are differentiated by a unique combination of acoustic and linguistic values, which together form the set of speech corpus parameters. To minimize distortion in a concatenative speech synthesizer, the parameters of the selected segments need to reflect the quality of natural speech. The selection process depends on the priority level of each parameter in the corpus. These parameters shape the design of the unit selection model. One of the proposed parameters is the adjacent phoneme of the target unit. This paper describes an analysis of micro-prosodic behaviour, particularly in pitch. It also examines how adjacency influences changes in micro-prosodic features and how the physics of speech contributes to those changes.
Similar resources
Decision tree micro-prosody structures for text to speech synthesis
This paper explores the use of micro-prosody in improving the quality of synthesised speech in concatenated text to speech synthesis (TTS) systems. Micro-prosody is defined as prosodic signals within context-dependent triphone units and across neighbouring triphones. Micro-prosody parameters are modelled using a Markovian model whose state distributions depend on the current linguistic-prosodi...
Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases
This paper describes an approach to speech synthesis based on using speech databases at different stages of TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch synchronous segmentation and labeling of databases allows storing both segmental and prosodic information. Phonetic-prosodic annotations of speech databases are involved in off-line training ...
Annotation of Chinese Prosodic Level Based on Probabilistic Model
In this paper, a probability based method is proposed for the annotation of Chinese prosodic levels. We investigated the acoustic correlates (F0 pitches, durations and pauses) of the prosodic boundaries and tried to annotate the levels of the boundaries using Gaussian model and Bayesian decision. The method allows efficient and automatic labeling for the large scale speech corpus and can be use...
Joint prosodic and segmental unit selection for expressive speech synthesis
One problem in concatenative speech synthesis is how to incorporate prosodic factors in the unit selection. Imposing a predicted prosodic contour as target specification is error-prone and does not benefit from the natural variability contained in the database. This paper introduces a method that searches for the optimal unit sequence by maximizing a joint likelihood at both segmental and prosod...
Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech
We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests to evaluate speech quality, emotion identification rates and emotional strength were used for the six emotions which we recorded – happiness, sadness, anger, surprise, fear, disgust. For the HMM-based meth...