Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification
Author
Abstract
In general, speech synthesis using the source-filter model of speech production requires classifying speech into two classes (voiced and unvoiced), a step that is prone to errors. For voiced speech, the input of the synthesis filter is an approximately periodic excitation, whereas for unvoiced speech it is a noise signal. This paper proposes an excitation model which can be used to synthesise both voiced and unvoiced speech, thus overcoming the degradation in speech quality caused by those classification errors. The model represents two contiguous segments of the residual signal pitch-synchronously. The first segment is represented by the original residual in a fraction of the period around the pitch-mark (obtained using an epoch detector), in order to capture the most important aspects of the residual during voiced speech. The remaining part of the period is modelled by a set of parameters describing the amplitude envelope of the residual waveform and its energy. The technique for synthesising the excitation combines these shaping parameters with a novel method for regenerating the residual waveform and a method for mixing a periodic signal with noise based on the Harmonic plus Noise model. Besides producing high-quality speech, this technique is computationally fast.
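The mixing step the abstract mentions can be illustrated with a minimal sketch in the spirit of the Harmonic plus Noise model: harmonics of the fundamental occupy the band below a maximum voiced frequency, while spectrally shaped noise fills the band above it. The function name, parameter names, and default values below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hnm_style_excitation(f0, fs, n_samples, max_voiced_freq=4000.0, noise_gain=0.3):
    """Mix a harmonic signal (below max_voiced_freq) with high-pass
    noise, loosely following the Harmonic plus Noise model idea.
    All names and defaults here are illustrative, not from the paper."""
    t = np.arange(n_samples)
    # Periodic part: sum of harmonics of f0 up to the maximum voiced frequency
    n_harm = int(max_voiced_freq // f0)
    periodic = np.zeros(n_samples)
    for k in range(1, n_harm + 1):
        periodic += np.cos(2.0 * np.pi * k * f0 * t / fs)
    periodic /= max(n_harm, 1)
    # Noise part: white noise restricted to frequencies above max_voiced_freq
    noise = np.random.randn(n_samples)
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spec[freqs < max_voiced_freq] = 0.0   # zero out the "voiced" band
    noise_hp = np.fft.irfft(spec, n=n_samples)
    return periodic + noise_gain * noise_hp
```

For fully unvoiced frames, the same routine degenerates gracefully: setting the maximum voiced frequency to zero leaves only the noise component, which is what makes a single excitation model usable for both speech classes.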
Similar papers
Low Resource TTS Synthesis Based on Cepstral Filter with Phase Randomized Excitation
In this paper we present the acoustic synthesis of a low-resource Text-To-Speech (TTS) system based on a 7th-order cepstral filter. The excitation signal is designed in the frequency domain by a two-parameter model. This model is able to generate the excitation signal for both voiced and unvoiced segments. The sets of filter coefficients represent the speech units and are stored in a compressed fo...
A Variable Rate Speech Codec Using VUS Classification
Voiced speech is highly correlated and must be reconstructed accurately in order to sound correct. Unvoiced speech, on the other hand, is noise-like in nature. It can be approximated by white noise coloured by the vocal tract filter. Because of this lack of structure in unvoiced speech sounds, the excitation signal does not have to reproduce the speech signal as accurately as for voiced sounds. T...
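The "white noise coloured by the vocal tract filter" approximation described above can be sketched directly: a noise excitation is pushed through an all-pole synthesis filter 1/A(z). The function name, the example LPC coefficients, and the gain below are placeholders; in practice the coefficients come from LP analysis of the frame.

```python
import numpy as np

def synth_unvoiced(lpc_coeffs, gain, n_samples, seed=0):
    """Approximate an unvoiced segment as white noise passed through
    an all-pole vocal-tract filter 1/A(z). Coefficients and gain are
    illustrative placeholders, normally obtained by LP analysis."""
    rng = np.random.default_rng(seed)
    excitation = rng.standard_normal(n_samples)
    a = np.asarray(lpc_coeffs, dtype=float)  # a[0] is assumed to be 1.0
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = gain * excitation[n]
        # All-pole recursion: y[n] = gain*e[n] - sum_k a[k] * y[n-k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

Because the noise source has no structure to preserve, only the filter and gain need to match the analysed frame, which is exactly why unvoiced excitation can be coded far more cheaply than voiced excitation.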
Least relative entropy for voiced/unvoiced speech classification
The aim of this work is to develop a flexible and efficient approach to the classification of the ratio of voiced to unvoiced excitation sources in continuous speech. To achieve this aim we adopt a probabilistic neural network approach. This is accomplished by designing a multilayer perceptron classifier trained by steepest-descent minimization of the Least Relative Entropy (LRE) cost function. By us...
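The cost described in this snippet can be illustrated with a minimal stand-in: a relative-entropy (Kullback-Leibler) cost between Bernoulli voicing targets and a logistic output, minimized by steepest descent. The single-layer unit, function names, and learning rate below are simplifying assumptions; the paper itself trains a multilayer perceptron.

```python
import numpy as np

def relative_entropy(p, q, eps=1e-9):
    """D(p||q) for Bernoulli targets p and outputs q in (0, 1).
    A minimal stand-in for the paper's Least Relative Entropy cost."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return np.mean(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))

def sgd_step(w, x, p, lr=0.1):
    """One steepest-descent step for a single logistic unit
    (illustrative; the paper uses a multilayer perceptron)."""
    q = 1.0 / (1.0 + np.exp(-(x @ w)))   # logistic output
    grad = x.T @ (q - p) / len(p)        # gradient of D(p||q) w.r.t. w
    return w - lr * grad
```

For a logistic output the gradient of D(p||q) with respect to the pre-activation reduces to q - p, the same simple form as cross-entropy, since the entropy of the fixed targets is constant.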
Segregation of unvoiced speech from nonspeech interference.
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating...
Improved training of excitation for HMM-based parametric speech synthesis
This paper presents an improved method of training the unvoiced filter that forms part of an excitation model, within the framework of parametric speech synthesis based on hidden Markov models. The conventional approach calculates the unvoiced filter response from the differential signal between the residual and the voiced excitation estimate. The differential signal, however, includes the error generat...