Articulatory synthesis from x-rays and inversion for an adaptive speech robot
Abstract
This paper describes a speech-robotics approach to articulatory synthesis. An anthropomorphic speech robot has been built, based on data from a real reference subject. This speech robot, called the Articulotron, has a set of relevant degrees of freedom for the speech articulators: jaw, tongue, lips, and larynx. The associated articulatory model has been elaborated from cineradiographic midsagittal profiles recorded in synchrony with front views of the lips; the model of the noise source for fricative excitation has been derived from acoustic and aerodynamic measurements on the same reference subject. In a first phase, the Articulotron has been used to perform the copy synthesis of the vowels and the fricative and plosive consonants in the X-ray corpus. This makes it possible to assess the performance of the Articulotron in producing fairly high-quality speech, and provides a reference against which other attempts at articulatory synthesis can be compared. In a second phase, the Articulotron has been used to recover articulatory gestures from audio-visual speech prototypes. At the present stage, a gradient descent algorithm is used to learn the articulatory trajectories of the robot by optimisation, starting from the formant trajectories and the knowledge of constraints for the consonantal constriction or closure, in order to mimic the original VCV audio-visual sequences. The adaptive skill of the robot is demonstrated through articulator perturbation experiments and through the elaboration of relevant strategies in the hyper/hypo speech paradigm. A video tape will demonstrate an animation of the Articulotron, displaying the jaw, the tongue and the lips, for various examples of adaptive articulatory synthesis.
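The inversion step described in the abstract (gradient descent on articulatory trajectories, driven by formant trajectories) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-parameter `forward` map from (jaw, tongue) to (F1, F2), the smoothness weight `lam`, the step size, and the synthetic targets are all hypothetical, and the sketch omits the constriction and closure constraints the abstract mentions. It is not the Articulotron's acoustic model or the authors' optimiser.

```python
# Toy sketch: recover articulatory trajectories from formant targets by
# gradient descent. The forward model is a made-up linear map (assumption),
# NOT the Articulotron's articulatory-acoustic model.

def forward(a):
    """Hypothetical map from (jaw, tongue) to (F1, F2) in kHz."""
    jaw, tongue = a
    return (0.5 + 0.2 * jaw - 0.05 * tongue,
            1.5 - 0.3 * jaw + 0.4 * tongue)

def cost(traj, targets, lam=0.001):
    """Squared formant error plus a penalty on frame-to-frame movement."""
    err = 0.0
    for a, tgt in zip(traj, targets):
        f = forward(a)
        err += (f[0] - tgt[0]) ** 2 + (f[1] - tgt[1]) ** 2
    smooth = sum((b[k] - a[k]) ** 2
                 for a, b in zip(traj, traj[1:]) for k in range(2))
    return err + lam * smooth

def numeric_grad(traj, targets, eps=1e-5):
    """Central-difference gradient, so any forward model can be swapped in."""
    grad = [[0.0, 0.0] for _ in traj]
    for t in range(len(traj)):
        for k in range(2):
            old = traj[t][k]
            traj[t][k] = old + eps
            up = cost(traj, targets)
            traj[t][k] = old - eps
            down = cost(traj, targets)
            traj[t][k] = old  # restore the parameter before moving on
            grad[t][k] = (up - down) / (2 * eps)
    return grad

def invert(targets, steps=1000, lr=1.0):
    """Plain gradient descent from a neutral (all-zero) articulatory trajectory."""
    traj = [[0.0, 0.0] for _ in targets]
    for _ in range(steps):
        g = numeric_grad(traj, targets)
        for t in range(len(traj)):
            for k in range(2):
                traj[t][k] -= lr * g[t][k]
    return traj

# Synthetic formant targets generated from a known smooth gesture (assumption).
true_traj = [[i / 7, i / 7] for i in range(8)]
targets = [forward(a) for a in true_traj]

initial_cost = cost([[0.0, 0.0] for _ in targets], targets)
recovered = invert(targets)
final_cost = cost(recovered, targets)
max_formant_error = max(abs(f - t)
                        for a, tgt in zip(recovered, targets)
                        for f, t in zip(forward(a), tgt))
```

In this toy setting the smoothness term plays the role of a prior that keeps the recovered gesture physically plausible; a real system would replace `forward` with an articulatory synthesizer and add the consonantal constraints to the cost.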
Similar resources
Acoustic to articulatory inversion
The context of this work is speech analysis. The subject deals with acoustic-to-articulatory inversion, i.e. the recovery of the temporal evolution of the vocal tract shape from the signal. This topic is important because it is likely to give rise to applications in the domains of speech coding as well as second language learning. Acoustic-to-articulatory inversion relies on an analysis by synt...
Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable ...
A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model.
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost functi...
Acoustic-to-articulatory inversion by analysis-by-synthesis using cepstral coefficients
This paper deals with acoustic-to-articulatory inversion of speech using an analysis-by-synthesis approach. We used old X-ray films of one speaker to (i) develop a linear articulatory model presenting a small geometric mismatch with the subject’s vocal tract midsagittal images and (ii) design an adaptation procedure for the cepstral vectors used as input data. The adaptation exploits the bil...
Acoustic-to-articulatory Inversion Using Dynamical and Phonological Constraints
A well-known difficulty in using the articulatory representation for applications in the areas of speech coding, synthesis and recognition is the poor accuracy in the estimation of the articulatory parameters from the acoustic signal of speech. The difficulty is especially serious for most classes of consonantal sounds. This paper presents a statistical method of estimating the articulatory tra...