Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Abstract
Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable parameter HMMs (GVP-HMMs) is proposed. It leverages the strengths of both generative and neural network based modelling frameworks. On a Mandarin speech inversion task, a tandem GVP-HMM system using DNN bottleneck features as auxiliary inputs significantly outperformed the baseline HMM, multiple regression HMM (MR-HMM), DNN and deep mixture density network (MDN) systems by 0.20 mm, 0.16 mm, 0.12 mm and 0.10 mm respectively in terms of electromagnetic articulography (EMA) root mean square error (RMSE).
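The RMSE figures quoted above compare predicted articulator positions against measured EMA positions. As a rough illustration only, the sketch below computes a pooled RMSE over all frames and channels. The function name `ema_rmse`, the toy data, and the pooling convention are assumptions made for this example; the abstract does not state how the authors average over channels or utterances.

```python
import math


def ema_rmse(predicted, measured):
    """Pooled RMSE between two trajectories.

    Each argument is a list of frames; each frame is a list of EMA
    channel values (assumed here to be in mm). The result is in the
    same unit as the inputs.
    """
    if len(predicted) != len(measured):
        raise ValueError("trajectories must have the same number of frames")
    squared_errors = []
    for pred_frame, meas_frame in zip(predicted, measured):
        if len(pred_frame) != len(meas_frame):
            raise ValueError("frames must have the same number of channels")
        squared_errors.extend((p - m) ** 2 for p, m in zip(pred_frame, meas_frame))
    if not squared_errors:
        raise ValueError("trajectories must not be empty")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: 3 frames x 2 channels, every prediction off by 0.5 mm.
measured = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
predicted = [[value + 0.5 for value in frame] for frame in measured]
toy_rmse = ema_rmse(predicted, measured)
```

With a constant offset of 0.5 mm on every channel, the pooled RMSE equals that offset, which makes the toy case easy to check by hand.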
Similar resources
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information
In recent years, neural network based acoustic-to-articulatory inversion approaches have achieved the state-of-the-art performance. One major issue associated with these approaches is the lack of phone sequence information during inversion. In order to address this issue, this paper proposes an improved architecture hierarchically concatenating phone classification and articulatory inversion co...
Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models
In order to recover the movements of usually hidden articulators such as tongue or velum, we have developed a data-based speech inversion method. HMMs are trained, in a multistream framework, from two synchronous streams: articulatory movements measured by EMA, and MFCC + energy from the speech signal. A speech recognition procedure based on the acoustic part of the HMMs delivers the chain of p...
Toward a Multi-Speaker Visual Articulatory Feedback System
In this paper, we present recent developments on the HMM-based acoustic-to-articulatory inversion approach that we develop for a “visual articulatory feedback” system. In this approach, multi-stream phoneme HMMs are trained jointly on synchronous streams of acoustic and articulatory data, acquired by electromagnetic articulography (EMA). Acoustic-to-articulatory inversion is achieved in two steps....
Acoustic-to-articulatory inversion in speech based on statistical models
Two speech inversion methods are implemented and compared. In the first, multistream Hidden Markov Models (HMMs) of phonemes are jointly trained from synchronous streams of articulatory data acquired by EMA and speech spectral parameters; an acoustic recognition system uses the acoustic part of the HMMs to deliver a phoneme chain and the state durations; this information is then used by a traj...
An Analysis of HMM-based prediction of articulatory movements
This paper presents an investigation into predicting the movement of a speaker’s mouth from text input using hidden Markov models (HMM). A corpus of human articulatory movements, recorded by electromagnetic articulography (EMA), is used to train HMMs. To predict articulatory movements for input text, a suitable model sequence is selected and a maximum-likelihood parameter generation (MLPG) algo...