An Analysis-by-Synthesis Approach to Vocal Tract Modeling for Robust Speech Recognition

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering

Author

  • Ziad A. Al Bawab
Abstract

In this thesis we present a novel approach to speech recognition that incorporates knowledge of the speech production process. The major contribution is the development of a speech recognition system that is motivated by the physical generative process of speech, rather than the purely statistical approach that has been the basis for virtually all current recognizers. We follow an analysis-by-synthesis approach. We begin by attributing a physical meaning to the inner states of the recognition system pertaining to the configurations the human vocal tract takes over time. We utilize a geometric model of the vocal tract, adapt it to our speakers, and derive realistic vocal tract shapes from electromagnetic articulograph (EMA) measurements in the MOCHA database. We then synthesize speech from the vocal tract configurations using a physiologically-motivated articulatory synthesis model of speech generation. Finally, the observation probability of the Hidden Markov Model (HMM) used for phone classification is a function of the distortion between the speech synthesized from the vocal tract configurations and the real speech. The output of each state in the HMM is based on a mixture of density functions. Each density models the distribution of the distortion at the output of each vocal tract configuration. During training we initialize the model parameters using ground-truth articulatory knowledge. During testing only the acoustic data are used.
In the first part of the thesis we describe a segmented phone classification experiment. We present results using analysis-by-synthesis distortion features derived from a codebook of vocal tract shapes. We create a codebook of vocal tract configurations from the EMA data to constrain the articulatory space. Improvements are achieved by combining the probability scores generated using the distortion features with scores obtained using traditional acoustic features.
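The distortion-based observation probability described above can be illustrated with a small sketch. It is not the thesis implementation: the function names, the toy codebook, the use of one common definition of Mel-cepstral distortion, and the choice of one Gaussian density per codeword are all assumptions made for illustration (the abstract only says "a mixture of density functions").

```python
import math

def mel_cepstral_distortion(c_real, c_synth):
    """One common Mel-cepstral distortion definition, in dB (assumed here)."""
    sq = sum((a - b) ** 2 for a, b in zip(c_real, c_synth))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)

def distortion_features(c_real, codebook_cepstra):
    """Distortion between one real frame and the speech synthesized from each codeword."""
    return [mel_cepstral_distortion(c_real, c) for c in codebook_cepstra]

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density (the density family is an assumption)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def state_observation_prob(distortions, weights, means, variances):
    """Mixture over codewords: sum_k w_k * p_k(d_k), one density per codeword."""
    return sum(
        w * gaussian_pdf(d, m, v)
        for d, w, m, v in zip(distortions, weights, means, variances)
    )

# Hypothetical toy data: cepstra synthesized from 3 codewords, and one real frame.
codebook_cepstra = [[1.0, 0.5], [0.2, -0.3], [-1.0, 0.8]]
real_frame = [0.9, 0.4]

d = distortion_features(real_frame, codebook_cepstra)
prob = state_observation_prob(
    d,
    weights=[0.7, 0.3, 0.0],      # a zero weight drops that codeword from the state
    means=[1.0, 1.0, 1.0],        # hypothetical per-codeword distortion means
    variances=[4.0, 4.0, 4.0],    # hypothetical per-codeword distortion variances
)
print(d, prob)
```

In this sketch a codeword with zero weight contributes nothing to the state's output probability, which is how a state can take on an articulatory meaning through its weights.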
In the second part of the thesis we discuss our work on deriving realistic vocal tract shapes from the EMA measurements. We present our method of using EMA data from each speaker in MOCHA to adapt Maeda’s geometric model of the vocal tract. For a given utterance, we search the codebook for codewords corresponding to vocal tract contours that provide the best fit to the superimposed EMA data on a frame-by-frame basis. The articulatory synthesis approach of Sondhi and Schroeter is then used to synthesize speech from these codewords. We present a technique for synthesizing speech solely from the EMA measurements. Reductions in Mel-cepstral distortion between the real speech and the synthesized speech are achieved using our adaptation procedure.
In the third part of the thesis we present a dynamic articulatory model for phone classification. The model integrates real articulatory information derived from EMA data into its inner states. It maps from the articulatory space to the acoustic space using an adapted vocal tract model for each speaker and a physiologically-based articulatory synthesis approach. We apply the analysis-by-synthesis paradigm in a statistical fashion. The distortion between the speech synthesized from the articulatory states and the incoming speech signal is used to compute the output observation probability of the Hidden Markov Model (HMM) used for classification. The output of each state in the HMM is based on a mixture probability density function. Each probability density models the distribution of the distortion at the output of each codeword. The estimation algorithm converges to a solution that zeros out the weights of the unlikely codewords for each phone. Hence, each state inherits an articulatory meaning based on these estimated weights and the transition from one state to another reflects articulatory movements.
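The way weight estimation can suppress unlikely codewords can be seen in a minimal sketch of the standard EM update for mixture weights with fixed component densities. This is a generic textbook update and not necessarily the estimation algorithm used in the thesis; the function name and the toy likelihood table are hypothetical.

```python
def em_mixture_weights(lik, n_iter=50):
    """Re-estimate mixture weights with EM, keeping component densities fixed.

    lik[t][k] is the likelihood of frame t under codeword k's density.
    Assumes every frame has positive likelihood under at least one codeword.
    """
    n_frames = len(lik)
    n_codewords = len(lik[0])
    w = [1.0 / n_codewords] * n_codewords  # uniform start
    for _ in range(n_iter):
        acc = [0.0] * n_codewords
        for row in lik:
            total = sum(wk * lk for wk, lk in zip(w, row))
            for k in range(n_codewords):
                # E-step: posterior responsibility of codeword k for this frame
                acc[k] += w[k] * row[k] / total
        # M-step: new weight is the average responsibility over frames
        w = [a / n_frames for a in acc]
    return w

# Hypothetical likelihoods for 4 frames of one phone under 3 codewords;
# the third codeword fits these frames poorly.
toy_lik = [
    [0.90, 0.80, 0.010],
    [0.70, 0.90, 0.020],
    [0.80, 0.85, 0.010],
    [0.90, 0.70, 0.015],
]
weights = em_mixture_weights(toy_lik)
print(weights)
```

With these toy numbers the weight of the poorly fitting codeword shrinks toward zero over the iterations, while the weights always sum to one.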
Experiments with the novel framework show improvements in phone classification accuracy over baseline accuracy obtained using a purely statistically-based system, as well as a close resemblance of the estimated weights to ground-truth articulatory knowledge. To our knowledge this is the first work that applies the analysis-by-synthesis paradigm in a statistical fashion for phone classification. It is the first attempt to integrate realistic speaker-adapted vocal tract shapes with a physiologically-motivated articulatory synthesis model in a dynamic pattern recognition framework. It is also the first work to synthesize continuous speech waveforms solely from EMA measurements and to perform a speaker-independent analysis of highly speaker-dependent EMA phenomena.


Similar resources

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

The Effectiveness of Culture-Based Assertiveness Training on the Self-Esteem of Children of Divorce

Brever, M. M. (2010). The effects of child gender and child age at the time of parental divorce on the development. College of Social and Behavioral Sciences, Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Psychology, Educational Track.

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Acknowledgements ... Table of

Mathematical Modeling and Experimental Verification of Resonance Energy Transfer

Mathematical Modeling and Experimental Verification of Resonance Energy Transfer Networks: Applications in Cryptography and Biological Sensing, by Vishwa Nellore. Department of Electrical and Computer Engineering, Duke University. Date: 11/24/2014. Approved: Christopher Dwyer, Supervisor; Alvin Lebeck; Jeff Glass ...

Publication date: 2009