Generative factor analyzed HMM for automatic speech recognition
نویسندگان
چکیده
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by mixture of Gaussians (MoG) that are dependent on discretevalued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representation of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model. We show through a set of experiments to verify the potential of the GFA-HMM as an alternative acoustic modeling technique. In one experiment, by varying the latent dimension and the number of mixture components in the latent spaces, the GFA-HMM attained more compact representation than the standard HMM. In other experiments with varies noise types and speaking styles, the GFA-HMM was able to have (statistically significant) improvement with respect to the standard HMM. 2005 Elsevier B.V. All rights reserved.
منابع مشابه
Speech recognition with a generative factor analyzed hidden Markov model
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a traditional HMM, the observation vectors are represented by mixture of Gaussians (MoG) that are dependent on discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuousvalued latent representation of observation vectors, where latent vectors in one level are a...
متن کاملOff-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملتخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
متن کاملA Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملLarge Margin Algorithms for Discriminative Continuous Speech Recognition
Automatic speech recognition has long been a considered dream. While ASR does work today, and it is commercially available, it is extremely sensitive to noise, talker variations, and environments. The current state-of-the-art automatic speech recognizers are based on generative models that capture some temporal dependencies such as hidden Markov models (HMMs). While HMMs have been immensely imp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Speech Communication
دوره 45 شماره
صفحات -
تاریخ انتشار 2005