Robust audiovisual integration using semicontinuous hidden Markov models

Authors

  • Qin Su
  • Peter L. Silsbee
Abstract

We describe an improved method of integrating audio and visual information in an HMM-based audiovisual ASR system. The method uses a modified semicontinuous HMM (SCHMM) for integration and recognition. Our results show substantial improvements over earlier integration methods at high noise levels. Our integration method relies on the assumption that, as environmental conditions deviate from those under which training occurred, the underlying probability distributions will also change. We use phoneme-based SCHMMs for classification of isolated words. The probability models underlying the standard SCHMM are Gaussian; thus, low probability estimates will tend to be associated with high confidences (small differences in the feature values cause large proportional differences in probabilities when the values are in the tail of the distribution). Therefore, during classification, we replace each Gaussian with a scoring function which looks Gaussian near the mean of the distribution but has a heavier tail. We report results comparing this method with an audio-only system and with previous integration methods. At high noise levels, the system with modified scoring functions shows a better than 50% [...] recognition does suffer when noise is low. Methods which can adjust the relative weight of the audio and visual information can still potentially outperform the new method, provided that a reliable way of choosing those weights can be found.
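
The abstract does not give the form of the modified scoring function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical log-score that is exactly Gaussian within `cutoff` standard deviations of the mean and decays linearly (a Laplace-like tail) beyond that. The names `heavy_tail_log_score` and `cutoff` are illustrative, and the resulting score is not a normalized density.

```python
import math


def gaussian_log_score(x, mean, std):
    """Log of the Gaussian density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def heavy_tail_log_score(x, mean, std, cutoff=2.0):
    """Hypothetical heavy-tailed score (an assumption, not from the paper).

    Matches the Gaussian log-density for |z| <= cutoff and falls off
    linearly in |z| beyond it, so the tail is heavier than Gaussian.
    The two pieces meet continuously at |z| == cutoff.
    """
    z = abs(x - mean) / std
    log_norm = math.log(std * math.sqrt(2.0 * math.pi))
    if z <= cutoff:
        return -0.5 * z * z - log_norm
    return -cutoff * z + 0.5 * cutoff * cutoff - log_norm


# Two feature values deep in the tail of a unit Gaussian (z = 5 and z = 6).
gauss_gap = gaussian_log_score(5.0, 0.0, 1.0) - gaussian_log_score(6.0, 0.0, 1.0)
heavy_gap = heavy_tail_log_score(5.0, 0.0, 1.0) - heavy_tail_log_score(6.0, 0.0, 1.0)

# The Gaussian log-score gap is about 5.5; the heavy-tailed gap is about 2.0,
# so a small feature difference in the tail is treated with less confidence.
print(gauss_gap, heavy_gap)
```

Near the mean the two functions agree, so behavior on well-matched data is unchanged under this sketch. In the tail, the same feature difference produces a smaller difference in score.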


Similar resources

IDIAP Martigny - Valais - Suisse Continuous Audio-Visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mo...


Continuous Audio-visual Speech Recognition

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal model...


Improving lip-reading performance for robust audiovisual speech recognition using DNNs

This paper presents preliminary experiments using the Kaldi toolkit [1] to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular we use a single-speaker large vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kal...


Introducing Busy Customer Portfolio Using Hidden Markov Model

Due to the effective role of Markov models in customer relationship management (CRM), there is a lack of comprehensive literature review which contains all related literatures. In this paper the focus is on academic databases to find all the articles that had been published in 2011 and earlier. One hundred articles were identified and reviewed to find direct relevance for applying Markov models...




Publication date: 1996