Limitations of visual speech recognition

نویسندگان

Jacob L. Newman

Barry-John Theobald

Stephen J. Cox

چکیده

In this paper we investigate the limits of automated lip-reading systems and we consider the improvement that could be gained were additional information from other (non-visible) speech articulators available to the recogniser. Hidden Markov model (HMM) speech recognisers are trained using electromagnetic articulography (EMA) data drawn from the MOCHA-TIMIT data set. Articulatory information is systematically withheld from the recogniser and the performance is tested and compared with that of a typical state of the art lip-reading system. We find that, as expected, the performance of the recogniser degrades as articulatory information is lost, and that a typical lip-reading system achieves a level of performance similar to an EMAbased recogniser that uses information from only the front of the tongue forwards. Our results show that there is significant information in the articulator positions towards the back of the mouth that could be exploited were it available, but even this is insufficient to achieve the same level of performance as can be achieved by an acoustic speech recogniser.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

به کارگیری سامانه تبدیل گفتار به متن در حوزه مراقبت سلامت: مزایا، محدودیت‌ها، راهکارها

Background and Aim: The applicability of any technology to enter a certain field is determined by defining the advantages and disadvantages of the system in that field. The aim of this study is to show the advantages and limitations of using speech recognition systems in health care and providing practical solutions to improve the acceptability of the system in that field. Materials and M...

متن کامل

Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants

Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...

متن کامل

Real-time audio-visual voice activity detection for speech recognition in noisy environments

Voice activity detection (VAD) is one of the most critical issues on performance degradation of speech recognition in noisy environment applications. A real-time VAD was developed by using face parameters (eye and lip contours) as a front-end for the traditional speech and noise (audio) GMMbased method. Speech recognition performance of the audiovisual VAD is shown to be comparable with audio-o...

متن کامل

P65: Speech Recognition Based on Bbrain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS

People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with various neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulties in daily communications because they cannot control their body voluntarily. In this context, brain-computer interface (BCI) has been studied as a tool of communication for thes...

متن کامل

A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition

The use of visual information from speaker’s mouth region has shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly useful in presence of noise, which even in moderate form severely degrades the speech recognition performance of systems using only audio information. Various sets of features extracted from speaker’s mouth region have been used to im...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Limitations of visual speech recognition

نویسندگان

چکیده

منابع مشابه

به کارگیری سامانه تبدیل گفتار به متن در حوزه مراقبت سلامت: مزایا، محدودیت‌ها، راهکارها

Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants

Real-time audio-visual voice activity detection for speech recognition in noisy environments

P65: Speech Recognition Based on Bbrain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS

A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

عنوان ژورنال:

اشتراک گذاری