audio visual sign

Automatic speechreading of impaired speech

2001

Gerasimos Potamianos Chalapathy Neti

We investigate the use of visual, mouth-region information in improving automatic speech recognition (ASR) of the speech impaired. Given the video of an utterance by such a subject, we first extract appearance-based visual features from the mouth region-of-interest, and we use a feature fusion method to combine them with the subject’s audio features into bimodal observations. Subsequently, we a...

Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Journal: EURASIP J. Adv. Sig. Proc., 2002

Petar S. Aleksic Jay J. Williams Zhilin Wu Aggelos K. Katsaggelos

We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorit...

Measuring Audio and Visual Speech Synchrony: Methods and Applications

2006

H. Bredin G. Chollet

Speech is a means of communication that is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech and more specifically on techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, tran...

Measuring Audio and Visual Speech Synchrony: Methods

2007

H. Bredin G. Chollet

Speech is a means of communication that is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech and more specifically on techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, tran...

SMART-I: Spatial Multi-user Audio-Visual Real Time Interactive Interface

2011

Marc Rébillat

The SMART-I aims at creating a precise and coherent virtual environment by providing users with accurate audio and visual localization cues. Wave Field Synthesis for audio rendering and Tracked Stereoscopy for visual rendering are each known to permit high-quality spatial immersion within an extended space. The proposed system combines these two rendering approaches through...

Audiovisual Speech Synchrony Measure: Application to Biometrics

Journal: EURASIP J. Adv. Sig. Proc., 2007

Hervé Bredin Gérard Chollet

Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, trans...

Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Journal: EURASIP Journal on Advances in Signal Processing, 2002

A Visual Signal Reliability for Robust Audio-Visual Speaker Identification

Journal: IEICE Transactions on Information and Systems, 2011

Speaker-dependent audio-visual emotion recognition

2009

Sanaul Haq Philip J. B. Jackson

This paper explores the recognition of expressed emotion from speech and facial gestures for the speaker-dependent case. Experiments were performed on an English audio-visual emotional database consisting of 480 utterances from 4 English male actors in 7 emotions. A total of 106 audio and 240 visual features were extracted, and features were selected with the Plus l-Take Away r algorithm based on Bh...

Characteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition

2012

Mariusz Kubanek

This paper focuses on combining audio-visual signals for Polish speech recognition when the audio speech signal is highly disturbed. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual sp...
