Lip-reading from parametric lip contours for audio-visual speech recognition
Abstract
This paper describes a visual lip-tracking and lip-reading algorithm that uses affine-invariant Fourier descriptors (AI-FDs) from parametric lip contours to improve audio-visual speech recognition systems. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), where a joint decision is made after processing, using an optimal decision rule. The work describes how the AI-FDs are extracted from parametric lip contour data. Finally, it validates optimal weight selection, based on the noise type and signal-to-noise ratio (SNR), for joint audio-visual automatic speech recognition (JAV-ASR).
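The joint decision described in the abstract can be illustrated with a minimal sketch of weighted late fusion. The abstract does not give the decision rule, so this example assumes a common form: a convex combination of per-word log-likelihoods from the audio and visual recognizers, with the audio weight looked up by noise type and nearest SNR. The function names, the `WEIGHT_TABLE` layout, and every number in it are hypothetical placeholders, not values from the paper.

```python
# Hypothetical audio weights per noise type and SNR in dB.
# These numbers are illustrative placeholders, not results from the paper.
WEIGHT_TABLE = {
    "white": {0: 0.2, 10: 0.5, 20: 0.8},
    "babble": {0: 0.1, 10: 0.4, 20: 0.7},
}


def select_audio_weight(weight_table, noise_type, snr_db):
    """Return the audio weight stored for the SNR closest to snr_db."""
    entries = weight_table[noise_type]
    nearest_snr = min(entries, key=lambda snr: abs(snr - snr_db))
    return entries[nearest_snr]


def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight):
    """Combine two lists of per-word log-likelihoods with a convex weight."""
    if len(audio_ll) != len(visual_ll):
        raise ValueError("audio and visual score lists must have equal length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio_ll, visual_ll)
    ]


def recognize(words, audio_ll, visual_ll, noise_type, snr_db,
              weight_table=WEIGHT_TABLE):
    """Pick the word whose fused score is highest."""
    weight = select_audio_weight(weight_table, noise_type, snr_db)
    fused = fuse_log_likelihoods(audio_ll, visual_ll, weight)
    best_index = max(range(len(words)), key=lambda i: fused[i])
    return words[best_index]
```

In this sketch, a low SNR shifts the weight toward the visual stream, so the same pair of score lists can yield a different word under clean and noisy conditions. In practice, the per-word scores would come from the two HMM recognizers and the weight table would be tuned on held-out data.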
Related resources
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques face many challenges for speech recognition, one of the challenges b...
Visual Speech Recognition: A Solution from Feature Extraction to Words Classification
Audio-visual speech recognition has been an active area of research lately. A big, and yet unsolved, part of this problem is visual-only recognition, or lip reading. Given an image sequence of a person pronouncing a word, a full image analysis solution would have to segment the mouth area, extract relevant features, and use them to classify the word from those visual featur...
A Survey – Audio and Video Synchronization
Audio and video synchronization is essential. Loss of synchronization between image and sound continues to disturb viewers and irritate broadcasters. The goal is to ensure synchronization without altering the content while keeping the cost low. The objective of synchronization is to align the audio and video signals, which are processed separately. T...
Improving Lip-reading with Feature Spac Audio-Visual Speech R
In this paper we investigate feature space transforms to improve lip-reading performance for multi-stream HMM-based audio-visual speech recognition (AVSR). The feature space transforms include a non-linear Gaussianization transform and feature space maximum likelihood linear regression (fMLLR). We apply Gaussianization at various stages of the visual front-end. The results show that Gaussianizing...
3D Lip-tracking for Audio-visual Speech Recognition in Real Applications
In this paper, we present a solution to the problem of tracking 3D information about the shape of the lips from a 2D picture of a speaker. We focus on lip-tracking in audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min of audio-visual recordings of driver speech, made in a car while driving in normal traffic. In real condit...