Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-Reading
Abstract
This thesis presents a novel lip-reading approach that classifies utterances from video data without evaluating voice signals. The work addresses two important issues:
• the efficient representation of mouth movement for visual speech recognition
• the temporal segmentation of utterances from video.

The first part of the thesis describes a robust movement-based technique for identifying mouth movement patterns during the utterance of phonemes. The method temporally integrates the video data of each phoneme into a 2-D grayscale image called a motion template (MT). This is a view-based approach that implicitly encodes the temporal component of an image sequence into a scalar-valued MT. The data size was reduced by extracting image descriptors, namely Zernike moments (ZM) and discrete cosine transform (DCT) coefficients, from the MT. A support vector machine (SVM) and a hidden Markov model (HMM) were used to classify the feature descriptors. A video speech corpus of 2800 utterances was collected to evaluate the efficacy of MT for lip-reading. The experimental results demonstrate the promising performance of MT in representing mouth movement. The advantages and limitations of MT for visual speech recognition were identified and validated through experiments. A comparison between ZM and DCT features indicates that the classification accuracy of the two is comparable when there is no relative motion between the camera and the mouth. ZM features, however, are resilient to rotation of the camera and continue to give good results, whereas DCT features are sensitive to rotation. DCT features, in turn, are shown to tolerate image noise better than ZM. The results also show a slight improvement of 5% with SVM as compared to HMM.

The second part of the thesis describes a video-based temporal segmentation framework that detects the key frames corresponding to the start and stop of utterances in an image sequence, without using the acoustic signals. This segmentation technique integrates mouth movement and appearance information. Its efficacy was tested through experimental evaluation, and satisfactory performance was achieved. The method has been demonstrated to perform efficiently for utterances separated by short pauses.

Potential applications for lip-reading technologies include human-computer interfaces (HCI) for mobility-impaired users, defense applications that require voiceless communication, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments.
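To make the idea of a motion template more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes one common construction: consecutive frames are differenced, thresholded, and written into a single image with a weight that grows over time, so that more recent motion appears brighter. It then keeps a low-frequency block of 2-D DCT coefficients as a compact descriptor. The function names, the threshold value, and the number of coefficients kept are all illustrative assumptions.

```python
import numpy as np


def motion_template(frames, threshold=0.1):
    """Collapse a (T, H, W) grayscale sequence into one 2-D image.

    Assumption: recency-weighted frame differencing. The thesis may build
    its motion templates differently.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames shaped (T, H, W)")
    n_diffs = frames.shape[0] - 1
    template = np.zeros(frames.shape[1:], dtype=np.float64)
    for t in range(1, frames.shape[0]):
        # Pixels whose intensity changed by more than the threshold.
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        # Later motion overwrites earlier motion with a larger value in (0, 1].
        template[moving] = t / n_diffs
    return template


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def dct_features(image, keep=4):
    """Return the top-left keep x keep block of 2-D DCT coefficients, flattened."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    coeffs = dct_matrix(h) @ image @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((10, 32, 32))          # stand-in for a mouth-region clip
    features = dct_features(motion_template(clip), keep=4)
    print(features.shape)
```

In a pipeline like the one the abstract describes, a feature vector of this kind would be computed per utterance and passed to a classifier such as an SVM. Zernike moments, the rotation-resilient alternative mentioned above, are not shown here.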
Similar resources
Block-Based Motion Estimation Analysis for Lip Reading User Authentication Systems
This paper proposes a lip reading technique for speech recognition by using motion estimation analysis. The method described in this paper represents a sub-system of the Silent Pass project. Silent Pass is a lip reading password entry system for security applications. It presents a user authentication system based on password lip reading. Motion estimation is done for lip movement image sequenc...
Lip segmentation using automatic selected initial contours based on localized active contour model
With the rapid development of artificial intelligence and the increasing popularity of smart devices, human-computer interaction technology has become a multimedia, multimodal technology, shifting from computer-focused to people-centered. Among all ways of human-computer interactions, using language to interact with machines is the most convenient and efficient one. However, the performance of au...
Lip Reading: A New Authentication Method in Android Mobile Phone Applications
Today, mobile phones are among the first devices every person interacts with. People use many mobile applications to achieve their goals, and mobile banking is one of the most widely used. Security in m-bank applications is very important; therefore, modern methods of authentication are required. Most m-bank applications use text passwords which can be stolen b...
Dictionary-Based Lip Reading Classification
Visual lip reading recognition is an essential stage in many multimedia systems such as “Audio Visual Speech Recognition” [6], “Mobile Phone Visual System for deaf people”, “Sign Language Recognition System”, etc. The use of lip visual features to help audio or hand recognition is appropriate because this information is robust to acoustic noise. In this paper, we describe our work towards devel...
An Improved Motion Vector Estimation Approach for Video Error Concealment Based on the Video Scene Analysis
In order to enhance the accuracy of the motion vector (MV) estimation and also reduce the error propagation issue during the estimation, in this paper, a new adaptive error concealment (EC) approach is proposed based on the information extracted from the video scene. In this regard, the motion information of the video scene around the degraded MB is first analyzed to estimate the motion type of...