Visual Speech Recognition

Author

  • Ahmad B. A. Hassanat
Abstract

Lip reading is used to understand or interpret speech without hearing it, a technique mastered especially by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities that would otherwise be difficult. Recent advances in computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task. Indeed, automating the human ability to lip read, a process referred to as visual speech recognition (VSR), or sometimes speech reading, could open the door to other novel related applications. VSR has received a great deal of attention in the last decade for its potential use in applications such as human-computer interaction (HCI), audio-visual speech recognition (AVSR), speaker recognition, talking heads, sign language recognition, and video surveillance. Its main aim is to recognise spoken word(s) using only the visual signal produced during speech. Hence, VSR deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc. There are two main approaches to the VSR problem, the visemic approach and the holistic approach, each with its own strengths and weaknesses. The traditional and most common approaches to automatic lip reading are based on visemes. A viseme is the mouth shape (or appearance), or the sequence of mouth dynamics, required to generate a phoneme in the visual domain. However, several problems arise when visemes are used in visual speech recognition systems: the number of visemes is low (between 10 and 14) compared to the number of phonemes (between 45 and 53), visemes cover only a small subspace of the mouth motions represented in the visual domain, and there are many other problems.
These problems contribute to the poor performance of the traditional approaches; the visemic approach is comparable to digitising the signal of the spoken word, and digitising causes a loss of information. A holistic approach such as the “visual words” (Hassanat, 2009) considers the signature of the whole word rather than only parts of it. This approach can provide a good alternative to the visemic approaches to automatic lip reading. The major problem facing this approach is that a complete English-language lip reading system must be trained on every English word in the dictionary, or at least on the distinct ones. This
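The information loss attributed to the visemic approach can be illustrated with a minimal Python sketch. The phoneme-to-viseme table, the viseme class names, and the four-word lexicon below are hypothetical simplifications chosen for illustration; they are not taken from the paper. The sketch only shows that when several phonemes share one viseme, distinct words can collapse to the same viseme sequence.

```python
from collections import defaultdict

# Hypothetical, heavily simplified phoneme-to-viseme table (illustrative only).
# Several phonemes map to one viseme because they look alike on the lips.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "ae": "open",
}


def viseme_sequence(phonemes):
    """Map a phoneme sequence to its (lossy) viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_viseme_sequence(lexicon):
    """Group words that become indistinguishable after viseme mapping."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[viseme_sequence(phonemes)].append(word)
    return dict(groups)


# Hypothetical toy lexicon: word -> phoneme sequence.
words = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}

groups = group_by_viseme_sequence(words)
for visemes, members in groups.items():
    print(visemes, members)
```

In this toy lexicon, "pat", "bat", and "mat" share a single viseme sequence, so a purely visemic recogniser would need additional context to tell them apart, whereas a holistic, whole-word signature is not restricted to this coarse alphabet.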

Similar resources

P65: Speech Recognition Based on Brain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS

People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with various neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulties in daily communications because they cannot control their body voluntarily. In this context, brain-computer interface (BCI) has been studied as a tool of communication for thes...

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in the emotional speech recognition area in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by creating natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Czech audio-visual speech corpus of a car driver for in-vehicle audio-visual speech recognition

This paper presents the design of an audio-visual speech corpus for in-vehicle audio-visual speech recognition. Throughout the world, there exist several audio-visual speech corpora. There are also several (audio-only) speech corpora for in-vehicle recognition. So far, we have not found an audio-visual speech corpus for in-vehicle speech recognition. And, we have not found any audio-visual speec...

Characteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition

This paper focuses on combining audio-visual signals for Polish speech recognition in conditions where the audio speech signal is highly disturbed. Recognition of audio-visual speech was based on coupled hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual sp...

A system for audio-visual speech recognition

In this work, a system for audio-visual speech recognition is presented. A new hybrid visual feature combination suitable for audio-visual speech recognition was implemented. The features comprise both the shape and the appearance of the lips, and dimensionality reduction is applied using the discrete cosine transform (DCT). A large visual speech database of the German language has been ass...

Journal:
  • CoRR

Volume abs/1409.1411  Issue 

Pages  -

Publication date 2012