The challenge of multispeaker lip-reading

نویسندگان

  • Stephen J. Cox
  • Richard Harvey
  • Yuxuan Lan
  • Jacob L. Newman
  • Barry-John Theobald
چکیده

In speech recognition, the problem of speaker variability has been well studied. Common approaches to dealing with it include normalising for a speaker’s vocal tract length and learning a linear transform that moves the speaker-independent models closer to to a new speaker. In pure lip-reading (no audio) the problem has been less well studied. Results are often presented that are based on speaker-dependent (single speaker) or multispeaker (speakers in the test-set are also in the training-set) data, situations that are of limited use in real applications. This paper shows the danger of not using different speakers in the trainingand test-sets. Firstly, we present classification results on a new single-word database AVletters 2 which is a high-definition version of the well known AVletters database. By careful choice of features, we show that it is possible for the performance of visual-only lip-reading to be very close to that of audio-only recognition for the single speaker and multi-speaker configurations. However, in the speaker independent configuration, the performance of the visual-only channel degrades dramatically. By applying multidimensional scaling (MDS) to both the audio features and visual features, we demonstrate that lip-reading visual features, when compared with the MFCCs commonly used for audio speech recognition, have inherently small variation within a single speaker across all classes spoken. However, visual features are highly sensitive to the identity of the speaker, whereas audio features are relatively invariant.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...

متن کامل

لب‌خوانی و ادراک گفتار دانش‌آموزان کم‌شنوای مدارس ویژۀ کم‌شنوایان در شهر تهران

Objective: The goal of this study was to evaluate the lip reading ability and Speech perception of hearing impaired students of special schools for the hearing impaired in different speech levels. Materials & Methods: In this cross- sectional study, 44 deaf students (9-12 years old) were selected with multi-stage cluster sampling method, from two special schools for the deaf in Tehran. Tools...

متن کامل

لب‌خوانی: روش جدید احراز هویت در برنامه‌های کاربردی گوشی‌های تلفن همراه اندروید

Today, mobile phones are one of the first instruments every individual person interacts with. There are lots of mobile applications used by people to achieve their goals. One of the most-used applications is mobile banks. Security in m-bank applications is very important, therefore modern methods of authentication is required. Most of m-bank applications use text passwords which can be stolen b...

متن کامل

Shape Feature Analysis for Visual Speech and Speaker Recognition

Visual information is always combined as a complementary source to enhance the understanding of what the speaker is talking about, especially in a noisy environment. This paper researches on different lip features for visual speech and speaker recognition, and their robustness to different uttering habits is conducted in-depth analysis. Five feature candidates extracted from lip shape are teste...

متن کامل

Evaluation of Receptive and Expressive Vocabulary in 6-18 Month’s-old Children With Cleft Lip and Palate

Objectives: One of the factors predicting language impairments is an early limited lexicon in children. An early limited lexicon can also lead to limited performances in other language areas. This study was aimed to examine receptive and expressive vocabulary in 8-16 month-old children with cleft lip and palate as a predictor of development in other language areas. Materials: The MacArthur-Bat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008