Dynamic visual features for audio-visual speaker verification
نویسندگان
چکیده
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance level score fusion.
منابع مشابه
Multifactor Fusion for Audio-Visual Speaker Recognition
In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audiovisual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion involve a combination feature-level fusion of lip-voice features and face-lip-voice features at score-level is indeed a powerful technique for spe...
متن کاملAudio-visual multilevel fusion for speech and speaker recognition
In this paper we propose a robust audio-visual speech-andspeaker recognition system with liveness checks based on audio-visual fusion of audio-lip motion and depth features. The liveness verification feature added here guards the system against advanced spoofing attempts such as manufactured or replayed videos. For visual features, a new tensor-based representation of lip motion features, extra...
متن کاملCascading appearance-based features for visual speaker verification
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we w...
متن کاملA New Lip Feature Representation Method for Video-based Bimodal Authentication
As the low-cost video transmission becomes popular, video based bimodal (audio and visual) authentication has great potential in various applications that require access control. It is especially useful for handheld terminals, which are often used under adverse environments, where the signal quality is rather poor. When human voice is used for authentication, one of the most relevant visual fea...
متن کاملA New Approach to Integrate Audio and Visual Features of Speech
This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computer Speech & Language
دوره 24 شماره
صفحات -
تاریخ انتشار 2010