Multifactor Fusion for Audio-Visual Speaker Recognition
Abstract
In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audio-visual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion, which combines feature-level fusion of lip-voice features with score-level fusion of face-lip-voice features, is indeed a powerful technique for speaker identity verification, as it preserves the synchronisation of closely coupled modalities, such as the face, voice and lip dynamics of a speaker during speech, through the various stages of authentication. Feature-level fusion of acoustic and visual feature vectors from the lip region improves the error rate by the order of 22-36% compared with the classical late fusion approach.
Key-Words: Audio-visual, Multifactor, Hybrid Fusion, Speaker recognition, Impostor attacks
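The hybrid scheme described in the abstract can be illustrated with a minimal sketch. It reflects one plausible reading only: audio and lip features are concatenated frame by frame (feature-level fusion), and the score of a classifier on that fused stream is then combined with a face score by a weighted sum (score-level fusion). The function names, feature dimensions, scores, weights and threshold are all hypothetical and are not taken from the paper, which may use a different combination rule.

```python
import numpy as np


def feature_level_fusion(audio_feats, lip_feats):
    """Concatenate frame-synchronous audio and lip feature vectors.

    Both inputs have shape (n_frames, n_features); keeping one row per
    frame is what preserves the audio-lip synchronisation.
    """
    audio_feats = np.asarray(audio_feats, dtype=float)
    lip_feats = np.asarray(lip_feats, dtype=float)
    if audio_feats.shape[0] != lip_feats.shape[0]:
        raise ValueError("audio and lip streams must have the same number of frames")
    return np.hstack([audio_feats, lip_feats])


def score_level_fusion(scores, weights):
    """Weighted sum of classifier scores (weights are normalised to sum to 1).

    The weighted-sum rule is an assumption made for illustration.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.dot(weights, scores))


def verify(fused_score, threshold):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fused_score >= threshold


# Toy data: 5 frames, 12 hypothetical acoustic and 6 hypothetical lip features.
rng = np.random.default_rng(0)
audio = rng.normal(size=(5, 12))
lip = rng.normal(size=(5, 6))
lip_voice = feature_level_fusion(audio, lip)  # one 18-dimensional vector per frame

# Hypothetical match scores from a face classifier and a lip-voice classifier.
face_score, lip_voice_score = 0.8, 0.6
fused = score_level_fusion([face_score, lip_voice_score], weights=[0.5, 0.5])
accepted = verify(fused, threshold=0.65)
```

In practice the fusion weights and the decision threshold would be tuned on held-out data rather than fixed by hand as they are here.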
Similar resources
An Examination of Audio-visual Fused Hmms for Speaker Recognition
Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the audio- and video-biased versions of the FHMM structure. This paper looks at the performance of the audio- and video-biased versions independently, and shows that the audio-biased version is considerably more capable for spe...
Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification
This paper investigates the estimation of fusion weights under varying acoustic noise conditions for an audio-visual multi-level hybrid fusion strategy in speaker identification. The multi-level fusion combines model-level and decision-level fusion via dynamic Bayesian networks (DBNs). A novel methodology known as support vector regression (SVR) is utilized to estimate the fusion weights directly ...
Audio-visual multilevel fusion for speech and speaker recognition
In this paper we propose a robust audio-visual speech-and-speaker recognition system with liveness checks based on audio-visual fusion of audio-lip motion and depth features. The liveness verification feature added here guards the system against advanced spoofing attempts such as manufactured or replayed videos. For visual features, a new tensor-based representation of lip motion features, extra...
A New Approach to Integrate Audio and Visual Features of Speech
This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observati...
Multi-level Fusion of Audio and Visual Features for Speaker Identification
This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on a dynamic Bayesian network (DBN), which combines model-level and decision-level fusion to achieve higher performance. In model-level fusion, a new audio-visual correlative model (AVCM) based on DBN is proposed, which describes both the intercorrelations and loose timing synchroni...