Audiovisual-based adaptive speaker identification
نویسندگان
چکیده
An adaptive speaker identification system is presented in this paper, which aims to recognize speakers in feature films by exploiting both audio and visual cues. Specifically, the audio source is first analyzed to identify speakers using a likelihood-based approach. Meanwhile, the visual source is parsed to recognize talking faces using face detection/recognition and mouth tracking techniques. These two information sources are then integrated under a probabilistic framework for improved system performance. Moreover, to account for speakers’ voice variations along time, we update their acoustic models on the fly by adapting to their newly contributed speech data. An average of 80% identification accuracy has been achieved on two test movies. This shows a promising future of the proposed audiovisual-based adaptive speaker identification approach.
منابع مشابه
Adaptive speaker identification with audiovisual cues for movie content analysis
An adaptive speaker identification system which employs both audio and visual cues is proposed in this work for movie content analysis. Specifically, a likelihood-based approach is first applied for speaker identification using pure speech data, and techniques such as face detection/recognition and mouth tracking are applied for talking face recognition using pure visual data. These two informa...
متن کاملLikelihood Ratio Based Score Fusion for Audio-Visual Speaker Identification in Challenging Environment
It is well known to enhance the performance of noise robust speaker identification using visual speech information with audio utterances. This paper presents an approach to evaluate the performance of a noise robust audio-visual speaker identification system using likelihood ratio based score fusion in challenging environment. Though the traditional HMM based audio-visual speaker identification...
متن کاملAudio-Visual Correlation Modeling for Speaker Identification and Synthesis
This thesis addresses two major problems of multimodal signal processing using audiovisual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audiovisual speaker recognition problem within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late inte...
متن کاملMulti-level Fusion of Audio and Visual Features for Speaker Identification
This paper explores the fusion of audio and visual evidences through a multi-level hybrid fusion architecture based on dynamic Bayesian network (DBN), which combines model level and decision level fusion to achieve higher performance. In model level fusion, a new audio-visual correlative model (AVCM) based on DBN is proposed, which describes both the intercorrelations and loose timing synchroni...
متن کاملAudiovisual speaker identity verification based on lip motion features
In this paper, we propose the fusion of audio and explicit lip motion features for speaker identity verification applications. Experimental results using GMM-based speaker models indicate that audiovisual fusion with explicit lip motion information provides significant performance improvement for verifying both the speaker identity and the liveness, due to tracking of the closely coupled acoust...
متن کامل