Integration of Face and Voice Recognition
Abstract
Cepstral features and features based on a bio-mechanical model of the visible articulators will be the identity-carrying characteristics extracted from acoustic speech and visual speech, respectively. Speakers will be modelled by multi-layer perceptrons (MLPs) trained as discriminative models or, alternatively, as predictive models. In the discriminative modelling scheme, each speaker model will be trained to recognize its allocated speaker directly from his acoustic and visual speech. A measure based on the cross-correlation between the motion of the visible articulators and the acoustic speech will be used to detect impostors whose facial images and voice do not originate from the same person (e.g. impostors who play back tape-recorded voice while mimicking the movement of the visible articulators). The predictive modelling scheme is based on the belief that acoustic and visual speech are cross-correlated; hence, one may be predicted from the other. A further assumption is that the mapping from acoustic speech to visual speech is speaker-specific, so each speaker will be modelled by an MLP trained to perform the acoustic-to-visual speech mapping (prediction) for his speech. The prediction error of each speaker model will then act as the recognition measure: the lower the error, the better the model fits the given acoustic and visual speech.

Conclusion

Preliminary investigations have shown that person recognition accuracy can be improved by the joint use of vocal and facial information. It remains to be seen whether acoustic speech used in conjunction with visual speech will also yield improved recognition accuracy.
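The cross-correlation liveness measure described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the feature choices (frame-level acoustic energy, a one-dimensional lip-opening trace) and the synthetic signals are invented for the example.

```python
import numpy as np

def liveness_score(acoustic_energy: np.ndarray, lip_opening: np.ndarray) -> float:
    """Zero-lag normalized cross-correlation between frame-level acoustic
    energy and a lip-opening trace. For genuine audio-visual speech the two
    signals co-vary; for a replayed recording accompanied by mimed lip
    motion they should not, so a low score flags a possible impostor."""
    a = acoustic_energy - acoustic_energy.mean()
    v = lip_opening - lip_opening.mean()
    return float(np.dot(a, v) / (np.linalg.norm(a) * np.linalg.norm(v)))

# Toy check: a lip trace driven by the same underlying speech correlates
# strongly; independently generated motion does not.
rng = np.random.default_rng(1)
speech = rng.normal(size=500)
energy = speech ** 2
genuine_lips = energy + 0.3 * rng.normal(size=500)  # driven by the speech
mimed_lips = rng.normal(size=500) ** 2              # unrelated motion
print(liveness_score(energy, genuine_lips))         # close to 1
print(liveness_score(energy, mimed_lips))           # close to 0
```

A real system would compute the correlation over a range of lags to tolerate audio-video desynchronization, and would threshold the score to accept or reject the claim.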
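The predictive modelling scheme can likewise be sketched in a few lines: one MLP per speaker learns an acoustic-to-visual mapping, and at recognition time the model with the lowest prediction error on the test utterance identifies the speaker. Everything here is an illustrative assumption — the tiny network, the feature dimensions, the synthetic linear per-speaker mappings, and the speaker names are all invented stand-ins for real cepstral and articulator features.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyMLP:
    """One-hidden-layer perceptron mapping acoustic to visual features,
    trained by plain full-batch gradient descent on squared error."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0.0, 0.3, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.3, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2

    def train(self, X, Y, lr=0.1, epochs=500):
        for _ in range(epochs):
            err = self.forward(X) - Y                    # (N, n_out)
            dh = (err @ self.W2.T) * (1.0 - self.h ** 2) # backprop through tanh
            self.W2 -= lr * self.h.T @ err / len(X)
            self.b2 -= lr * err.mean(axis=0)
            self.W1 -= lr * X.T @ dh / len(X)
            self.b1 -= lr * dh.mean(axis=0)

def prediction_error(model, X, Y):
    return float(np.mean((model.forward(X) - Y) ** 2))

# Synthetic stand-in data: each speaker gets his own (here linear)
# acoustic-to-visual mapping, standing in for speaker-specific articulation.
n_cep, n_vis, n_frames = 12, 6, 400
mappings, models = {}, {}
for name in ("speaker_a", "speaker_b"):
    mappings[name] = 0.3 * rng.normal(size=(n_cep, n_vis))
    X = rng.normal(size=(n_frames, n_cep))
    m = TinyMLP(n_cep, 24, n_vis)
    m.train(X, X @ mappings[name])
    models[name] = m

# Recognition: score a test utterance from speaker_a against every model;
# the lowest prediction error identifies the speaker.
X_test = rng.normal(size=(100, n_cep))
Y_test = X_test @ mappings["speaker_a"]
scores = {n: prediction_error(m, X_test, Y_test) for n, m in models.items()}
print(min(scores, key=scores.get))
```

The key property exploited here is exactly the one the abstract assumes: because the acoustic-to-visual mapping differs per speaker, the correct speaker's model fits the joint test data better than any other speaker's model.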
Related articles
Benefits for Voice Learning Caused by Concurrent Faces Develop over Time.
Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers' faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of ...
Implementation of Face Recognition Algorithm on Fields Programmable Gate Array Card
The evolution of today's application technologies requires a certain level of robustness, reliability and ease of integration. We chose to implement the facial recognition algorithm, based on "eigenfaces" using Principal Component Analysis (PCA), on a Field Programmable Gate Array (FPGA) described in a hardware description language. In this paper, we first present an overview of the PCA used for facial recognition,...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Gender recognition and age detection are important problems in telephone speech processing for investigating the identity of an individual from voice characteristics. In this paper, a new gender and age recognition system is introduced, based on generative incoherent models learned using sparse non-negative matrix factorization and an atom-correction post-processing method. Similar to genera...
Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face
A new integration method is presented to recognize human emotional expressions. We attempt to use both voice and facial expressions. For voice, we use prosodic parameters such as pitch, energy, and their derivatives, which are modelled by a Hidden Markov Model (HMM) for recognition. For facial expressions, we use feature parameters from thermal images in addition to visible images...
Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of function...
Familiar face and voice matching and recognition in children with autism.
Relatively able children with autism were compared with age- and language-matched controls on assessments of (1) familiar voice-face identity matching, (2) familiar face recognition, and (3) familiar voice recognition. The faces and voices of individuals at the children's schools were used as stimuli. The experimental group were impaired relative to the controls on all three tasks. Face recogni...