Maximising audio-visual speech correlation
Authors
Abstract
The aim of this work is to investigate a selection of audio and visual speech features in order to find pairs that maximise audio-visual correlation. Two audio speech features have been used in the analysis: filterbank vectors and the first four formant frequencies. Similarly, three visual features have been considered: active appearance model (AAM) features, 2-D DCT and cross-DCT. From a database of 200 sentences, audio and visual speech features have been extracted and multiple linear regression used to measure the audio-visual correlation. Results reveal filterbank features to exhibit multiple correlation of around R=0.8 with visual features, while formant frequencies show substantially less correlation with visual features: R=0.6 for formants 1 and 2 and less than R=0.4 for formants 3 and 4. The three visual features show almost identical correlation with the audio features, varying in multiple correlation by less than 0.1, even though the methods of visual feature extraction are very different. Measuring the audio-visual correlation within each phoneme and then averaging the correlation across all phonemes increased the correlation to R=0.9.
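To illustrate the measurement step described above, the following is a minimal sketch (not the authors' implementation) of how multiple linear regression can yield a multiple correlation R between a visual feature vector and a single audio feature, and how the same measure can be computed per phoneme and averaged. The variable names (visual_feats, audio_feat, phoneme_labels) and the assumption of frame-aligned features are illustrative only.

```python
# Minimal sketch: multiple correlation R between a visual feature vector
# and one audio feature (e.g. a filterbank channel or a formant),
# measured via multiple linear regression.  Names are illustrative.
import numpy as np

def multiple_correlation(visual_feats: np.ndarray, audio_feat: np.ndarray) -> float:
    """R between audio_feat (n_frames,) and its least-squares estimate
    from visual_feats (n_frames, n_visual_dims)."""
    # Append a bias column and fit the audio feature by ordinary least squares.
    X = np.column_stack([visual_feats, np.ones(len(visual_feats))])
    w, *_ = np.linalg.lstsq(X, audio_feat, rcond=None)
    pred = X @ w
    # Multiple correlation = Pearson correlation between prediction and target.
    return float(np.corrcoef(pred, audio_feat)[0, 1])

def per_phoneme_correlation(visual_feats, audio_feat, phoneme_labels) -> float:
    """Average of R computed separately within each phoneme class."""
    scores = []
    for ph in np.unique(phoneme_labels):
        idx = phoneme_labels == ph
        if idx.sum() > visual_feats.shape[1] + 1:  # enough frames to fit the regression
            scores.append(multiple_correlation(visual_feats[idx], audio_feat[idx]))
    return float(np.mean(scores))
```

In this sketch, repeating the calculation for each audio feature dimension (e.g. each filterbank channel or formant) and averaging would give overall figures comparable in form to those reported above.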
Similar resources
Speech extraction based on ICA and audio-visual coherence
We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker’s face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying th...
Extracting an AV speech source f...
We present a new approach to the source separation problem for multiple speech signals. Using the extra visual information of the speaker’s face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of visual and spectral audio input for quantifying the ...
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also...
Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise
The aim of this work is to examine the correlation between audio and visual speech features. The motivation is to find visual features that can provide clean audio feature estimates which can be used for speech enhancement when the original audio signal is corrupted by noise. Two audio features (MFCCs and formants) and three visual features (active appearance model, 2-D DCT and cross-DCT) are c...
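The enhancement-oriented papers listed above motivate using visual features to estimate clean audio features when the acoustic signal is corrupted by noise. The following is a minimal sketch of that prediction step, assuming a linear (multiple linear regression) mapping trained on clean audio-visual data; the function and variable names are illustrative and not taken from any of the cited works.

```python
# Minimal sketch: estimate clean audio filterbank vectors from visual
# features using a linear mapping learned on clean audio-visual data.
# All names (train_visual, train_fbank, ...) are illustrative assumptions.
import numpy as np

def fit_visual_to_audio(train_visual: np.ndarray, train_fbank: np.ndarray) -> np.ndarray:
    """Least-squares mapping from visual features (n, d_v) to filterbank
    vectors (n, d_a); returns a (d_v + 1, d_a) weight matrix (with bias)."""
    X = np.column_stack([train_visual, np.ones(len(train_visual))])
    W, *_ = np.linalg.lstsq(X, train_fbank, rcond=None)
    return W

def estimate_clean_fbank(visual: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Estimate clean filterbank vectors from visual features alone,
    e.g. when the corresponding audio frames are noisy."""
    X = np.column_stack([visual, np.ones(len(visual))])
    return X @ W
```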