Effective visually-derived Wiener filtering for audio-visual speech processing
Authors
Abstract
This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation that exists between audio and visual speech features. For speech enhancement, a visually-derived Wiener filter is developed. The filter obtains clean speech statistics by modelling the joint density of audio and visual features and making a maximum a posteriori estimate of clean audio from the visual speech features. Noise statistics for the Wiener filter are obtained with an audio-visual voice activity detector, which classifies input audio as speech or non-speech so that a noise model can be updated. Analysis shows that the estimation of speech and noise statistics is effective, and the quality of the enhanced speech is assessed both objectively and subjectively to measure the effectiveness of the resulting Wiener filter. The use of this enhancement method is also considered for automatic speech recognition (ASR).
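The abstract does not give the filter equations, so the following is only a minimal sketch of the general idea, assuming the textbook per-bin Wiener gain H = S / (S + N) and a simple recursive-average noise update. The function names, the smoothing factor `alpha`, and the input layout (lists of per-bin power values) are hypothetical. The clean speech power estimate, which the paper derives from visual features, and the voice activity decision are taken here as given inputs.

```python
def wiener_gain(speech_power, noise_power, floor=1e-12):
    """Textbook per-bin Wiener gain H = S / (S + N).

    speech_power: assumed clean speech power estimate per frequency bin
                  (in the paper this would come from the visual features).
    noise_power:  current noise power estimate per frequency bin.
    floor:        guards against division by zero (an assumption of this sketch).
    """
    return [s / max(s + n, floor) for s, n in zip(speech_power, noise_power)]


def update_noise(noise_power, frame_power, is_speech, alpha=0.9):
    """Update the noise model only on frames labelled non-speech.

    Uses exponential smoothing, which is an assumed update rule and not
    necessarily the one used in the paper.
    """
    if is_speech:
        return list(noise_power)
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_power, frame_power)]


def filter_gains(frames, speech_estimates, vad_labels, initial_noise, alpha=0.9):
    """Return one list of Wiener gains per frame.

    frames:           noisy power spectrum for each frame.
    speech_estimates: assumed clean speech power estimate for each frame.
    vad_labels:       True where the frame is classified as speech.
    """
    noise = list(initial_noise)
    gains = []
    for frame_power, speech_power, is_speech in zip(frames, speech_estimates, vad_labels):
        noise = update_noise(noise, frame_power, is_speech, alpha)
        gains.append(wiener_gain(speech_power, noise))
    return gains
```

In this sketch the gains would be applied to the noisy magnitude spectrum of each frame before resynthesis; that step, and the feature extraction that precedes it, are left out.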
Similar resources
Enhancing audio speech using visual speech features
This work presents a novel approach to speech enhancement by exploiting the bimodality of speech and the correlation that exists between audio and visual speech features. For speech enhancement, a visually-derived Wiener filter is developed. This obtains clean speech statistics from visual features by modelling their joint density and making a maximum a posteriori estimate of clean audio from v...
Noisy audio speech enhancement using Wiener filters derived from visual speech
The aim of this paper is to use visual speech information to create Wiener filters for audio speech enhancement. Wiener filters require estimates of both clean speech statistics and noisy speech statistics. Noisy speech statistics are obtained from the noisy input audio while obtaining clean speech statistics is more difficult and is a major problem in the creation of Wiener filters for speech ...
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also...
Speech enhancement with an acoustic vector sensor: an effective adaptive beamforming and post-filtering approach
Speech enhancement has an increasing demand in mobile communications and faces a great challenge in a real ambient noisy environment. This paper develops an effective spatial-frequency domain speech enhancement method with a single acoustic vector sensor (AVS) in conjunction with minimum variance distortionless response (MVDR) spatial filtering and Wiener post-filtering (WPF) techniques. In remo...
Speaker separation using visual speech features and single-channel audio
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker’s speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from speakers is used to create a visually-derived Wiener filter. The Wiener filter...