Use of bimodal coherence to resolve the permutation problem in convolutive BSS
نویسندگان
چکیده
Recent studies show that facial information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. With the synchronized features, we propose an adapted expectation maximization (AEM) algorithm to model the audio– visual coherence in the off-line training process. To improve the accuracy of this coherence model, we use a frame selection scheme to discard nonstationary features. Then with the coherence maximization technique, we develop a new sorting method to solve the permutation problem in the frequency domain. We test our algorithm on a multimodal speech database composed of different combinations of vowels and consonants. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS, which confirms the benefit of using visual speech to assist in separation of the audio. & 2011 Elsevier B.V. All rights reserved.
منابع مشابه
Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS
Recent studies show that visual information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterisation of the coherence between the audio and visual speech using, e.g. a Gaussian mixture model (GMM). In this paper, we present two new contributions. An ada...
متن کاملAudio-visual Convolutive Blind Source Separation
We present a novel method for speech separation from their audio mixtures using the audio-visual coherence. It consists of two stages: in the off-line training process, we use the Gaussian mixture model to characterise statistically the audiovisual coherence with features obtained from the training set; at the separation stage, likelihood maximization is performed on the independent component a...
متن کاملSparse filter models for solving permutation indeterminacy in convolutive blind source separation
Frequency-domain methods for estimating mixing filters in convolutive blind source separation (BSS) suffer from permutation and scaling indeterminacies in sub-bands. Solving these indeterminacies are critical to such BSS systems. In this paper, we propose to use sparse filter models to tackle the permutation problem. It will be shown that the l1-norm of the filter matrix increases with permutat...
متن کاملConvolutive Blind Speech Separation using Cross Spectral Density Matrix and Clustering for Resolving Permutation
1 ABSTRACT The problem of separation of audio sources recorded in a real world situation is well established in modern literature. The method to solve this problem is Blind Speech Separation (BSS).The recording environment is usually modeled as convolutive (i.e. number of speech sources should be equal to or less than number of microphone arrays). In this paper, we propose a new frequency domai...
متن کاملAdaptive Segmentation and Separation of Determined Convolutive Mixtures under Dynamic Conditions
In this paper, we propose a method for blind source separation (BSS) of convolutive audio recordings with short blocks of stationary sources, i.e. dynamically changing source activity but no source movements.It consists of a time-frequency sparseness based localization step to identify segments with stationary sources whose number is equal to the number of microphones. We then use a frequency d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Signal Processing
دوره 92 شماره
صفحات -
تاریخ انتشار 2012