Use of bimodal coherence to resolve the permutation problem in convolutive BSS

Authors

Qingju Liu

Wenwu Wang

Philip J. B. Jackson

Abstract

Recent studies show that facial information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. With the synchronized features, we propose an adapted expectation maximization (AEM) algorithm to model the audio-visual coherence in the off-line training process. To improve the accuracy of this coherence model, we use a frame selection scheme to discard nonstationary features. Then with the coherence maximization technique, we develop a new sorting method to solve the permutation problem in the frequency domain. We test our algorithm on a multimodal speech database composed of different combinations of vowels and consonants. The experimental results show that our proposed algorithm outperforms traditional audio-only BSS, which confirms the benefit of using visual speech to assist in separation of the audio. © 2011 Elsevier B.V. All rights reserved.
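The sorting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a coherence score has already been computed for every pairing of target source and separated component in each frequency bin (in the paper this would come from the trained audio-visual model), and it simply picks, per bin, the ordering with the highest total score. The function names `best_permutation` and `align_bins`, and the array layouts, are hypothetical.

```python
from itertools import permutations

import numpy as np


def best_permutation(score):
    """Return the ordering p that maximizes sum_i score[i, p[i]].

    score[i, j] is an assumed, precomputed coherence between target
    source i and separated component j within a single frequency bin.
    """
    n = score.shape[0]
    return max(
        permutations(range(n)),
        key=lambda p: sum(score[i, p[i]] for i in range(n)),
    )


def align_bins(components, scores):
    """Reorder the separated components in every frequency bin.

    components: array of shape (n_bins, n_sources, n_frames)
    scores:     array of shape (n_bins, n_sources, n_sources)
    """
    aligned = np.empty_like(components)
    for f in range(components.shape[0]):
        p = best_permutation(scores[f])
        # Row i of the output is the component assigned to source i.
        aligned[f] = components[f, list(p)]
    return aligned


# Toy example: two bins, two sources; the second bin is swapped.
components = np.arange(12, dtype=float).reshape(2, 2, 3)
scores = np.array([[[1.0, 0.0], [0.0, 1.0]],
                   [[0.0, 1.0], [1.0, 0.0]]])
aligned = align_bins(components, scores)
```

The exhaustive search over orderings grows factorially with the number of sources, so it is only reasonable for the small source counts typical of such experiments.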

Similar resources

Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS

Recent studies show that visual information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterisation of the coherence between the audio and visual speech using, e.g. a Gaussian mixture model (GMM). In this paper, we present two new contributions. An ada...

Audio-visual Convolutive Blind Source Separation

We present a novel method for separating speech sources from their audio mixtures using audio-visual coherence. It consists of two stages: in the off-line training process, we use a Gaussian mixture model to statistically characterise the audio-visual coherence with features obtained from the training set; at the separation stage, likelihood maximization is performed on the independent component a...

Sparse filter models for solving permutation indeterminacy in convolutive blind source separation

Frequency-domain methods for estimating mixing filters in convolutive blind source separation (BSS) suffer from permutation and scaling indeterminacies in sub-bands. Solving these indeterminacies is critical to such BSS systems. In this paper, we propose to use sparse filter models to tackle the permutation problem. It will be shown that the l1-norm of the filter matrix increases with permutat...

Convolutive Blind Speech Separation using Cross Spectral Density Matrix and Clustering for Resolving Permutation

The problem of separating audio sources recorded in a real-world situation is well established in modern literature. The method used to solve this problem is Blind Speech Separation (BSS). The recording environment is usually modeled as convolutive (i.e. the number of speech sources should be equal to or less than the number of microphone arrays). In this paper, we propose a new frequency domai...

Adaptive Segmentation and Separation of Determined Convolutive Mixtures under Dynamic Conditions

In this paper, we propose a method for blind source separation (BSS) of convolutive audio recordings with short blocks of stationary sources, i.e. dynamically changing source activity but no source movements. It consists of a time-frequency sparseness based localization step to identify segments with stationary sources whose number is equal to the number of microphones. We then use a frequency d...

Journal title:

Signal Processing

Volume 92

Publication date: 2012
