Real-Time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition
Abstract
This demo presents a real-time prototype for automatic blind source extraction and speech recognition in the presence of multiple interfering noise sources. Binaurally recorded mixtures are processed by a combined blind/semi-blind source separation algorithm to obtain an estimate of the target signal. The recovered target signal is segmented and used as input to a real-time automatic speech recognition (ASR) system. To further improve recognition performance, noise-robust features based on Gammatone filters are used. The demo uses the data provided for the PASCAL CHiME Speech Separation and Recognition Challenge as well as real-time mixtures recorded on site. Users will be able to listen to the recovered target signal and compare it with the original mixture and the ASR output.
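The abstract does not specify how the Gammatone-based features are computed. As a rough illustration only, the following Python sketch shows one common way to derive log-energy features from a Gammatone filterbank. Everything here is an assumption rather than the authors' method: the function names (`erb_space`, `gammatone_ir`, `gammatone_features`), the 4th-order filter, the ERB formula of Glasberg and Moore, the 32 bands, and the 25 ms / 10 ms framing.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_space(f_low, f_high, n_bands):
    """Center frequencies equally spaced on the ERB-rate scale (assumed layout)."""
    def hz_to_erb_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_bands)
    return erb_rate_to_hz(e)


def gammatone_ir(fc_hz, fs, duration=0.025, order=4, b=1.019):
    """Truncated gammatone impulse response, normalized to unit energy."""
    t = np.arange(int(duration * fs)) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.sqrt(np.sum(ir ** 2))


def gammatone_features(signal, fs, n_bands=32, f_low=50.0, f_high=None,
                       frame_len=0.025, hop=0.010):
    """Per-frame log energy in each gammatone band.

    Returns (features, center_frequencies); features has shape
    (n_frames, n_bands).
    """
    if f_high is None:
        f_high = 0.45 * fs  # stay safely below the Nyquist frequency
    centers = erb_space(f_low, f_high, n_bands)
    n_frame = int(frame_len * fs)
    n_hop = int(hop * fs)
    n_frames = 1 + (len(signal) - n_frame) // n_hop
    feats = np.empty((n_frames, n_bands))
    for k, fc in enumerate(centers):
        # Causal FIR filtering with the truncated impulse response.
        band = np.convolve(signal, gammatone_ir(fc, fs))[:len(signal)]
        for i in range(n_frames):
            seg = band[i * n_hop:i * n_hop + n_frame]
            feats[i, k] = np.log(np.mean(seg ** 2) + 1e-10)
    return feats, centers


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.sin(2.0 * np.pi * 1000.0 * t)  # 1 s test tone at 1 kHz
    feats, centers = gammatone_features(tone, fs)
    print(feats.shape, centers[np.argmax(feats.mean(axis=0))])
```

In a pipeline like the one described, such features would be computed on the segmented output of the source extraction stage before being passed to the recognizer; a cepstral transform or normalization step may follow, but the abstract gives no detail on that.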
Similar resources
A Two-Channel Acoustic Front-End for Robust Automatic Speech Recognition in Noisy and Reverberant Environments
An acoustic front-end for robust automatic speech recognition in noisy and reverberant environments is proposed in this contribution. It comprises a blind source separation-based signal extraction scheme and requires only two microphone signals. The proposed front-end and its integration into the recognition system are analyzed and evaluated in noisy living room-like environments according to th...
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
A Flexible Spatial Blind Source Extraction Framework for Robust Speech Recognition in Noisy Environments
Blind source extraction (BSE) is an attractive approach to enhancing multichannel noisy speech data as a preprocessing step for an automatic speech recognition system. BSE was successfully applied to the first CHiME Pascal Challenge to improve the recognition rate of noisy commands in a small-dictionary task. In this work, we reviewed the BSE architecture and improved each system block in the ...
Nonverbal Communication in Spontaneous Speech Recognition
Verbal communication is the most obvious instrument used to express our thoughts and ideas. Considering only this part of speech, without regard to its nonverbal part, may lead to overlooking important information in an utterance or even misunderstanding it. The development of an automatic system for recognition of facial expressions is a rather difficult task. Such a system must perform automatica...
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment
We present noise-robust automatic speech recognition (ASR) using a sparseness-based underdetermined blind source separation (BSS) technique. As a representative underdetermined BSS method, we use time-frequency masking in this paper. Although time-frequency masking is able to separate target speech from interferences effectively, two problems must be considered. One is that masking does not...