Robust Speech Recognition in a Car Using a Microphone Array
Abstract
The performance of automatic speech recognition relies on a vast amount of training speech data, most of it recorded with little or no background noise. Performance degrades significantly in the presence of background noise, which increases the type mismatch between training and test environments. Speech enhancement techniques can reduce this type mismatch by extracting reliable speech features from the background noise. At very low SNR with nonstationary noise, however, the enhanced speech may still contain significant noise in either noise-only segments or speech segments. The former masquerades as speech features that do not exist, and the latter appears as distorted speech features. Both significantly degrade the performance of the automatic speech recognizer. This motivates the use of voice activity detection (VAD) algorithms to determine which parts contain speech. To use only the reliable speech features, we need to further determine whether the features from a speech region come mainly from speech or from nonstationary noise masking the speech. For more robust speech recognition, this thesis proposes a three-hypothesis VAD consisting of H0: noise-only region; HS: speech-dominant speech region; and HN: noise-dominant speech region. Spectrum-based VAD uses knowledge of the noise spectrum to detect voice activity, exploiting the nonstationary nature of speech. This thesis proposes a method of estimating the instantaneous noise spectrum for VAD at low SNR, where noise estimation becomes inaccurate due to its high variance. The spectrum-based VAD, however, cannot distinguish speech from nonstationary noise because both appear nonstationary to the VAD and thus look like speech. A microphone array can determine the noise-corrupted speech region when the nonstationary...
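The three hypotheses named in the abstract can be illustrated with a toy labeling rule. The sketch below is not the detector proposed in the thesis; the function names, the inputs (`frame_energy`, `noise_energy`, `target_ratio`), and both thresholds are assumptions made purely for illustration. It assumes a spectrum-style activity test (frame energy against a noise estimate) followed by a hypothetical array-derived cue giving the fraction of frame energy attributed to the talker.

```python
from enum import Enum


class Hypothesis(Enum):
    H0 = "noise-only region"
    HS = "speech-dominant speech region"
    HN = "noise-dominant speech region"


def classify_frame(frame_energy, noise_energy, target_ratio,
                   activity_factor=2.0, dominance_ratio=0.5):
    """Label one frame with H0, HS, or HN (illustrative rule only).

    frame_energy    -- energy of the current frame
    noise_energy    -- estimated stationary noise energy (assumed given)
    target_ratio    -- hypothetical array cue in [0, 1]: share of the
                       frame energy attributed to the talker
    activity_factor, dominance_ratio -- assumed thresholds, not from the source
    """
    # No activity above the noise estimate: treat as noise only.
    if frame_energy < activity_factor * noise_energy:
        return Hypothesis.H0
    # Activity detected: decide which source dominates the frame.
    if target_ratio >= dominance_ratio:
        return Hypothesis.HS
    return Hypothesis.HN


def reliable_frames(labels):
    """Indices of frames whose features would be kept (HS only)."""
    return [i for i, label in enumerate(labels) if label is Hypothesis.HS]
```

Under this rule only HS frames are passed on as reliable features. A burst of nonstationary noise with no speech at all would be labeled HN here, which is harmless for this purpose because HN frames are excluded as well.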
Similar resources
Noise Reduction Using an Adaptive Microphone Array in a Car - A Speech Recognition Evaluation
This paper describes an evaluation of an adaptive microphone array with respect to speech recognition performance in a car. The microphone array is compared to two conventional microphones of different types. The speech recognition device is intended to be part of a man/machine interface between the driver and car information services.
Automatic Speech Recognition for Car Kits Using a Microphone Array
In this paper we present a novel solution for microphone-array-based car kit systems that intelligently uses the multipath environment to enhance the signal coming from a desired location. Our solution requires a low computational load and can be deployed on most platforms. We present speech recognition rates on real data and compare a stereo solution with a mono solution on this database.
Microphone array design for robust speech acquisition and recognition
The aim of this paper is to study the use of a robust acquisition system based on a microphone array for speech-related applications in real situations. Two beamforming methods are compared: Delay and Sum beamforming (DS) and Spatial Reference Optimal beamforming (SRO). Both are designed in the frequency domain, using harmonic spatial distributed microphones. The q...
Analysis of CFA-BF: Novel combined fixed/adaptive beamforming for robust speech recognition in real car environments
Among the various speech enhancement and processing schemes that have been investigated for in-vehicle speech systems, delay-and-sum beamforming (DASB) and adaptive beamforming are two typical methods, each with its own advantages and disadvantages. In this paper, we propose a novel combined fixed/adaptive beamforming solution (CFA-BF) based on previous work for speech enhance...
Continuous Microphone Array Speech Recognition on Wall Street Journal Corpus
In this paper, we present a robust speech acquisition system that acquires continuous speech using a microphone array. A microphone-array-based speech recognition system is also presented to study the environmental interference due to reverberation, background noise, and mismatch between the training and testing conditions. This is important in the context of smart meeting rooms of Augmented Multi...
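Two of the abstracts listed above refer to delay-and-sum beamforming. As a rough illustration of that general technique (not the implementation used in any of the listed papers), the sketch below time-aligns each channel by a known integer sample delay and averages the result. The function name `delay_and_sum`, the integer-delay simplification, and the zero padding at the edges are all assumptions made for this example.

```python
def delay_and_sum(channels, delays):
    """Time-domain delay-and-sum over equal-length channels.

    channels -- list of sample lists, one per microphone
    delays   -- non-negative integer delays (in samples) with which the
                target signal reaches each microphone; assumed known
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    output = []
    for t in range(n):
        total = 0.0
        for signal, delay in zip(channels, delays):
            idx = t + delay  # advance the channel to undo its arrival delay
            if 0 <= idx < n:  # samples outside the recording count as zero
                total += signal[idx]
        output.append(total / len(channels))
    return output


# Usage: the same pulse reaches the second microphone one sample later.
mic0 = [0, 1, 2, 3, 0, 0]
mic1 = [0, 0, 1, 2, 3, 0]
aligned = delay_and_sum([mic0, mic1], [0, 1])
```

After alignment the target adds coherently across channels, while sound arriving with other delays is averaged down. Real systems typically need fractional delays and a way to estimate them, which this sketch leaves out.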