Voice activity detection in noisy environments
نویسندگان
چکیده
The subject of this paper is robust voice activity detection (VAD) in noisy environments, especially in car environments. We present a comparison between several frame based VAD feature extraction algorithms in combination with different classifiers. Experiments are carried out under equal test conditions using clean speech, clean speech with added car noise and speech recorded in car environments. The lowest error rate is achieved applying features based on a likelihood ratio test which assumes normal distribution of speech and noise and a perceptron classifier. We propose modifications of this algorithm which reduce the frame error rate by approximately 30% relative in our experiments compared to the original algorithm.
منابع مشابه
Real-time audio-visual voice activity detection for speech recognition in noisy environments
Voice activity detection (VAD) is one of the most critical issues on performance degradation of speech recognition in noisy environment applications. A real-time VAD was developed by using face parameters (eye and lip contours) as a front-end for the traditional speech and noise (audio) GMMbased method. Speech recognition performance of the audiovisual VAD is shown to be comparable with audio-o...
متن کاملTwo-layered audio-visual integration in voice activity detection and automatic speech recognition for robots
Automatic Speech Recognition (ASR) which plays an important role in human-robot interaction should be noise-robust because robots are expected to work in noisy environments. Audio-Visual (AV) integration is one of the key ideas to improve the robustness in such environments. This paper proposes two-layered AV integration for ASR which applies AV integration to Voice Activity Detection (VAD) and...
متن کاملNoise robust voice activity detection based on switching kalman filter
This paper addresses the problem of voice activity detection (VAD) in noisy environments. The VAD method proposed in this paper is based on a statistical model approach, and estimates statistical models sequentially without a priori knowledge of noise. Namely, the proposed method constructs a clean speech / silence state transition model beforehand, and sequentially adapts the model to the nois...
متن کاملVoice activity detection using modified Wigner-ville distribution
This paper introduces a new method of voice activity detection (VAD) using modified Wigner-Ville distribution. Modified Wigner-Ville distribution can track the speech regions efficiently in noisy environments. Fourier Transform (FT) and Discrete Cosine Transform (DCT) variants of Wigner-Ville distribution based, voice activity detection schemes are presented. The efficient time frequency spectr...
متن کاملRobust voice activity detection using perceptual wavelet-packet transform and Teager energy operator
In this letter, a robust voice activity detection (VAD) algorithm is presented. This proposed VAD algorithm makes use of the perceptual wavelet-packet transform and the Teager energy operator to compute a robust parameter called voice activity shape for VAD. The main advantage of this algorithm is that the preset threshold values or a priori knowledge of the SNR usually needed in conventional V...
متن کامل