Multi-band long-term signal variability features for robust voice activity detection
نویسندگان
چکیده
In this paper, we propose robust features for the problem of voice activity detection (VAD). In particular, we extend the long term signal variability (LTSV) feature to accommodate multiple spectral bands. The motivation of the multi-band approach stems from the non-uniform frequency scale of speech phonemes and noise characteristics. Our analysis shows that the multi-band approach offers advantages over the single band LTSV for voice activity detection. In terms of classification accuracy, we show 0.3%-61.2% relative improvement over the best accuracy of the baselines considered for 7 out 8 different noisy channels. Experimental results, and error analysis, are reported on the DARPA RATS corpora of noisy speech.
منابع مشابه
Efficient voice activity detection algorithm using long-term spectral flatness measure
This paper proposes a novel and robust voice activity detection (VAD) algorithm utilizing long-term spectral flatness measure (LSFM) which is capable of working at 10 dB and lower signal-to-noise ratios(SNRs). This new LSFM-based VAD improves speech detection robustness in various noisy environments by employing a low-variance spectrum estimate and an adaptive threshold. The discriminative powe...
متن کاملA Novel Voice Sensor for the Detection of Speech Signals
In order to develop a novel voice sensor to detect human voices, the use of features which are more robust to noise is an important issue. Voice sensor is also called voice activity detection (VAD). Due to that the inherent nature of the formant structure only occurred on the speech spectrogram (well-known as voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe t...
متن کاملPhone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection
Accurate voice activity detection (VAD) is important for robust automatic speech recognition (ASR) systems. This paper proposes noise-robust VAD using long-term temporal information in speech. Long-term temporal information has been an ASR focus recently, but has not been investigated sufficiently for VAD. This paper describes an attempt to incorporate long-term temporal information into a feat...
متن کاملA robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice
Reliable automatic detection of speech/non-speech activity in degraded, noisy audio signals is a fundamental and challenging task in robust signal processing. As various speech technology applications rely on the accuracy of a Voice Activity Detection (VAD) system for their effectiveness and robustness, the problem has gained considerable research interest over the years. It has been shown that...
متن کاملA New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
متن کامل