Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database
Authors
Abstract
We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each band-pass filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called ) and anti-analytic (called ) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. Refined features may also be extracted from the analytic and anti-analytic components, but this is not done in this paper. We then evaluated the front end on the Aurora 2 task, where noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 3-mixture HMM back-end, our AIF/ALE front end achieves an average improvement of with set A, improvement with set B, and (negative) ‘improvement’ with set C. The overall improvement in accuracy rates for clean training is . Although the improvements are modest, the novelty of the front end and its potential for future enhancements are its main strengths.
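The coarse features named in the abstract can be illustrated for a single band with a minimal sketch. It does not implement the log-derivative decomposition the abstract describes. As a stand-in, it builds a conventional analytic signal with a direct DFT, then averages the instantaneous frequency and the log-envelope over one frame. The function names `analytic_signal` and `aif_ale`, the 8 kHz sample rate, the 256-sample frame, and the small constant added before the logarithm are all assumptions made for illustration, and the band-pass filter bank is omitted.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a short real frame via a direct O(N^2) DFT.

    Stand-in for the paper's log-derivative decomposition (assumption).
    """
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return [
        sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def aif_ale(x, sample_rate):
    """Return (average instantaneous frequency in Hz, average log-envelope)."""
    z = analytic_signal(x)
    # Phase increment between consecutive samples, converted to Hz.
    inst_freq = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    # Small constant guards against log(0); its value is an assumption.
    log_env = [math.log(abs(v) + 1e-12) for v in z]
    return sum(inst_freq) / len(inst_freq), sum(log_env) / len(log_env)


# Usage: a pure 1 kHz tone of amplitude 0.5 stands in for one filter output.
sample_rate = 8000  # assumed sample rate
frame = [0.5 * math.cos(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
aif, ale = aif_ale(frame, sample_rate)
print(aif, ale)
```

For this test tone the frame holds a whole number of cycles, so the average instantaneous frequency comes out at the tone frequency and the average log-envelope at the logarithm of the amplitude. A real front end would run this on each filter-bank output and on frames of speech rather than a stationary tone.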
Similar resources
A generalized framework for compensation of mel-filterbank outputs in feature extraction for robust ASR
This paper describes a novel and efficient noise-robust frontend that utilizes a set of Mel-filterbank output compensation methods, together with cumulative distribution mapping of cepstral coefficients, for noisy speech recognition. The proposed compensation framework includes the use of noise spectral subtraction, spectral flooring and log Mel-filterbank output weighting. Recognition experime...
Joint Bayesian predictive classification and parallel model combination with prior scaling for robust ASR
This paper presents a model compensation approach based on Bayesian predictive classification (BPC). In order to obtain effective prior distributions for BPC, our approach uses parallel model combination (PMC) to set the prior mean, and a likelihood ratio to set a scaled frame-specific prior variance. Experiments on the Aurora 2 database show that the proposed approach results in improved avera...
A Log-energy Scaling Normalization Scheme for Robust Speech Recognition
The log-energy parameter, as an auxiliary but influential feature, has been commonly used to augment Mel-frequency cepstral coefficients (MFCCs) to improve the recognition accuracy in automatic speech recognition (ASR). In this paper, a new and effective scaling approach named log-energy scaling normalization (LESN), which utilizes special nonlinear scaling functions on noisy speech data for lo...
On compensating the Mel-frequency cepstral coefficients for noisy speech recognition
This paper describes a novel noise-robust automatic speech recognition (ASR) front-end that employs a combination of Mel-filterbank output compensation and cumulative distribution mapping of cepstral coefficients with truncated Gaussian distribution. Recognition experiments on the Aurora II connected digits database reveal that the proposed front-end achieves an average digit recognition accura...
An MTF-based blind restoration of temporal power envelopes as a front-end processor for automatic speech recognition systems in reverberant environments
To reduce speech degradation in reverberant environments, we previously proposed a modulation transfer function (MTF) based method of speech restoration. The room impulse response (RIR) in this restoration does not need to be measured at any time since we modeled the power envelope of the RIRs as an exponential decay function. Speech is assumed to be temporally modulated with white noise carrier ...