Damped oscillator cepstral coefficients for robust speech recognition
نویسندگان
چکیده
This paper presents a new signal-processing technique motivated by the physiology of human auditory system. In this approach, auditory hair cells are modeled as damped oscillators that are stimulated by bandlimited time domain speech signals acting as forcing functions. Oscillation synchrony is induced by time aligning and three-way coupling of the forcing functions across the individual bands such that a given oscillator is induced not only by its critical band’s forcing function but also by its two neighboring functions. We present two separate features; one which uses the damped oscillator response to the forcing functions without synchrony which we name as the Damped Oscillator Cepstral Coefficient (DOCC) and the other which uses the damped oscillator response to a time synchronized forcing function and we name it as the Synchronized Damped Oscillator Cepstral Coefficient (SyDOCC). The proposed features are used in an Aurora4 noiseand channel-degraded speech recognition task, and the results indicate that they improved speech-recognition performance in all conditions compared to the baseline melcepstral feature and other published noise robust features.
منابع مشابه
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...
متن کاملSpeech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملRegularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition
In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high var...
متن کاملFeature extraction from higher-lag autocorrelation coefficients for robust speech recognition
In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lowertime lags, while the higher-lag autocorrelation coefficients are least affected, this method discards the lower-lag autocorrelation coefficients and uses o...
متن کاملRelative mel-frequency cepstral coefficients compensation for robust telephone speech recognition
It is a crucial factor to find the robust and simple computation methods for the actual application of telephone speech recognition. In this paper, we propose a new channel compensation method, which uses a RASTA-like band-pass filter on the mel-frequency cepstral coefficients for robust telephone speech recognition. It is shown from the experiments that the proposed method, comparing with the ...
متن کامل