Second order statistics spectrum estimation method for robust speech recognition
Abstract
A second order statistics spectrum estimation (SOSSE) method for speech enhancement is presented. The DFT amplitude spectral components of the noisy signal are assumed to be random variables. After the first and second order statistics of the noise-only spectrum are estimated, the noisy signal spectrum is enhanced. As a reference, a fast discrete cosine transform based signal subspace (FDCTSS) method was implemented. The Aurora 2 database of digit sequences was used to show the methods' effectiveness in improving speech recognition. Both methods performed well under the clean training condition, achieving total relative improvements in recognition accuracy of 30.75% (SOSSE) and 26.31% (FDCTSS). With multi-condition training, the proposed SOSSE method outperformed the FDCTSS method: the total relative improvements were 17.50% (SOSSE) and -4.53% (FDCTSS).
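The abstract does not state the enhancement rule, so the following is only a minimal sketch of the general idea: estimate the per-bin mean and standard deviation (first and second order statistics) of DFT amplitudes over noise-only frames, then attenuate the noisy amplitude spectrum. The subtraction rule, the factor `k`, the spectral `floor`, and the function names are all assumptions made for illustration, not the authors' SOSSE formulation.

```python
import numpy as np

def noise_stats(noise_frames, n_fft=256):
    """Per-bin mean and standard deviation of DFT amplitudes
    over a set of noise-only frames (rows of noise_frames)."""
    amp = np.abs(np.fft.rfft(noise_frames, n=n_fft, axis=1))
    return amp.mean(axis=0), amp.std(axis=0)

def enhance_frame(frame, mu, sigma, k=1.0, floor=0.01, n_fft=256):
    """Assumed rule (hypothetical stand-in): subtract mu + k*sigma from
    the noisy amplitude, apply a spectral floor, keep the noisy phase."""
    spec = np.fft.rfft(frame, n=n_fft)
    amp = np.abs(spec)
    clean_amp = np.maximum(amp - (mu + k * sigma), floor * amp)
    return np.fft.irfft(clean_amp * np.exp(1j * np.angle(spec)), n=n_fft)
```

In such a scheme the noise-only frames would typically come from the leading non-speech portion of an utterance or from a voice activity detector; that choice is likewise an assumption here.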
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients
In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower order autocorrelation coefficients and uses...
A Study of Low-variance Multi-taper Features for Distributed Speech Recognition
In this paper we study low-variance multi-taper spectrum estimation methods to compute the mel-frequency cepstral coefficient (MFCC) features for robust speech recognition. In speech recognition, MFCC features are usually computed from a Hamming-windowed DFT spectrum. Although windowing helps reduce the bias of the spectrum, the variance remains high. Multitaper spectrum estimation methods...
Regularized MVDR spectrum estimation-based robust feature extractors for speech recognition
In this paper, we present two robust feature extractors that use a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, for estimating the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high vari...
Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition
This work presents a noise spectrum estimator based on the Gaussian mixture model (GMM)-based speech presence probability (SPP) for robust speech recognition. The estimated noise spectrum is then used to compute a subband a posteriori signal-to-noise ratio (SNR). A sigmoid-shaped weighting rule is formed based on this subband a posteriori SNR to enhance the speech spectrum in the auditory domain, wh...