Cepstrum derived from differentiated power spectrum for robust speech recognition

Authors

  • Jingdong Chen
  • Kuldip K. Paliwal
  • Satoshi Nakamura
Abstract

In this paper, cepstral features derived from the differential power spectrum (DPS) are proposed for improving the robustness of a speech recognizer in the presence of background noise. These robust features are computed from the speech signal of a given frame through the following four steps. First, the short-time power spectrum of the speech signal is computed with the fast Fourier transform algorithm. Second, the DPS is obtained by differentiating the power spectrum with respect to frequency. Third, the magnitude of the DPS is projected from linear frequency to the mel scale and smoothed by a filter bank. Finally, the outputs of the filter bank are transformed to cepstral coefficients by the discrete cosine transform after a nonlinear transformation. It is shown that this new feature set can be decomposed as the superposition of the standard cepstrum and its nonlinearly liftered counterpart. While a linear lifter has no effect on continuous density hidden Markov model based speech recognition, we show that the proposed feature set, with its embedded nonlinear liftering transformation, is quite effective for robust speech recognition. To this end, we conduct a number of speech recognition experiments (including isolated word recognition, connected digit recognition, and large vocabulary continuous speech recognition) in various operating environments and compare the DPS features with the standard mel-frequency cepstral coefficient features used with cepstral mean normalization and spectral subtraction techniques. © 2003 Elsevier B.V. All rights reserved.
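The four steps listed in the abstract can be sketched in code. The following is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact differentiation formula, the nonlinear transformation, the window, or the filter bank design, so this sketch assumes a first-order difference between adjacent frequency bins, a logarithm as the nonlinearity, a Hamming window, and triangular mel filters built from the common 2595 log10(1 + f/700) mapping. All function names (`fft`, `mel_filter_bank`, `dps_cepstrum`) and parameter defaults are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, expressed as fractional bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * n_bins
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dps_cepstrum(frame, sample_rate, n_filters=20, n_ceps=12):
    n = len(frame)
    # Step 1: Hamming window (assumed) + FFT -> short-time power spectrum
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    power = [abs(spectrum[k]) ** 2 for k in range(n // 2 + 1)]
    # Step 2: differentiate along frequency (first-order difference, assumed)
    dps = [power[k] - power[k + 1] for k in range(n // 2)]
    # Step 3: magnitude of the DPS, smoothed by a mel-scale filter bank
    mag = [abs(d) for d in dps]
    bank = mel_filter_bank(n_filters, len(mag), sample_rate)
    energies = [sum(w * m for w, m in zip(row, mag)) for row in bank]
    # Step 4: nonlinear transformation (log assumed, floored), then DCT-II
    logs = [math.log(max(e, 1e-10)) for e in energies]
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / n_filters)
                for j, v in enumerate(logs))
            for q in range(1, n_ceps + 1)]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
ceps = dps_cepstrum(frame, 8000)
print(len(ceps))
```

The zeroth DCT coefficient is skipped here, as is common for cepstral features, and the frame length must be a power of two for this simple FFT. A real front-end would more likely use an optimized FFT library; the hand-written one is used only to keep the sketch self-contained.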

Similar resources

Robust MFCCs Derived from Differentiated Power Spectrum

The mel-scaled frequency cepstral coefficients (MFCCs) derived from Fourier transform and filter bank analysis are perhaps the most widely used front-ends in state-of-the-art speech recognition systems. One of the major issues with the MFCCs is that they are very sensitive to additive noise. To improve the robustness of speech front-ends with respect to noise, we introduce, in this paper, a new...

Improving the performance of MFCC for Persian robust speech recognition

The mel-frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Robust Features for Noisy Speech Recognition using MFCC Computation from Magnitude Spectrum of Higher Order Autocorrelation Coefficients

Noise robustness is one of the most challenging problems in automatic speech recognition. The goal of robust feature extraction is to improve the performance of speech recognition in adverse conditions. The mel-scaled frequency cepstral coefficients (MFCCs) derived from Fourier transform and filter bank analysis are perhaps the most widely used front-ends in state-of-the-art speech recognition s...

Robust feature extraction using subband spectral centroid histograms

In this paper we propose a new framework for utilizing frequency information from the short-term power spectrum of speech. Feature extraction is based on the cepstral coefficients derived from the histograms of subband spectral centroids (SSC). Two new feature extraction algorithms are proposed, one based on frequency information alone, and the other which efficiently combines the frequency and...

Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients

In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower order autocorrelation coefficients and uses...

Journal:
  • Speech Communication

Volume 41

Publication date: 2003