Robust speech recognition based on spectro-temporal processing
Abstract
A major deficiency in state-of-the-art automatic speech recognition (ASR) systems is the lack of robustness in additive and convolutional noise. The model of auditory perception (PEMO), developed by Dau et al. (1996a) for psychoacoustical purposes, partly overcomes these difficulties when used as a front end for automatic speech recognition. To further improve the performance of this auditory-based recognition system in background noise, different speech enhancement methods were examined, which have been ...

A slightly modified version of this chapter was published in Speech Communication 34 (1–2), pp. 75–91 (2001), by Michael Kleinschmidt, Jürgen Tchorz and Birger Kollmeier.
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Full text
Spectro-temporal directional derivative features for automatic speech recognition
We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speec...
Full text
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor...
Full text
Spectro-temporal Modulations for Robust Speech Emotion Recognition
Full text
Informative spectro-temporal bottleneck features for noise-robust speech recognition
Spectro-temporal Gabor features based on auditory knowledge have improved word accuracy for automatic speech recognition in the presence of noise. In previous work, we generated robust spectro-temporal features that incorporated the power normalized cepstral coefficient (PNCC) algorithm. The corresponding power normalized spectrum (PNS) is then processed by many Gabor filters, yielding a high d...
Full text