Comparing Different Flavors of Spectro-Temporal Features for ASR
Abstract
In the last decade, several studies have shown that the robustness of ASR systems can be increased when 2D Gabor filters are used to extract specific modulation frequencies from the input pattern. This paper analyzes important design parameters for spectro-temporal features based on a Gabor filter bank. We perform experiments with filters that exhibit different phase sensitivity. Further, we analyze whether non-linear weighting with a multi-layer perceptron (MLP) and subsequent concatenation with mel-frequency cepstral coefficients (MFCCs) have beneficial effects. For the Aurora2 noisy digit recognition task, the use of phase-sensitive filters improved on the MFCC baseline, whereas filters that neglect phase information did not. While MLP processing alone did not have a large effect on overall performance, the best results were obtained for MLP-processed phase-sensitive filters with added MFCCs, with relative error reductions of over 40% for both noisy and clean training.
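The distinction between phase-sensitive and phase-neglecting filters can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a phase-sensitive output is the real part of a complex 2D Gabor filter response and that a phase-neglecting output is its magnitude, which the abstract does not spell out. The function names, the Hann envelope, the filter sizes, and the modulation frequencies are all illustrative choices, and random data stands in for a log mel spectrogram.

```python
import numpy as np

def gabor_filter(omega_t, omega_f, half_t=10, half_f=5):
    """Complex 2D Gabor filter: a plane-wave carrier under a Hann envelope.

    omega_t / omega_f are temporal / spectral modulation frequencies in
    radians per frame / per channel (illustrative values, not from the paper).
    """
    t = np.arange(-half_t, half_t + 1)
    f = np.arange(-half_f, half_f + 1)
    # Hann windows with the zero-valued end points trimmed off.
    env_f = np.hanning(f.size + 2)[1:-1]
    env_t = np.hanning(t.size + 2)[1:-1]
    envelope = np.outer(env_f, env_t)
    carrier = np.exp(1j * (omega_f * f[:, None] + omega_t * t[None, :]))
    return envelope * carrier

def filter_spectrogram(spec, kernel):
    """Same-size 2D correlation of a (channels x frames) array with a kernel."""
    kf, kt = kernel.shape  # both odd by construction
    padded = np.pad(spec, ((kf // 2, kf // 2), (kt // 2, kt // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kf, kt))
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
spec = rng.standard_normal((23, 100))    # stand-in for a log mel spectrogram
kernel = gabor_filter(omega_t=0.3, omega_f=0.5)
response = filter_spectrogram(spec, kernel)

phase_sensitive = response.real          # sign depends on pattern alignment
phase_blind = np.abs(response)           # magnitude discards the phase
```

In this sketch, the real part changes sign when a spectro-temporal pattern shifts relative to the filter, while the magnitude is always non-negative and discards that alignment information.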
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Methods for capturing spectro-temporal modulations in automatic speech recognition
Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns which are well matched by receptive fields of cortical neurons. In order to improve the performance of automatic speech recognition (ASR) systems a number of different approaches are prese...
Spectro-temporal Gabor features as a front end for automatic speech recognition
A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...
Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to ...
Using spectro-temporal features to improve AFE feature extraction for ASR
Previous work has shown that spectro-temporal features reduce WER for automatic speech recognition under noisy conditions. The spectro-temporal framework, however, is not the only way to process features in order to reduce errors due to noise in the signal. The two-stage mel-warped Wiener filtering method used in the “Advanced Front End” (AFE), now a standard front end for robust recognition, i...