Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features
نویسندگان
چکیده
Spectro-temporal feature extraction and multi-band processing were both designed to make the speech recognizers more robust. Although they have been used for a long time now, very few attempts have been made to combine them. This is why here we integrate two spectrotemporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets both individually (as a baseline) and in combination with multi-band processing in phone recognition tasks on clean and noise contaminated versions of the TIMIT dataset. Our results show that multi-band processing clearly outperforms the baseline feature recombination method in every case tested. This improved performance can also be further enhanced by using the recently introduced technology of deep neural nets (DNNs).
منابع مشابه
Why do ASR Systems Despite Neural Nets Still Depend on Robust Features
To which extent can neural nets learn traditional signal processing stages of current robust ASR front-ends? Will neural nets replace the classical, often auditory-inspired feature extraction in the near future? To answer these questions, a DNN-based ASR system was trained and tested on the Aurora4 robust ASR task using various (intermediate) processing stages. Additionally, the training set wa...
متن کاملNormalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to ...
متن کاملPhoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...
متن کاملJoint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition
In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier that this approach can be successfully applied for the recognition of speech. In this paper, we pr...
متن کاملImproved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder
This paper investigates the use of perceptually-motivated subband temporal envelope (STE) features and time-delay neural network (TDNN) denoising autoencoder (DAE) to improve deep neural network (DNN)-based automatic speech recognition (ASR). STEs are estimated by full-wave rectification and low-pass filtering of band-passed speech using a Gammatone filter-bank. TDNNs are used either as DAE or ...
متن کامل