Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features

Authors

György Kovács

László Tóth

Tamás Grósz

Abstract

Spectro-temporal feature extraction and multi-band processing were both designed to make speech recognizers more robust. Although both have been in use for a long time, very few attempts have been made to combine them. Here we therefore integrate two spectro-temporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets both individually (as a baseline) and in combination with multi-band processing, in phone recognition tasks on clean and noise-contaminated versions of the TIMIT dataset. Our results show that multi-band processing clearly outperforms the baseline feature recombination method in every case tested. This performance can be improved further by using the recently introduced technology of deep neural nets (DNNs).
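The abstract does not spell out the pipeline, so the following is only a rough sketch of the general idea of band-wise feature extraction: the frequency axis of a spectrogram is split into contiguous bands and features are computed for each band separately. Everything named here is an assumption for illustration. The functions `split_into_bands`, `band_features` and `concatenate_bands` are hypothetical, and the mean-plus-delta feature is a simple placeholder, not the authors' spectro-temporal features or their way of merging the bands.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency channels


def split_into_bands(spec: Spectrogram, n_bands: int) -> List[Spectrogram]:
    """Split the frequency axis into n_bands contiguous, near-equal bands."""
    n_channels = len(spec[0])
    if not 1 <= n_bands <= n_channels:
        raise ValueError("n_bands must be between 1 and the channel count")
    base, extra = divmod(n_channels, n_bands)
    bands, start = [], 0
    for b in range(n_bands):
        # The first `extra` bands get one additional channel.
        width = base + (1 if b < extra else 0)
        bands.append([frame[start:start + width] for frame in spec])
        start += width
    return bands


def band_features(band: Spectrogram) -> List[List[float]]:
    """Placeholder feature (assumption): per-frame mean and its delta."""
    means = [sum(frame) / len(frame) for frame in band]
    deltas = [0.0] + [means[t] - means[t - 1] for t in range(1, len(means))]
    return [[m, d] for m, d in zip(means, deltas)]


def concatenate_bands(per_band: List[List[List[float]]]) -> List[List[float]]:
    """Join the per-band feature vectors frame by frame."""
    n_frames = len(per_band[0])
    return [sum((feats[t] for feats in per_band), []) for t in range(n_frames)]


# Toy input: 3 frames, 6 frequency channels.
spec = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
bands = split_into_bands(spec, n_bands=3)
combined = concatenate_bands([band_features(b) for b in bands])
```

In this toy setup each band contributes two values per frame, so every frame ends up with a six-dimensional vector. In a multi-band recognizer the per-band outputs would typically come from band-specific models rather than from a fixed formula like the one used here.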

Similar resources

Why do ASR Systems Despite Neural Nets Still Depend on Robust Features

To what extent can neural nets learn traditional signal processing stages of current robust ASR front-ends? Will neural nets replace the classical, often auditory-inspired feature extraction in the near future? To answer these questions, a DNN-based ASR system was trained and tested on the Aurora4 robust ASR task using various (intermediate) processing stages. Additionally, the training set wa...

Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems

Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to ...

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters temporally tracked and temporal tra...

Joint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition

In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier that this approach can be successfully applied for the recognition of speech. In this paper, we pr...

Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder

This paper investigates the use of perceptually-motivated subband temporal envelope (STE) features and time-delay neural network (TDNN) denoising autoencoder (DAE) to improve deep neural network (DNN)-based automatic speech recognition (ASR). STEs are estimated by full-wave rectification and low-pass filtering of band-passed speech using a Gammatone filter-bank. TDNNs are used either as DAE or ...

Publication date: 2014
