Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark

Authors

  • Yuuki Tachioka
  • Shinji Watanabe
  • Jonathan Le Roux
  • John R. Hershey
Abstract

The recently introduced second CHiME challenge is a difficult two-microphone speech recognition task with non-stationary interference. Current approaches in the source-separation community have focused on the front-end problem of estimating the clean signal given the noisy signals. Here we pursue a different approach, focusing on state-of-the-art ASR techniques such as discriminative training and various feature transformations, in addition to simple noise suppression methods based on prior-based binary masking with estimated angle of arrival. In addition, we propose an augmented discriminative feature transformation that can introduce arbitrary features to a discriminative feature transform and an efficient method for combining Discriminative Language Modeling (DLM) with Minimum Bayes Risk (MBR) decoding in an ASR post-processing stage, and we preliminarily investigate the effectiveness of deep neural networks for reverberated and noisy speech recognition. Using these techniques, we present a benchmark on the middle-vocabulary subtask of the CHiME challenge, showing their effectiveness for this task. Promising results were also obtained for the proposed augmented feature transformation and the combination of DLM and MBR decoding. Part of the training code has been released as an advanced ASR baseline, using the Kaldi speech recognition toolkit.
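
The abstract names Minimum Bayes Risk (MBR) decoding without describing it. As a rough illustration only, and not the authors' Kaldi-based implementation or their combination with DLM, the Python sketch below applies the textbook form of MBR to an N-best list: each hypothesis is scored by its expected word-level edit distance to the other hypotheses, weighted by their normalized posteriors, and the hypothesis with the lowest expected distance is returned. The function names `edit_distance` and `mbr_decode` and the toy scores are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mbr_decode(nbest: List[Tuple[List[str], float]]) -> List[str]:
    """Pick the hypothesis with the lowest expected edit distance.

    `nbest` holds (word sequence, log score) pairs; the log scores are
    normalized into posteriors over the list (a simplifying assumption).
    """
    max_log = max(score for _, score in nbest)
    weights = [math.exp(score - max_log) for _, score in nbest]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    best_words, best_risk = nbest[0][0], float("inf")
    for candidate, _ in nbest:
        # Expected number of word errors if `candidate` were the output.
        risk = sum(p * edit_distance(other, candidate)
                   for (other, _), p in zip(nbest, posteriors))
        if risk < best_risk:
            best_words, best_risk = candidate, risk
    return best_words


# Toy N-best list (hypothetical scores): the top-scoring hypothesis is an
# outlier, so MBR prefers the one closest on average to the alternatives.
nbest = [
    (["a", "b", "c"], math.log(0.4)),
    (["a", "b", "d"], math.log(0.3)),
    (["a", "x", "d"], math.log(0.3)),
]
print(mbr_decode(nbest))
```

In this toy list the highest-scoring hypothesis has an expected error of 0.9 words, while the second has 0.7, so MBR selects the second even though a maximum-score decoder would select the first.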

Similar resources

Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark

The recently introduced second CHiME challenge is a difficult two-microphone speech recognition task with non-stationary interference. Current approaches in the source-separation community have focused on the front-end problem of estimating the clean signal given the noisy signals. Here we pursue a different approach, focusing on state-of-the-art ASR techniques such as discriminative training an...

Noise Robust Missing Data Mask Estimation Based on Automatically Learned Features

In this work, we present an automatic speech recognition (ASR) system based on missing feature reconstruction, in which masks are estimated by binary classification of features generated by Gaussian-Bernoulli restricted Boltzmann machines (GRBMs). The system is evaluated on Track 1 of the 2nd CHiME challenge data. Overall, the best performance is achieved when the reconstructed speech featur...

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend

This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system takes advantage of recurrent neural networks (RNNs) throughout the model from the front-end speech enhancement to the language modeling. Three different types of beamforming are used...

Noise-Robust ASR for the third 'CHiME' Challenge Exploiting Time-Frequency Masking based Multi-Channel Speech Enhancement and Recurrent Neural Network

In this paper, the Lingban entry to the third ‘CHiME’ speech separation and recognition challenge is presented. A time-frequency masking based speech enhancement front-end is proposed to suppress the environmental noise utilizing multichannel coherence and spatial cues. The state-of-the-art speech recognition techniques, namely recurrent neural network based acoustic and language modeling, state...

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments

This article investigates speech feature enhancement based on deep bidirectional recurrent neural networks. The Long Short-Term Memory (LSTM) architecture is used to exploit a self-learnt amount of temporal context in learning the correspondences of noisy and reverberant with undistorted speech features. The resulting networks are applied to feature enhancement in the context of the 2013 2nd Co...

Publication date: 2013