Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models

نویسندگان

Albert Zeyer

Ralf Schlüter

Hermann Ney

چکیده

Online-Recognition requires the acoustic model to provide posterior probabilities after a limited time delay given the online input audio data. This necessitates unidirectional modeling and the standard solution is to use unidirectional long short-term memory (LSTM) recurrent neural networks (RNN) or feedforward neural networks (FFNN). It is known that bidirectional LSTMs are more powerful and perform better than unidirectional LSTMs. To demonstrate the performance difference, we start by comparing several different bidirectional and unidirectional LSTM topologies. Furthermore, we apply a modification to bidirectional RNNs to enable online-recognition by moving a window over the input stream and perform one forwarding through the RNN on each window. Then, we combine the posteriors of each forwarding and we renormalize them. We show in experiments that the performance of this online-enabled bidirectional LSTM performs as good as the offline bidirectional LSTM and much better than the unidirectional LSTM.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Forward-Backward Convolutional LSTM for Acoustic Modeling

An automatic speech recognition (ASR) performance has greatly improved with the introduction of convolutional neural network (CNN) or long-short term memory (LSTM) for acoustic modeling. Recently, a convolutional LSTM (CLSTM) has been proposed to directly use convolution operation within the LSTM blocks and combine the advantages of both CNN and LSTM structures into a single architecture. Thi...

متن کامل

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the training speed and stability. In this paper, we introduce a new layer normalization technique called Dynamic Layer Normalization (DLN) for adaptive neural acoustic modeling in speech recognition. By dynamically generating the scaling and shifting parameters in ...

متن کامل

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

متن کامل

Fast and accurate recurrent neural network acoustic models for speech recognition

We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models in...

متن کامل

Modeling speaker variability using long short-term memory networks for speech recognition

Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LST...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models

نویسندگان

چکیده

منابع مشابه

Forward-Backward Convolutional LSTM for Acoustic Modeling

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Fast and accurate recurrent neural network acoustic models for speech recognition

Modeling speaker variability using long short-term memory networks for speech recognition

عنوان ژورنال:

اشتراک گذاری