Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition

Authors

Abstract

The Listen, Attend and Spell (LAS) network is one of the end-to-end approaches for speech recognition, and it does not require an explicit language model. It consists of two parts: the encoder receives acoustic features as input, and the decoder produces one character per time step based on the encoder output and an attention mechanism. Multi-layer recurrent neural networks (RNNs) are used in both parts; hence, the LAS architecture can be viewed as one RNN for the encoder and another for the decoder, whose shapes and layer sizes differ. In this work, we examined the performance obtained by using multiple parallel RNNs in the encoder part. Our baseline uses a single gated recurrent unit (GRU) with hidden size 256; we compare it against 2 and 4 parallel GRUs with hidden sizes 128 and 64 in each case. The main idea behind the proposed approach is to let each network focus on different patterns (phonemes in our case) in the data. At the encoder, the parallel outputs are concatenated and fed to the decoder. We used the TIMIT database to compare the mentioned networks, with phoneme error rate as the metric. The experimental results showed that parallel networks achieve better results than a single network. However, increasing the number of parallel networks does not guarantee further improvements.
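The parallel-encoder idea described above can be sketched in a few lines of numpy: several GRUs of smaller hidden size process the same acoustic feature sequence, and their hidden states are concatenated at each step so the decoder sees a vector of the same total width as the baseline. This is a hypothetical toy illustration under standard GRU equations, not the authors' implementation; all names and sizes here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (standard formulation; illustrative only)."""
    def __init__(self, input_size, hidden_size, rng):
        s = lambda *shape: rng.standard_normal(shape) * 0.1
        self.Wz, self.Uz = s(hidden_size, input_size), s(hidden_size, hidden_size)
        self.Wr, self.Ur = s(hidden_size, input_size), s(hidden_size, hidden_size)
        self.Wh, self.Uh = s(hidden_size, input_size), s(hidden_size, hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)              # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)              # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1 - z) * h + z * h_tilde

def parallel_gru_encode(cells, inputs):
    """Run several GRUs over the same sequence and concatenate
    their hidden states at each time step."""
    states = [np.zeros(c.hidden_size) for c in cells]
    outputs = []
    for x in inputs:
        states = [c.step(x, h) for c, h in zip(cells, states)]
        outputs.append(np.concatenate(states))
    return np.stack(outputs)  # shape: (T, sum of hidden sizes)

rng = np.random.default_rng(0)
feat_dim, T = 40, 5
# baseline would be one GRU of width 256; here: 2 parallel GRUs of width 128
cells = [GRUCell(feat_dim, 128, rng) for _ in range(2)]
seq = rng.standard_normal((T, feat_dim))
enc = parallel_gru_encode(cells, seq)
print(enc.shape)  # (5, 256)
```

Because the concatenated width matches the baseline (2×128 = 256), the decoder side is unchanged; only the encoder's capacity is partitioned across the parallel networks.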


Similar articles

Minimal Gated Unit for Recurrent Neural Networks

Recently, recurrent neural networks (RNNs) have been very successful in handling sequence data. However, understanding RNNs and finding the best practices for them is a difficult task, partly because there are many competing and complex hidden units (such as LSTM and GRU). We propose a gated unit for RNNs, named the Minimal Gated Unit (MGU), since it only contains one gate, which is a minimal design a...
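The single-gate design mentioned in this blurb can be stated compactly. In the commonly cited MGU formulation, one forget gate f_t both controls the candidate state and performs the LSTM-style interpolation (symbols here follow the usual convention and are an assumption, since the snippet is truncated):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h (f_t \odot h_{t-1}) + b_h\right) \\
h_t &= (1 - f_t) \odot h_{t-1} + f_t \odot \tilde{h}_t
\end{aligned}
```

Compared with the GRU, the reset gate is merged into the single forget gate, roughly halving the gate parameters.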


Gated Recurrent Unit (GRU) for Emotion Classification from Noisy Speech

Despite the enormous interest in emotion classification from speech, the impact of noise on emotion classification is not well understood. This is important because, due to the tremendous advancement of smartphone technology, the smartphone can be a powerful medium for speech emotion recognition in natural environments outside the laboratory, which are likely to introduce background noise into the speech...


Unfolded recurrent neural networks for speech recognition

We introduce recurrent neural networks (RNNs) for acoustic modeling which are unfolded in time for a fixed number of time steps. The proposed models are feedforward networks with the property that the unfolded layers which correspond to the recurrent layer have time-shifted inputs and tied weight matrices. Besides the temporal depth due to unfolding, hierarchical processing depth is added by me...


Recurrent Deep Stacking Networks for Speech Recognition

This paper presents our work on applying Recurrent Deep Stacking Networks (RDSNs) to robust automatic speech recognition (ASR) tasks. In the paper, we also propose a more efficient yet comparable substitute for the RDSN, the BiPass Stacking Network (BPSN). The main idea of these two models is to add phoneme-level information into acoustic models, transforming an acoustic model into the combination of an...


Investigating gated recurrent neural networks for speech synthesis

Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent studies have demonstrated that LSTMs ca...



Journal

Journal title: European Journal of Science and Technology

Year: 2022

ISSN: 2148-2683

DOI: https://doi.org/10.31590/ejosat.1103714