Parallel Gated Recurrent Unit Networks as an Encoder for Speech Recognition
Authors
Abstract
The Listen, Attend and Spell (LAS) network is one of the end-to-end approaches to speech recognition, and it does not require an explicit language model. It consists of two parts: the encoder receives acoustic features as inputs, and the decoder produces one character per time step based on an attention mechanism over the encoder outputs. Multi-layer recurrent neural networks (RNNs) are used in both parts; hence, the LAS architecture can be viewed as one RNN for the encoder and another for the decoder, with different shapes and layer sizes. In this work, we examined the performance of using multiple parallel RNNs in the encoder part. Our baseline encoder uses a gated recurrent unit (GRU) with hidden size 256. We compared it against 2 and 4 parallel GRUs, with hidden sizes 128 and 64 in each case, respectively. The main idea behind the proposed approach is to let each parallel network focus on different patterns (phonemes, in this case) in the data. At the encoder, the outputs of the parallel networks are concatenated and fed to the decoder. We used the TIMIT database to compare the mentioned networks, with phoneme error rate as the metric. The experimental results showed that parallel networks achieve better performance than a single network. However, increasing the number of parallel networks does not guarantee further improvements.
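The parallel-encoder idea described above can be illustrated with a minimal NumPy sketch: several GRUs run over the same feature sequence, and their per-frame hidden states are concatenated before being passed to the decoder. This is an illustrative toy implementation under assumed dimensions (40-dim features, two GRUs of hidden size 128), not the authors' actual model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (illustrative, not optimized)."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, acting on [input, hidden].
        self.Wz = rng.standard_normal((hidden_size, input_size + hidden_size)) * 0.1
        self.Wr = rng.standard_normal((hidden_size, input_size + hidden_size)) * 0.1
        self.Wh = rng.standard_normal((hidden_size, input_size + hidden_size)) * 0.1
        self.hidden_size = hidden_size

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)          # update gate
        r = sigmoid(self.Wr @ xh)          # reset gate
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_cand

def run_parallel_encoder(cells, inputs):
    """Run several GRUs over the same input sequence and concatenate
    their per-frame outputs, as in the parallel-encoder idea."""
    hs = [np.zeros(c.hidden_size) for c in cells]
    outputs = []
    for x in inputs:
        hs = [c.step(x, h) for c, h in zip(cells, hs)]
        outputs.append(np.concatenate(hs))
    return np.stack(outputs)

# Two parallel GRUs with hidden size 128 each -> 256-dim encoder output,
# matching the dimensionality of the single-GRU (hidden size 256) baseline.
cells = [GRUCell(input_size=40, hidden_size=128, seed=s) for s in (0, 1)]
feats = np.random.default_rng(2).standard_normal((5, 40))  # 5 frames of 40-dim features
enc = run_parallel_encoder(cells, feats)
print(enc.shape)  # (5, 256)
```

Note that the concatenated output keeps the same dimensionality as the baseline, so the decoder and attention mechanism are unchanged; only the encoder's internal structure differs.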
Similar resources
Minimal Gated Unit for Recurrent Neural Networks
Recently, recurrent neural networks (RNNs) have been very successful in handling sequence data. However, understanding RNNs and finding the best practices for them is a difficult task, partly because there are many competing and complex hidden units (such as LSTM and GRU). We propose a gated unit for RNNs, named Minimal Gated Unit (MGU), since it only contains one gate, which is a minimal design a...
Gated Recurrent Unit (GRU) for Emotion Classification from Noisy Speech
Despite the enormous interest in emotion classification from speech, the impact of noise on emotion classification is not well understood. This is important because, due to the tremendous advancement of the smartphone technology, it can be a powerful medium for speech emotion recognition in the outside laboratory natural environment, which is likely to incorporate background noise in the speech...
Unfolded recurrent neural networks for speech recognition
We introduce recurrent neural networks (RNNs) for acoustic modeling which are unfolded in time for a fixed number of time steps. The proposed models are feedforward networks with the property that the unfolded layers which correspond to the recurrent layer have time-shifted inputs and tied weight matrices. Besides the temporal depth due to unfolding, hierarchical processing depth is added by me...
Recurrent Deep Stacking Networks for Speech Recognition
This paper presents our work on applying Recurrent Deep Stacking Networks (RDSNs) to robust automatic speech recognition (ASR) tasks. In the paper, we also propose a more efficient yet comparable substitute to RDSN, the BiPass Stacking Network (BPSN). The main idea of these two models is to add phoneme-level information into acoustic models, transforming an acoustic model into the combination of an...
Investigating gated recurrent neural networks for speech synthesis
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent studies have demonstrated that LSTMs ca...
Journal
Journal title: European Journal of Science and Technology
Year: 2022
ISSN: 2148-2683
DOI: https://doi.org/10.31590/ejosat.1103714