A simple RNN-plus-highway network for statistical parametric speech synthesis
نویسندگان
چکیده
In this report, we proposes a neural network structure that combines a recurrent neural network (RNN) and a deep highway network. Compared with the highway RNN structures proposed in other studies, the one proposed in this study is simpler since it only concatenates a highway network after a pre-trained RNN. The main idea is to use the ‘iterative unrolled estimation’ of a highway network to finely change the output from the RNN. The experiments on the proposed network structure with a baseline RNN and 7 highway blocks demonstrated that this network performed relatively better than a deep RNN network with a similar mode size. Furthermore, it took less than half the training time of the deep RNN.
منابع مشابه
An investigation of recurrent neural network architectures for statistical parametric speech synthesis
In this paper, we investigate two different recurrent neural network (RNN) architectures: Elman RNN and recently proposed clockwork RNN [1] for statistical parametric speech synthesis (SPSS). Of late, deep neural networks are being used for SPSS which involve predicting every frame independent of the previous predictions, and hence requires post-processing for ensuring smooth evolution of speec...
متن کاملAcoustic Modeling in Statistical Parametric Speech Synthesis – from Hmm to Lstm-rnn
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...
متن کاملFast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices; weight quantization, multi-frame inf...
متن کاملMulti-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis
Building text-to-speech (TTS) systems requires large amounts of high quality speech recordings and annotations, which is a challenge to collect especially considering the variation in spoken languages around the world. Acoustic modeling techniques that could utilize inhomogeneous data are hence important as they allow us to pool more data for training. This paper presents a long short-term memo...
متن کاملContextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis
In this paper, we propose to use hidden state vector obtained from recurrent neural network (RNN) as a context vector representation for deep neural network (DNN) based statistical parametric speech synthesis. While in a typical DNN based system, there is a hierarchy of text features from phone level to utterance level, they are usually in 1-hot-k encoded representation. Our hypothesis is that,...
متن کامل