Wavenet

Fast Wavenet Generation Algorithm

Journal: CoRR 2016

Tom Le Paine Pooya Khorrami Shiyu Chang Yang Zhang Prajit Ramachandran Mark A. Hasegawa-Johnson Thomas S. Huang

This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advant...
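
The caching idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a toy stack of scalar causal convolutions with kernel size 2 and dilation 2^l at layer l, a tanh nonlinearity, and made-up weights, and it feeds a fixed input signal instead of sampling autoregressively. The names `make_weights`, `naive_output`, `CachedGenerator`, and `demo` are hypothetical.

```python
import math
from collections import deque


def make_weights(num_layers):
    # Made-up (w_past, w_now) filter taps per layer, for illustration only.
    return [(0.5 + 0.1 * l, -0.3 + 0.05 * l) for l in range(num_layers)]


def naive_output(x, t, weights, layer, counter):
    """Recompute the activation of `layer` at time t from scratch."""
    if t < 0:
        return 0.0            # zero padding before the start of the signal
    if layer == 0:
        return x[t]           # layer 0 is the raw input
    counter[0] += 1           # count one convolution evaluation
    d = 2 ** (layer - 1)      # dilation doubles with depth
    w_past, w_now = weights[layer - 1]
    past = naive_output(x, t - d, weights, layer - 1, counter)
    now = naive_output(x, t, weights, layer - 1, counter)
    return math.tanh(w_past * past + w_now * now)


class CachedGenerator:
    """Keeps, per layer, a queue of that layer's most recent inputs."""

    def __init__(self, weights):
        self.weights = weights
        # Layer l (0-based) stores its last 2**l inputs; the front of the
        # queue is the input from exactly 2**l steps ago.
        self.queues = [deque([0.0] * (2 ** l), maxlen=2 ** l)
                       for l in range(len(weights))]

    def step(self, sample):
        h, ops = sample, 0
        for (w_past, w_now), queue in zip(self.weights, self.queues):
            past = queue[0]   # cached input from d steps ago
            queue.append(h)   # store current input, evicting the oldest
            h = math.tanh(w_past * past + w_now * h)
            ops += 1          # one convolution evaluation per layer
        return h, ops


def demo(num_layers=4, length=32):
    weights = make_weights(num_layers)
    x = [math.sin(0.3 * i) for i in range(length)]
    cached = CachedGenerator(weights)
    max_diff, naive_ops, cached_ops = 0.0, 0, 0
    for t in range(length):
        counter = [0]
        ref = naive_output(x, t, weights, num_layers, counter)
        out, cached_ops = cached.step(x[t])
        max_diff = max(max_diff, abs(ref - out))
        naive_ops = counter[0]
    return max_diff, naive_ops, cached_ops
```

In this toy setting, `demo(4, 32)` evaluates 2^4 - 1 = 15 convolutions for the last sample on the naive path and 4 on the cached path, while both paths produce the same outputs.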


Speaker-Dependent WaveNet Vocoder

2017

Akira Tamamori Tomoki Hayashi Kazuhiro Kobayashi Kazuya Takeda Tomoki Toda

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing speech waveforms with WaveNet, by utilizing acoustic features from an existing vocoder as auxiliary features of WaveNet. It is expected that WaveNet can learn a sample-by-sample correspondence between speech waveform and acoustic features. The advantage of the proposed method is that it does not require (1) exp...


Algorithmic composition of polyphonic music with the WaveCRF

2017

Umut Güçlü Yağmur Güçlütürk Luca Ambrogioni Eric Maris Rob van Lier Marcel van Gerven

Here, we propose a new approach for modeling conditional probability distributions of polyphonic music by combining WaveNET and CRF-RNN variants, and show that this approach beats LSTM and WaveNET baselines that do not take into account the statistical dependencies between simultaneous notes.


Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Journal: CoRR 2017

Aäron van den Oord Yazhe Li Igor Babuschkin Karen Simonyan Oriol Vinyals Koray Kavukcuoglu George van den Driessche Edward Lockhart Luis C. Cobo Florian Stimberg Norman Casagrande Dominik Grewe Seb Noury Sander Dieleman Erich Elsen Nal Kalchbrenner Heiga Zen Alex Graves Helen King Tom Walters Dan Belov Demis Hassabis

The recently-developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a rea...


Do WaveNets Dream of Acoustic Waves?

Journal: CoRR 2018

Kanru Hua

Various sources have reported the WaveNet deep learning architecture being able to generate high-quality speech, but to our knowledge there haven’t been studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by unsupervisedly learning an acoustically meaningful latent representation of the speech signals in its...


FFTNet: A Real-Time Speaker-Dependent Neural Vocoder

2018

Zeyu Jin Adam Finkelstein Gautham J. Mysore Jingwan Lu

We introduce FFTNet, a deep learning approach synthesizing audio waveforms. Our approach builds on the recent WaveNet project, which showed that it was possible to synthesize a natural sounding audio waveform directly from a deep convolutional neural network. FFTNet offers two improvements over WaveNet. First it is substantially faster, allowing for real-time synthesis of audio waveforms. Secon...


Using the Wavenet for function approximation

1997

Alexander Ypma Robert P. W. Duin

When the aim is to make an arbitrary nonlinear mapping, neural networks are known to be a suitable technique. The Wavenet combines them with the wavelet transform, enabling a multi-scale approximation, while dilation and translation parameters can be fit to the data. Some properties of the Wavenet are investigated and an outlook to application in machinery monitoring is provided.
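
As a rough illustration of how dilation and translation parameters shape such an approximator, the following Python sketch evaluates a one-dimensional wavelet network f(x) = sum_i w_i * psi((x - t_i) / d_i). The Mexican-hat mother wavelet, the `(weight, translation, dilation)` tuple layout, and the function names are assumptions made for this example; the abstract does not specify them, and no parameter fitting is performed here.

```python
import math


def mexican_hat(u):
    """Mexican-hat mother wavelet (an assumed choice for illustration)."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)


def wavenet_eval(x, units):
    """Evaluate f(x) = sum_i w_i * psi((x - t_i) / d_i).

    `units` is a list of (weight, translation, dilation) tuples;
    this layout is hypothetical.
    """
    return sum(w * mexican_hat((x - t) / d) for w, t, d in units)


# A narrow unit centred at x = 1 and a wide unit centred at x = 3:
# small dilations capture fine detail, large ones capture coarse trends.
example_units = [(2.0, 1.0, 0.5), (-1.0, 3.0, 2.0)]
values = [wavenet_eval(0.5 * k, example_units) for k in range(9)]
```

Fitting would then adjust each weight, translation, and dilation to reduce the error on training data, for example by gradient descent.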


Statistical Voice Conversion with WaveNet-Based Waveform Generation

2017

Kazuhiro Kobayashi Tomoki Hayashi Akira Tamamori Tomoki Toda

This paper presents a statistical voice conversion (VC) technique with the WaveNet-based waveform generation. VC based on a Gaussian mixture model (GMM) makes it possible to convert the speaker identity of a source speaker into that of a target speaker. However, in the conventional vocoding process, various factors such as F0 extraction errors, parameterization errors and over-smoothing effects...


Wavelet Neural Network Algorithms with Applications in Approximation Signals

2011

Carlos Roberto Domínguez Mayorga María Angélica Espejel Rivera Luis Enrique Ramos Velasco Julio César Ramos Fernández Enrique Escamilla Hernández

In this paper we present adaptive algorithms, based on neural networks and wavelet series, for building wavenet function approximators. Numerical simulation results are shown for two wavenet approximator architectures: the first is based on a wavenet that approximates the signals under study, with the parameters of the neural network adjusted online; the other uses a scheme approxim...


Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Journal: CoRR 2017

Jonathan Shen Ruoming Pang Ron J. Weiss Mike Schuster Navdeep Jaitly Zongheng Yang Zhifeng Chen Yu Zhang Yuxuan Wang R. J. Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinio...
