Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis
Abstract
While generative adversarial network (GAN) based neural text-to-speech (TTS) systems have shown significant improvement in speech synthesis, there is no TTS system that learns to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require a reconstruction loss that compares the ground-truth and the generated mel-spectrogram directly. In this paper, we present Multi-SpectroGAN (MSG), which can train a multi-speaker model with only adversarial feedback by conditioning a self-supervised hidden representation of the generator to a conditional discriminator. This leads to better guidance for generator training. Moreover, we also propose adversarial style combination (ASC) for better generalization to unseen speaking styles and transcripts, which learns latent representations of the combined style embedding from multiple mel-spectrograms. Trained with ASC and feature matching, MSG synthesizes a high-diversity mel-spectrogram by controlling and mixing individual speaking styles (e.g., duration, pitch, and energy). The results show that MSG synthesizes a high-fidelity mel-spectrogram, which has almost the same naturalness MOS score as the ground-truth mel-spectrogram.
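The two training ingredients named in the abstract, mixing style embeddings and feature matching, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how styles are combined, so a convex combination of two embedding vectors is assumed here, and feature matching is taken in its common GAN form (mean absolute difference between discriminator features for real and generated inputs). The names `mix_styles` and `feature_matching_loss` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mix_styles(a: Sequence[float], b: Sequence[float], alpha: float) -> Vector:
    """Blend two style embeddings with weight alpha on `a`.

    Assumption: a convex combination stands in for the paper's
    style combination, whose exact rule is not given in the abstract.
    """
    if len(a) != len(b):
        raise ValueError("style embeddings must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Mean absolute difference between per-layer discriminator features.

    Each argument holds one flat feature vector per discriminator layer.
    The per-layer means are averaged over all layers.
    """
    if not real_feats or len(real_feats) != len(fake_feats):
        raise ValueError("need the same non-zero number of layers")
    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if not real or len(real) != len(fake):
            raise ValueError("layer features must be non-empty and aligned")
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total / len(real_feats)


if __name__ == "__main__":
    # Toy two-dimensional embeddings for two speakers (made-up values).
    mixed = mix_styles([1.0, 0.0], [0.0, 1.0], alpha=0.25)
    print(mixed)
    # Toy discriminator features for two layers (made-up values).
    print(feature_matching_loss([[1.0, 2.0], [0.0]], [[1.0, 4.0], [1.0]]))
```

In a real system the embeddings and features would be tensors produced by trained networks; plain lists are used here only to keep the sketch dependency-free.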
Similar resources
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
The recently-developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a rea...
High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks
Synthesizing face sketches from real photos and its inverse are well studied problems and they have many applications in digital forensics and entertainment. However, photo/sketch synthesis remains a challenging problem due to the fact that photo and sketch have different characteristics. In this work, we consider this task as an image-to-image translation problem and explore the recently popul...
High fidelity TNA synthesis by Therminator polymerase
Therminator DNA polymerase is an efficient DNA-dependent TNA polymerase capable of polymerizing TNA oligomers of at least 80 nt in length. In order for Therminator to be useful for the in vitro selection of functional TNA sequences, its TNA synthesis fidelity must be high enough to preserve successful sequences. We used sequencing to examine the fidelity of Therminator-catalyzed TNA synthesis a...
Learning a Generative Adversarial Network for High Resolution Artwork Synthesis
Artwork is a mode of creative expression, and this paper investigates whether a machine can learn to synthetically create artwork that is usually nonfigurative and structured abstract. To this end, we propose an extension to the Generative Adversarial Network (GAN), namely ArtGAN, to synthetically generate high-quality artwork. This is in contrast to most of the c...
Generation of High Fidelity Graphics for Simulator Environments
In the closing speech of DSC 2004 (Paris), it was remarked that, "No child would be impressed by a driving simulator because they are too used to the impressive graphics presented to them by the games industry." This is because the gaming industry is competitive about getting games with attractive graphics for people to buy; to achieve this aim the market spends millions. Although aesthetic...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i14.17559