Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis

Abstract

While generative adversarial network (GAN) based neural text-to-speech (TTS) systems have shown significant improvement in neural speech synthesis, there is no TTS system that learns to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require a reconstruction loss that compares the ground-truth and the generated mel-spectrogram directly. In this paper, we present Multi-SpectroGAN (MSG), which can train a multi-speaker model with only adversarial feedback by conditioning a self-supervised hidden representation of the generator on a conditional discriminator. This leads to better guidance for generator training. Moreover, we also propose adversarial style combination (ASC) for better generalization to unseen speaking styles and transcripts, which learns latent representations of the combined style embedding from multiple mel-spectrograms. Trained with ASC and feature matching, MSG synthesizes a high-diversity mel-spectrogram by controlling and mixing individual speaking styles (e.g., duration, pitch, and energy). The result shows that MSG synthesizes a high-fidelity mel-spectrogram, which has almost the same naturalness MOS score as the ground-truth mel-spectrogram.
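The two ideas the abstract names, mixing style embeddings and feature matching, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `combine_styles` and `feature_matching_loss`, the convex-combination form of the mixing, the mean-L1 form of the loss, and the embedding size are all assumptions made for illustration.

```python
import numpy as np


def combine_styles(style_a, style_b, alpha):
    """Convex combination of two style embeddings (hypothetical helper).

    alpha = 1.0 returns style_a, alpha = 0.0 returns style_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * style_a + (1.0 - alpha) * style_b


def feature_matching_loss(real_feats, fake_feats):
    """Mean L1 distance between paired feature maps, averaged over layers.

    real_feats and fake_feats are lists of arrays, one per discriminator
    layer (assumed layout), computed on real and generated mel-spectrograms.
    """
    per_layer = [np.mean(np.abs(r - f)) for r, f in zip(real_feats, fake_feats)]
    return float(np.mean(per_layer))


# Toy data: two 8-dimensional style embeddings (the size is arbitrary).
rng = np.random.default_rng(0)
style_a = rng.normal(size=8)
style_b = rng.normal(size=8)
mixed = combine_styles(style_a, style_b, alpha=0.3)
```

In an adversarial setup of this kind, the mixed embedding would condition the generator, and the feature matching term would be added to the generator's adversarial loss; how MSG weights and schedules these terms is not stated in the abstract.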

Similar resources

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

The recently-developed WaveNet architecture [27] is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a rea...

High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks

Synthesizing face sketches from real photos and its inverse are well studied problems and they have many applications in digital forensics and entertainment. However, photo/sketch synthesis remains a challenging problem due to the fact that photo and sketch have different characteristics. In this work, we consider this task as an image-to-image translation problem and explore the recently popul...

High fidelity TNA synthesis by Therminator polymerase

Therminator DNA polymerase is an efficient DNA-dependent TNA polymerase capable of polymerizing TNA oligomers of at least 80 nt in length. In order for Therminator to be useful for the in vitro selection of functional TNA sequences, its TNA synthesis fidelity must be high enough to preserve successful sequences. We used sequencing to examine the fidelity of Therminator-catalyzed TNA synthesis a...

Learning a Generative Adversarial Network for High Resolution Artwork Synthesis

Artwork is a mode of creative expression, and this paper investigates whether a machine can learn to synthetically create artwork that is usually nonfigurative and structured abstract. To this end, we propose an extension to the Generative Adversarial Network (GAN), namely ArtGAN, to synthetically generate high quality artwork. This is in contrast to most of the c...

Generation of High Fidelity Graphics for Simulator Environments

In the closing speech of DSC 2004 (Paris), it was remarked that, "No child would be impressed by a driving simulator because they are too used to the impressive graphics presented to them by the games industry." This is because the gaming industry competes to produce games with attractive graphics for people to buy; to achieve this aim the market spends millions. Although aesthetic...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i14.17559