Many-to-Many Voice Transformer Network

Abstract

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. Here, the main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension, called the many-to-many VTN, allows us to fully use available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss to ensure that the input feature sequence will remain unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, could handle any-to-many conversion tasks reasonably well.
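
The identity mapping idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the converters are toy NumPy functions rather than a transformer, the names `toy_convert`, `biased_convert`, and `identity_mapping_loss` are hypothetical, and mean absolute error is assumed as the distance. The sketch only shows the shape of the constraint: converting a feature sequence with identical source and target speaker indices should return the input unchanged.

```python
import numpy as np

def toy_convert(features, src_idx, tgt_idx, speaker_emb):
    """Hypothetical stand-in for a many-to-many converter: shifts every frame
    by the difference between the target and source speaker embeddings."""
    return features + (speaker_emb[tgt_idx] - speaker_emb[src_idx])

def biased_convert(features, src_idx, tgt_idx, speaker_emb):
    """Same toy converter with a constant error of 0.1 added, to show a case
    where the identity constraint is violated."""
    return toy_convert(features, src_idx, tgt_idx, speaker_emb) + 0.1

def identity_mapping_loss(convert_fn, features, spk_idx, speaker_emb):
    """Mean absolute difference between the input and its conversion when the
    source and target speaker indices are the same (assumed distance)."""
    converted = convert_fn(features, spk_idx, spk_idx, speaker_emb)
    return float(np.mean(np.abs(converted - features)))

rng = np.random.default_rng(0)
speaker_emb = rng.normal(size=(4, 8))  # 4 speakers, 8-dim features (assumed sizes)
features = rng.normal(size=(20, 8))    # 20 frames of one utterance

loss_exact = identity_mapping_loss(toy_convert, features, 2, speaker_emb)
loss_biased = identity_mapping_loss(biased_convert, features, 2, speaker_emb)
print(loss_exact, loss_biased)
```

In a real training setup a term of this kind would presumably be added to the main conversion loss, so that a converter which alters same-speaker input is penalized.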


Related resources

Many-to-many eigenvoice conversion with reference voice

In this paper, we propose many-to-many voice conversion (VC) techniques to convert an arbitrary source speaker’s voice into an arbitrary target speaker’s voice. We have proposed one-to-many eigenvoice conversion (EVC) and many-to-one EVC. In EVC, an eigenvoice Gaussian mixture model (EV-GMM) is trained in advance using multiple parallel data sets of a reference speaker and many pre-stored sp...


Many-to-many voice conversion based on multiple non-negative matrix factorization

In this paper, we present an exemplar-based voice conversion (VC) method using non-negative matrix factorization (NMF), which differs from conventional statistical VC. NMF-based VC has the advantages of noise robustness and naturalness of the converted voice compared to Gaussian mixture model (GMM)-based VC. However, because NMF-based VC is based on parallel training data of source and target speake...


Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion

In this paper, we evaluate our proposed singing voice conversion method from various perspectives. To enable singers to freely control the timbre of their singing voice, we have proposed a singing voice conversion method based on many-to-many eigenvoice conversion (EVC) that makes it possible to convert the voice timbre of an arbitrary source singer into that of another arbitrary target singer using a p...


Linear transformation approaches to many-to-one voice conversion

In this paper, we present linear transformation algorithms for many-to-one voice conversion (VC). Many-to-one VC is a technique for converting an arbitrary source speaker’s voice into the target speaker’s voice. A conversion model previously developed between many pre-stored source speakers and the target speaker is adapted to a new source speaker in an unsupervised manner. In this study, w...


Many-to-Many Graph Matching




Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9290, 2329-9304

DOI: https://doi.org/10.1109/taslp.2020.3047262