Text-independent voice conversion using speaker model alignment method from non-parallel speech
Authors
Abstract
In this paper, we propose a novel voice conversion method called speaker model alignment (SMA), which does not require parallel training speech. First, the source and target speaker models, each described by a Gaussian mixture model (GMM), are trained separately. Then, the transformation function for the spectral features is learned by iteratively aligning the components of the source and target speaker models. Additionally, the transformation function is combined with a GMM to enable multiple local mappings, and a local consistent GMM (LCGMM) is considered for model training to improve the conversion accuracy. Finally, we carry out experiments to evaluate the performance of the proposed method. Objective and subjective experimental results demonstrate that, compared with the well-known INCA approach, the proposed method achieves lower spectral distortion and higher correlation, and obtains a significant improvement in perceptual quality and similarity.
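The abstract does not give the update equations, so the following is only an illustrative sketch of the general idea of iteratively aligning two sets of mixture components, not the SMA algorithm itself. It assumes that only the component means are aligned, that components are matched by nearest neighbor, and that the transformation is a per-dimension affine map fitted by least squares. All names (`align_means`, `convert`, and so on) are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def nearest(point: Vector, candidates: List[Vector]) -> int:
    """Index of the candidate closest to point (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, candidates[j])),
    )


def fit_affine_1d(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b for one feature dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0.0:
        return 1.0, my - mx  # degenerate case: fall back to a pure shift
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def align_means(
    src_means: List[Vector], tgt_means: List[Vector], n_iter: int = 10
) -> Tuple[Vector, Vector]:
    """Alternate between matching components and refitting the transform.

    Assumption: this nearest-neighbor matching plus affine refit is a
    stand-in for the alignment step; the paper's actual procedure may differ.
    """
    dim = len(src_means[0])
    scale, shift = [1.0] * dim, [0.0] * dim
    for _ in range(n_iter):
        # Move the source means with the current transform.
        moved = [[a * x + b for a, x, b in zip(scale, m, shift)] for m in src_means]
        # Match each moved source component to its nearest target component.
        pairs = [nearest(m, tgt_means) for m in moved]
        # Refit the transform, one dimension at a time, on the matched pairs.
        for d in range(dim):
            xs = [m[d] for m in src_means]
            ys = [tgt_means[j][d] for j in pairs]
            scale[d], shift[d] = fit_affine_1d(xs, ys)
    return scale, shift


def convert(frame: Vector, scale: Vector, shift: Vector) -> Vector:
    """Apply the learned per-dimension affine map to one spectral frame."""
    return [a * x + b for a, x, b in zip(scale, frame, shift)]
```

In a fuller pipeline the component means would come from GMMs trained separately on each speaker's non-parallel speech, and a single global map like this one would be replaced by several local mappings, as the abstract describes.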
Similar resources
Designing a new non-parallel training method for voice conversion with better performance than parallel training
Introduction: Voice mimicking by computers has been one of the most challenging topics in speech processing in recent years. A voice conversion system has two sides. On one side is the source speaker, whose voice is changed to mimic the voice of the target speaker, who is on the other side. Two methods of p...
Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data—pairs of speech data from the source and target speakers utterin...
Voice Conversion Using Articulatory Features
The aim of voice conversion is to transform an utterance spoken by an arbitrary (source) speaker to that of a specific (target) speaker. Text-to-speech (TTS), speech-to-speech translation, mimicry generation, and human-machine interaction systems are among the numerous applications that can benefit greatly from a voice conversion module. Generally, voice conversion systems require para...
Text and speaker independent voice conversion
This paper describes an approach to the challenging problem of text and speaker independent voice conversion. The approach is based on parameterizing the target speaker’s speech production process using harmonic analysis. A unified model allows processing of any input speech regardless of its content and source speaker. The method provides subjective quality of conversion that is comparable with te...
Text-independent cross-language voice conversion
So far, cross-language voice conversion requires at least one bilingual speaker and parallel speech data to perform the training. This paper shows how these obstacles can be overcome by means of a recently presented text-independent training method based on unit selection. The new method is evaluated in the framework of the European speech-to-speech translation project TC-Star and achieves a pe...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014