speaker transformation

MDL-Based Cluster Number Decision Methods for Speaker Clustering and MLLR Adaptation

2001

Zhipeng Zhang Sadaoki Furui

Speaker clustering is one of the major methods for speaker adaptation. MLLR (Maximum Likelihood Linear Regression) adaptation using transformation matrices corresponding to phone classes/clusters is another useful method especially when the length of utterances for adaptation is limited. In these methods, how to decide the most appropriate number of clusters is an important research issue. This...

Maximum likelihood stochastic transformation adaptation for medium and small data sets

Journal: Computer Speech & Language, 2001

Constantinos Boulis Vassilios Diakoloukas Vassilios Digalakis

Speaker adaptation is recognized as an essential part of today’s large-vocabulary automatic speech recognition systems. A family of techniques that has been extensively applied for limited adaptation data is transformation-based adaptation. In transformation-based adaptation we partition our parameter space into a set of classes, estimate a transform (usually linear) for each class and apply the ...
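
The partition, estimate, and apply steps named in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: it fits a diagonal affine transform per class by per-dimension least squares (rather than a maximum likelihood estimate), and every name and data value in it is hypothetical.

```python
from collections import defaultdict


def fit_diagonal_affine(src, tgt):
    """Least-squares fit of tgt ~ a * src + b, independently per dimension."""
    n, dim = len(src), len(src[0])
    scales, offsets = [], []
    for d in range(dim):
        xs = [v[d] for v in src]
        ys = [v[d] for v in tgt]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0  # too little data: fall back to a pure shift
        scales.append(a)
        offsets.append(my - a * mx)
    return scales, offsets


def estimate_class_transforms(means, targets, class_of):
    """Pool the parameters that have adaptation data by class; fit one transform per class."""
    groups = defaultdict(lambda: ([], []))
    for name, target in targets.items():
        src, tgt = groups[class_of[name]]
        src.append(means[name])
        tgt.append(target)
    return {c: fit_diagonal_affine(src, tgt) for c, (src, tgt) in groups.items()}


def adapt(means, class_of, transforms):
    """Apply each class transform to every parameter in that class, seen or unseen."""
    adapted = {}
    for name, mu in means.items():
        if class_of[name] not in transforms:
            adapted[name] = list(mu)  # no data for this class: leave unchanged
            continue
        a, b = transforms[class_of[name]]
        adapted[name] = [ai * x + bi for ai, x, bi in zip(a, mu, b)]
    return adapted


# Hypothetical speaker-independent means, partitioned into two classes.
means = {"a1": [0.0, 1.0], "a2": [2.0, 3.0], "a3": [4.0, 5.0],
         "b1": [1.0, 1.0], "b2": [3.0, 2.0]}
class_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
# Hypothetical speaker-specific estimates, available for only some parameters.
targets = {"a1": [1.0, 3.0], "a2": [5.0, 7.0], "b1": [1.5, 0.5]}

transforms = estimate_class_transforms(means, targets, class_of)
adapted = adapt(means, class_of, transforms)
```

In this toy setup `a3` and `b2` have no adaptation data of their own, yet both are moved by the transform shared within their class, which is why the approach suits small adaptation sets.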

Voice Morphing Using the Generative Topographic Mapping

2003

Christina ORPHANIDOU Irene M. MOROZ Stephen J. ROBERTS

In this paper we address the problem of Voice Morphing. We attempt to transform the spectral characteristics of a source speaker's speech signal so that the listener would believe that the speech was uttered by a target speaker. The voice morphing system transforms the spectral envelope as represented by a Linear Prediction model. The transformation is achieved by codebook mapping using the Gen...

Language Styles in Persian and Their Literary Representation

Journal: پژوهش ادبیات معاصر جهان (Research in Contemporary World Literature)

Ali Afkhami (University of Tehran), Seyed Ziaeddin Ghasemi (University of Tehran)

Depending on how formal a conversation is, how familiar the speaker and the listeners are with each other, and how far the speaker's social status is from the listener's, the speaker uses certain forms of language known as styles of language. Linguistic styles are different from language varieties. Selection of the styles depends on definite social situations and the kind of r...

Speech Recognition Using Dynamical Model of Speech Production

1992

Ken-Ichi Iso

We propose a speech recognition method based on the dynamical model of speech production. The model consists of an articulator and its control command sequences. The latter has linguistic information of speech and the former has the articulatory information which determines transformation from linguistic intentions to speech signals. This separation makes our speech recognition model more contr...

Voice conversion using k-histograms and frame selection

2009

Alejandro José Uriz Pablo Daniel Agüero Antonio Bonafonte Juan Carlos Tulli

The goal of voice conversion systems is to modify the voice of a source speaker to be perceived as if it had been uttered by another specific speaker. Many approaches found in the literature work based on statistical models and introduce an oversmoothing in the target features. Our proposal is a new model that combines several techniques used in unit selection for text-to-speech and a non-gaussi...

Eigenspace-based Linear Transformation Approach for Rapid Speaker Adaptation

2001

Kuan-Ting Chen Hsin-Min Wang

This paper presents our recent effort on the development of the eigenspace-based linear transformation approach for rapid speaker adaptation. The proposed approach toward prior density selection for the MAPLR framework was developed by introducing a priori knowledge analysis on the training speakers via probabilistic principal component analysis (PPCA), so as to construct an eigenspace for spea...

Multimodal Emotion Recognition Based on the Decoupling of Emotion and Speaker Information

2010

Rok Gajsek Vitomir Struc France Mihelic

The standard features used in emotion recognition carry, besides the emotion-related information, cues about the speaker. This is expected, since the nature of emotionally colored speech is similar to the variations in the speech signal caused by different speakers. Therefore, we present a gradient descent derived transformation for the decoupling of emotion and speaker information contai...

Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization.

Journal: The Journal of the Acoustical Society of America, 2008

Heather Ames Stephen Grossberg

Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech so...

VTLN-based Rapid Cross-lingual Adaptation for Statistical Parametric Speech Synthesis

2012

Lakshmi Saheer Hui Liang John Dines Philip N. Garner

Cross-lingual speaker adaptation (CLSA) has emerged as a new challenge in statistical parametric speech synthesis, with specific application to speech-to-speech translation. Recent research has shown that reasonable speaker similarity can be achieved in CLSA using maximum likelihood linear transformation of model parameters, but this method also has weaknesses due to the inherent mismatch cause...
