Improving Speaker Recognition Performance Using Phonetically Structured Gaussian Mixture Models
Abstract
Over the past few years, Gaussian Mixture Models (GMMs) have been shown to be highly suitable for speaker identification and verification. However, these models primarily represent the distribution of the available training data and neglect any phonetic information that might be useful. In this paper we present a recognition system that uses multiple speaker GMMs based on phonetic classes. Introducing 'phonetic' mixture coefficients makes it possible to weight phoneme classes by their contribution to speaker recognizability. Because these coefficients are integrated implicitly into the probability computation, no phonetic labeling is needed during recognition. The mixture weights can be learned in a training phase. Model training was examined using MAP enrolment as well as the recently reported Eigenvoice approach. The phonetic separation proved especially advantageous for the latter. Relative recognition error reductions of up to 15% were achieved. Furthermore, the multiple-GMM approach is particularly effective for speaker enrolment with sparse training data.
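The abstract gives no formulas, so the following is only a minimal sketch of how a phonetically structured likelihood could be computed. It assumes the frame likelihood is a weighted sum over phonetic classes, p(x) = sum_c w_c * p(x | GMM_c), with diagonal covariances. The function names, the class labels ("vowel", "fricative") and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, gmm):
    """Log p(x | GMM); gmm is a list of (weight, mean, var) with weight > 0."""
    return log_sum_exp(
        [math.log(w) + diag_gauss_logpdf(x, mean, var) for w, mean, var in gmm]
    )


def structured_loglik(frames, class_gmms, class_weights):
    """Average per-frame log-likelihood under the assumed structured model.

    Assumed form: p(x) = sum_c class_weights[c] * p(x | class_gmms[c]).
    Summing over classes marginalises the phonetic class, so the frames
    need no phonetic labels. All class weights must be positive.
    """
    total = 0.0
    for x in frames:
        total += log_sum_exp(
            [
                math.log(class_weights[c]) + gmm_loglik(x, class_gmms[c])
                for c in class_gmms
            ]
        )
    return total / len(frames)


# Toy speaker model with two hypothetical phonetic classes (1-D features).
speaker_model = {
    "vowel": [(0.6, [0.0], [1.0]), (0.4, [2.0], [0.5])],
    "fricative": [(1.0, [-3.0], [2.0])],
}
phonetic_weights = {"vowel": 0.7, "fricative": 0.3}  # illustrative values
frames = [[0.1], [1.8], [-2.5]]

print(structured_loglik(frames, speaker_model, phonetic_weights))
```

For identification, one such score would be computed per enrolled speaker and the highest-scoring speaker chosen. How the phonetic weights are actually trained is not described in the abstract and is left out of this sketch.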
Similar resources
Speaker recognition by means of acoustic and phonetically informed GMMs
In this work we assess the recently proposed hybrid Deep Neural Network/Gaussian Mixture Model (DNN/GMM) approach for speaker recognition considering the effects of the granularity of the phonetic DNN model, and of the precision of the corresponding GMM models, which will be referred to as the phonetic GMMs. The aim of this work is to better understand the contributions of the phonetic informat...
Towards a more efficient SVM supervector speaker verification system using Gaussian reduction and a tree-structured hash
Speaker verification (SV) systems that employ maximum a posteriori (MAP) adaptation of a Gaussian mixture model (GMM) universal background model (UBM) incur a significant test-stage computational load in the calculation of a posteriori probabilities and sufficient statistics. We propose a multi-layered hash system employing a tree-structured GMM which uses Runnalls’ GMM reduction technique. The ...
Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech
Speaker variability is a well-known problem of state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in pronunciation of the speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models with Gaussian Mixture Densities were intensively use...
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
Text-Independent Speaker Recognition Using Gaussian Mixture Models Final Term Paper Proposal
The proposed project is an implementation of speaker recognition systems, both identification and verification. The systems are built using Gaussian Mixture Models, as proposed in several papers by Douglas A. Reynolds. The use of a Fractional Covariance Matrix is studied as a possible improvement to traditional recognition systems. keywords: speaker recognition; Gaussian Mixture Models; like...