A segmental mixture model for speaker recognition
نویسندگان
چکیده
Standard Gaussian mixture modelling does not possess time sequence information (TSI) other than that which might be embedded in the acoustic features. Dynamic time warping relates directly to TSI, time-warping two sequences of features into alignment. Here, a hybrid system embedding DTW into a GMM is presented. Improved automatic speaker verification performance is demonstrated. Testing 1000 speakers in a fully text independent, world-model-adapted mode shows an equal error improvement over a standard GMM from 4.1% to 3.8%.
منابع مشابه
Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition
In most speaker recognition systems speech utterances are not constrained in content or language. In a text-dependent speaker recognition system lexical content of speech and language are known in advance. The goal of this paper is to show that this information can be used by a segmental features (SF) approach to improve a standard Gaussian mixture model with MFCC features (GMM-MFCC). Speech fe...
متن کاملSubsegmental, Segmental and Suprasegmental Features for Speaker Recognition Using Gaussian Mixture Model
In the feature extraction stage, features representing speaker information are extracted from the speech signal. In the present study LP residual derived from the speech data is used for training and testing and also processing of LP residual in time domain at subsegmental, segmental and suprasegmental levels. In the training phase, GMMs are built, one for each speaker, using the training data ...
متن کاملRecognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
متن کاملData-Model Relationship in Text-Independent Speaker Recognition
Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent works has shown that the utilisation of higher-level information such as idiolect, pronunciation, and pro...
متن کاملHidden Markov Model for Speech Recognition
In this paper, a theoretical framework for Bayesian adaptive training of the parameters of discrete hidden Markov model (DHMM) and of semi-continuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forward-backward MAP (maximum a posterion’) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally effi...
متن کامل