Telephone-based Text-dependent Speaker Verification
Abstract
In this thesis, we investigate model selection and channel variability issues in telephone-based text-dependent speaker verification applications. Due to the lack of an appropriate database for the task, we collected two multi-channel speaker recognition databases, referred to as text-dependent variable text (TDVT-D) and text-dependent single utterance (TDSU-D). TDVT-D consists of digit strings and short utterances in Turkish, and TDSU-D contains a single Turkish phrase. On TDVT-D, Gaussian mixture model (GMM) and hidden Markov model (HMM) based methods are compared using several authentication utterances, enrollment scenarios and enrollment-authentication channel conditions. In these experiments, we employ a rank-based decision making procedure. In the second set of experiments, we investigate three channel compensation techniques together with cepstral mean subtraction (CMS): i) LTAS filtering, ii) MLLR transformation, and iii) handset-dependent rank-based decision making (Hrank). All three methods require prior knowledge of the employed channel type. We recognize the channels with channel GMMs trained for each condition. In this section, we also analyze the influence of channel detection errors on the verification performance. On TDSU-D, phonetic HMM, sentence HMM and GMM based methods are compared for the single utterance task. To compensate for channel mismatch conditions, we implement test normalization (T-norm), zero normalization (Z-norm) and combined (i.e., TZ-norm and ZT-norm) score normalization techniques. We also propose a novel combination procedure referred to as C-norm. Additionally, we use prior knowledge of the handset-channel type to improve the verification performance. A cohort-based channel detection method is introduced in addition to the classical GMM-based method.
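As a rough illustration of the score normalization techniques named above, the following sketch shows the commonly used textbook forms of Z-norm and T-norm. It is not taken from the thesis: the function names, the use of the population standard deviation, and the example scores are assumptions made for illustration, and the combined TZ-norm, ZT-norm and proposed C-norm procedures are not shown because the abstract does not define them.

```python
from statistics import mean, pstdev


def _standardize(raw_score, reference_scores):
    """Shift and scale a raw score by the mean and standard deviation
    of a set of reference scores (hypothetical helper)."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (raw_score - mu) / sigma


def z_norm(raw_score, impostor_utterance_scores):
    """Z-norm: the reference scores come from the claimed speaker's model
    scored against impostor utterances, typically computed offline."""
    return _standardize(raw_score, impostor_utterance_scores)


def t_norm(raw_score, cohort_model_scores):
    """T-norm: the reference scores come from the test utterance
    scored against a cohort of other speakers' models at test time."""
    return _standardize(raw_score, cohort_model_scores)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(z_norm(2.0, [-1.0, 1.0]))
    print(t_norm(0.5, [0.0, 1.0, 2.0]))
```

Both functions apply the same arithmetic; they differ only in where the reference scores come from, which is why they are usually described as model-dependent and test-dependent normalizations respectively.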
After the score normalization section, the feature-domain spectral mean division (SMD) method is presented as an alternative to the well-known CMS. In the last set of experiments, prosodic (energy, pitch, duration) and spectral features are combined in the sentence HMM framework.
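For readers unfamiliar with CMS, a minimal sketch of the standard per-utterance form is given here. It assumes the cepstral features are stored as a list of equal-length frames; the function name and data layout are illustrative assumptions, not the thesis implementation, and SMD is not sketched because the abstract does not give its definition.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of frames, each a list of cepstral coefficients
    (assumed layout). Returns a new list; the input is not modified.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


if __name__ == "__main__":
    # Two made-up frames with two coefficients each.
    print(cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]]))
```

Because a stationary channel adds an approximately constant offset in the cepstral domain, removing the utterance-level mean is the usual motivation for this step.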
Related resources
Speaker verification with limited enrollment data
New methods for speaker verification that address the problems of limited training data and unknown telephone channel are presented. We describe a system for studying the feasibility of telephone based voice signatures for electronic documents that uses speaker verification with a fixed test phrase but very limited data for training speaker models. We examine three methods for speaker verification t...
Unsupervised intra-speaker variability compensation based on Gestalt and model adaptation in speaker verification with telephone speech
In this paper an unsupervised compensation method based on Gestalt, ISVC, is proposed to address the problem of limited enrolling data and noise robustness in text-dependent speaker verification (SV). Reductions in EER and in the integral below the ROC curve as high as 20% or 40% and 30% or 60%, respectively, can be achieved by ISVC independently of the number of enrolling utterances. In contra...
Experiments with speaker verification over the telephone
In this paper we present a study on speaker verification showing achievable performance levels for both high quality speech and telephone speech and for two operational modes, i.e. text-dependent and text-independent speaker verification. A statistical modeling approach is taken, where for text-independent verification the talker is viewed as a source of phones, modeled by a fully connected Mark...
Text-independent Speaker Verification Based on Probabilistic Neural Networks
In this paper, a text-independent Probabilistic Neural Network (PNN)-based Speaker Verification system is presented. Modular structure with a distinct PNN for each enrolled speaker is used. A gender-dependent universal background model is built to represent the impostor speakers. A detailed description of the system, as well as the time required for training and processing all the test trials i...
A comparative evaluation of variance flooring techniques in HMM-based speaker verification
The problem of how to train variance parameters on scarce data is addressed in the context of text-dependent, HMM-based, automatic speaker verification. Three variations of variance flooring are explored as a means to prevent over-fitting. With the best performing one, the floor to a variance vector of a client model is proportional to the corresponding variance vector in a non-client multi-spea...