Study on the effects of intrinsic variation using i-vectors in text-independent speaker verification

Authors

  • Sheng Chen
  • Mingxing Xu
  • Emlyn Pratt
Abstract

Speaker verification performance is adversely affected by mismatches in intrinsic variation between training and testing data. This paper explores how recent techniques that model total variability perform in addressing the effects of intrinsic variation in speaker verification. The effects are investigated along six dimensions: speaking style, speaking rate, speaking volume, emotional state, physical status, and speaking language. Speaker and session variability are modeled with the i-vector framework in the total variability space, and cosine similarity is used as the final decision score in the i-vector based speaker verification system. Intrinsic variations are compensated in the i-vector framework with several techniques, specifically Linear Discriminant Analysis (LDA), Within-Class Covariance Normalization (WCCN), and Nuisance Attribute Projection (NAP). Experiments on the intrinsic variation corpus show that speaking volume has a dramatic effect on speaker verification results, and that whispered speech causes the largest degradation in performance. The best results are obtained by i-vector modeling with the combined compensation of LDA and WCCN. Compared to the GMM-UBM based system, the i-vector+LDA+WCCN system achieves a relative improvement of around 36.76% in Equal Error Rate (EER).
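
To illustrate the scoring step described in the abstract, here is a minimal sketch of WCCN-compensated cosine scoring between two i-vectors. It is not the authors' implementation: the function names `wccn_projection` and `cosine_score`, the small ridge term, and the use of NumPy are all assumptions made for illustration, and the LDA and NAP stages are omitted.

```python
import numpy as np


def wccn_projection(ivectors, speaker_ids):
    """Estimate a WCCN matrix B such that B @ B.T equals inv(W).

    W is the within-speaker covariance averaged over speakers.
    Both function name and ridge value are illustrative assumptions.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    dim = ivectors.shape[1]
    speakers = np.unique(speaker_ids)

    W = np.zeros((dim, dim))
    for s in speakers:
        x = ivectors[speaker_ids == s]
        centered = x - x.mean(axis=0)
        W += centered.T @ centered / len(x)
    W /= len(speakers)

    # Small ridge (assumption) so that W stays invertible.
    W += 1e-6 * np.eye(dim)

    # Cholesky factor of inv(W): lower-triangular B with B @ B.T = inv(W).
    return np.linalg.cholesky(np.linalg.inv(W))


def cosine_score(w_target, w_test, B=None):
    """Cosine similarity between two i-vectors, optionally after WCCN."""
    w_target = np.asarray(w_target, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    if B is not None:
        w_target = B.T @ w_target
        w_test = B.T @ w_test
    denom = np.linalg.norm(w_target) * np.linalg.norm(w_test)
    return float(w_target @ w_test / denom)
```

In a verification system of this kind, the score would be compared against a decision threshold to accept or reject the claimed identity; how that threshold is chosen is not stated in the abstract.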

Similar resources

CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances

Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel m...


Deep Speaker Vectors for Semi Text-independent Speaker Verification

Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification. This new method has been tested on text-dependent speaker verification tasks, and improvement was reported when combined with the conventional i-vector method. This paper extends the d-vector approach to sem...


DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances

We investigate how to improve the performance of DNN i-vector based speaker verification for short, text-constrained test utterances, e.g. connected digit strings. A text-constrained verification, due to its smaller, limited vocabulary, can deliver better performance than a text-independent one for a short utterance. We study the problem with “phonetically aware” Deep Neural Net (DNN) in its cap...


Effects of vocal effort and speaking style on text-independent speaker verification

We study the question of how intrinsic variations (associated with the speaker rather than the recording environment) affect text-independent speaker verification performance. Experiments using the SRI-FRTIV corpus, which systematically varies both vocal effort and speaking style, reveal that (1) “furtive” speech poses a significant challenge; (2) conversations and interviews, despite stylistic...


Deep Neural Network Embeddings for Text-Independent Speaker Verification

This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feedforward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variable-length speech segments. After t...


Publication date: 2012