End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA
Abstract
Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
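The abstract does not state the exact form of the regularizer, so the following is only a minimal sketch of the general idea of keeping trained parameters close to their initial values. It assumes a squared L2 penalty on the distance from the initial parameters; the function names, the `strength` weight, and the plain gradient step are all hypothetical and are not taken from the paper.

```python
# Sketch (assumption): an L2 penalty that pulls parameters back toward the
# values they had at initialization. The paper's actual regularizer may differ.

def regularized_loss(task_loss, params, init_params, strength):
    """Task loss plus strength * squared distance from the initial parameters."""
    penalty = sum((p - p0) ** 2 for p, p0 in zip(params, init_params))
    return task_loss + strength * penalty


def regularized_gradient(task_grad, params, init_params, strength):
    """Gradient of the regularized loss with respect to each parameter."""
    return [
        g + 2.0 * strength * (p - p0)
        for g, p, p0 in zip(task_grad, params, init_params)
    ]


def sgd_step(params, task_grad, init_params, strength, learning_rate):
    """One plain gradient-descent step on the regularized loss."""
    grad = regularized_gradient(task_grad, params, init_params, strength)
    return [p - learning_rate * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    init = [1.0, 2.0]      # hypothetical parameters of the initial system
    current = [2.0, 2.0]   # parameters after some end-to-end training
    # With a zero task gradient, the step moves parameters back toward init.
    print(sgd_step(current, [0.0, 0.0], init, strength=0.1, learning_rate=0.5))
```

With a larger `strength`, the trained system stays closer to the initial one; with `strength` set to zero, the update reduces to ordinary unregularized training.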
Similar resources
DNN-based Discriminative Scoring for Speaker Recognition Based on i-vector
One of the state-of-the-art approaches to speaker recognition is based on factor analysis, especially the i-vector model. By...
Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification
We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embedded the components of a speaker verification system such as modeling frame-level features, extracting utterance-level features, dimensionality reduction of utterance-level features, and trial-level scoring in an expanded end-to-end DNN architecture. The main contribution of...
LIA system description for NIST SRE 2016
This paper describes the LIA speaker recognition system developed for the Speaker Recognition Evaluation (SRE) campaign. Eight sub-systems are developed, all based on a state-of-the-art approach: i-vector/PLDA which represents the mainstream technique in text-independent speaker recognition. These sub-systems differ: on the acoustic feature extraction front-end (MFCC, PLP), at the i-vector extr...
Speakers In The Wild (SITW): The QUT Speaker Recognition System
This paper presents the QUT speaker recognition system, as a competing system in the Speakers In The Wild (SITW) speaker recognition challenge. Our proposed system achieved an overall ranking of second place in the main core-core condition evaluations of the SITW challenge. This system uses an i-vector/PLDA approach, with domain adaptation and a deep neural network (DNN) trained to provide feat...
Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification
Duration mismatch between enrollment and test utterances still remains a major concern for reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first one is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift b...
Journal title:
- CoRR
Volume: abs/1710.02369
Publication date: 2017