End-to-end DNN Based Speaker Recognition Inspired by i-vector and PLDA

Authors

  • Johan Rohdin
  • Anna Silnova
  • Mireia Díez
  • Oldrich Plchot
  • Pavel Matejka
  • Lukás Burget
Abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have proven competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long- and short-duration utterances.
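The abstract does not say which regularizer is used. As a minimal sketch, assuming a squared L2 penalty on the distance between the current parameters and their initial values, the idea of training that stays close to the initial system can be illustrated as follows. The names `regularized_loss_and_grad`, `fine_tune`, `toy_task`, and the weight `lam` are hypothetical, and the quadratic toy loss stands in for a real verification loss.

```python
import numpy as np

def regularized_loss_and_grad(theta, theta_init, task_loss_and_grad, lam):
    """Task loss plus an L2 penalty that pulls theta toward theta_init.

    Assumption: the penalty is lam * ||theta - theta_init||^2; the source
    only states that training is regularized toward the initial system.
    """
    loss, grad = task_loss_and_grad(theta)
    diff = theta - theta_init
    return loss + lam * np.sum(diff ** 2), grad + 2.0 * lam * diff

def fine_tune(theta_init, task_loss_and_grad, lam, lr=0.1, steps=200):
    """Plain gradient descent started from the initial parameters."""
    theta = theta_init.copy()
    for _ in range(steps):
        _, grad = regularized_loss_and_grad(
            theta, theta_init, task_loss_and_grad, lam
        )
        theta -= lr * grad
    return theta

# Toy stand-in for the task loss: a quadratic with its minimum at `target`.
target = np.array([3.0, -2.0])

def toy_task(theta):
    d = theta - target
    return np.sum(d ** 2), 2.0 * d

theta0 = np.zeros(2)                          # stands in for the initial system
free = fine_tune(theta0, toy_task, lam=0.0)   # unregularized training
tied = fine_tune(theta0, toy_task, lam=1.0)   # regularized toward theta0
```

In this toy setting the unregularized run moves all the way to the task optimum, while the regularized run settles between the initial parameters and that optimum. Larger values of `lam` keep the solution closer to the starting point.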


Similar resources

DNN-based Discriminative Scoring for Speaker Recognition Based on i-vector

One of the state-of-the-art approaches to speaker recognition is based on factor analysis, especially the i-vector model. By...


Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification

We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embedded the components of a speaker verification system such as modeling frame-level features, extracting utterance-level features, dimensionality reduction of utterance-level features, and trial-level scoring in an expanded end-to-end DNN architecture. The main contribution of...


LIA system description for NIST SRE 2016

This paper describes the LIA speaker recognition system developed for the Speaker Recognition Evaluation (SRE) campaign. Eight sub-systems are developed, all based on a state-of-the-art approach: i-vector/PLDA which represents the mainstream technique in text-independent speaker recognition. These sub-systems differ: on the acoustic feature extraction front-end (MFCC, PLP), at the i-vector extr...


Speakers In The Wild (SITW): The QUT Speaker Recognition System

This paper presents the QUT speaker recognition system, a competing system in the Speakers In The Wild (SITW) speaker recognition challenge. Our proposed system achieved an overall ranking of second place in the main core-core condition evaluations of the SITW challenge. This system uses an i-vector/PLDA approach, with domain adaptation and a deep neural network (DNN) trained to provide feat...


Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification

Duration mismatch between enrollment and test utterances still remains a major concern for reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first one is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift b...



Journal:
  • CoRR

Volume: abs/1710.02369

Publication date: 2017