DNN-based SRE Systems in Multi-Language Conditions — Technical Report
Authors
Abstract
This work studies the use of (currently state-of-the-art) deep neural network (DNN) i-vector/PLDA-based speaker recognition systems in multi-language (especially non-English) conditions. On the “Language Pack” of the PRISM set, we evaluate the systems’ performance using NIST’s standard metrics. We study the use of a multi-lingual DNN in place of the original English DNN under these multi-language conditions. We show that not only does the gain from using DNNs vanish, but the DNN-based systems also tend to produce de-calibrated scores under the studied conditions. This work suggests directions for future research rather than offering any particular solution.
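The abstract does not say which NIST metrics or operating points were used. As a rough illustration of how score de-calibration can be quantified, the sketch below compares the actual detection cost (decisions made at the theoretical Bayes threshold) with the minimum detection cost (best threshold chosen after the fact). It assumes the scores are log-likelihood ratios; the function names and the `p_target`, `c_miss`, and `c_fa` defaults are illustrative assumptions, not values taken from the report.

```python
import math


def normalized_dcf(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Detection cost normalized by the cost of the best trivial system."""
    raw = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    trivial = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return raw / trivial


def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when a trial is accepted if score >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def actual_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Cost at the Bayes threshold, which is only optimal for calibrated LLR scores."""
    threshold = math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))
    p_miss, p_fa = error_rates(target_scores, nontarget_scores, threshold)
    return normalized_dcf(p_miss, p_fa, p_target, c_miss, c_fa)


def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Lowest cost over all thresholds, ignoring calibration."""
    # Every observed score is a candidate threshold; +inf rejects all trials.
    candidates = sorted(set(target_scores) | set(nontarget_scores)) + [math.inf]
    return min(
        normalized_dcf(*error_rates(target_scores, nontarget_scores, t),
                       p_target, c_miss, c_fa)
        for t in candidates
    )


# Hypothetical toy scores: perfectly separable, but shifted upwards.
targets = [2.0, 3.0, 4.0]
nontargets = [-1.0, 0.0, 1.0]
gap = actual_dcf(targets, nontargets, p_target=0.5) - min_dcf(targets, nontargets, p_target=0.5)
```

A large gap between the two costs means the system separates speakers well but its scores cannot be thresholded at the theoretical value, which is one common reading of "de-calibrated". In the toy data the minimum cost is zero while the actual cost is not, because the whole score distribution is shifted.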
Similar resources
Investigation of bottleneck features and multilingual deep neural networks for speaker verification
Recently, the integration of deep neural networks (DNNs) with i-vector systems has proved effective for speaker verification. This method uses the DNN with senone outputs to produce frame alignments for sufficient statistics extraction. However, two types of data mismatch may degrade the performance of the DNN-based speaker verification systems. First, the DNN requires transcribed training...
A deep neural network speaker verification system targeting microphone speech
We recently proposed the use of deep neural networks (DNN) in place of Gaussian Mixture models (GMM) in the i-vector extraction process for speaker recognition. We have shown significant accuracy improvements on the 2012 NIST speaker recognition evaluation (SRE) telephone conditions. This paper explores how this framework can be effectively used on the microphone speech conditions of the 2012 N...
Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
Recently, deep neural networks (DNNs) trained to predict senones have been incorporated into the conventional i-vector based speaker verification systems to provide soft frame alignments and show promising results. However, the data mismatch problem may degrade the performance since the DNN requires transcribed data (out-of-domain data) while the data sets (in-domain data) used for i-vector trainin...
UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation
This study describes systems submitted by the Center for Robust Speech Systems (CRSS) from the University of Texas at Dallas (UTD) to the 2016 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE). We developed 4 UBM and DNN i-vector based speaker recognition systems with alternate data sets and feature representations. Given that the emphasis of the NIST SR...
Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features
Speaker verification in real-world applications sometimes deals with limited duration of enrollment and/or test data. MFCC-based i-vector systems have defined the state-of-the-art for speaker verification, but it is well known that they are less effective with short utterances. To address this issue, we propose a method to leverage the speaker specificity and stationarity of subglottal acoustic...
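Several of the abstracts above describe using DNN senone posteriors as frame alignments for sufficient statistics extraction. A minimal sketch of that accumulation step follows, assuming per-frame posteriors are already available as plain lists. The function name `sufficient_stats` and the toy inputs are hypothetical, and a real system would go on to center the first-order statistics and feed them to an i-vector extractor, which is not shown here.

```python
def sufficient_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors: one list per frame, giving the posterior of each class
                (for example, each senone) for that frame.
    features:   one list per frame, giving the acoustic feature vector.
    """
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                        # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_classes)]   # F_c = sum_t gamma_t(c) * x_t
    for gamma, x in zip(posteriors, features):
        for c, g in enumerate(gamma):
            zeroth[c] += g
            for d, value in enumerate(x):
                first[c][d] += g * value
    return zeroth, first


# Hypothetical toy utterance: two frames, two classes, two feature dimensions.
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
toy_features = [[2.0, 4.0], [6.0, 8.0]]
zeroth, first = sufficient_stats(toy_posteriors, toy_features)
```

The same accumulation applies whether the posteriors come from a GMM or a DNN. Only the source of `posteriors` changes, which is why a DNN can replace the GMM in the alignment step.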