Exploring robustness of DNN/RNN for extracting speaker Baum-Welch statistics in mismatched conditions
Abstract
This work explores the use of DNN/RNN for extracting Baum-Welch sufficient statistics in place of the conventional GMM-UBM in speaker recognition. In this framework, the DNN/RNN is trained for automatic speech recognition (ASR), and each output unit corresponds to a component of the GMM-UBM. The network outputs are then combined with acoustic features to calculate sufficient statistics for speaker recognition. We evaluate and analyze the performance of networks with different configurations and training corpora. Experimental results on the text-independent NIST SRE 2008 and text-dependent RSR2015 speaker verification tasks show that DNN/RNN-based statistics extraction is more robust than the GMM-UBM system in mismatched evaluation conditions. In particular, the Long Short-Term Memory (LSTM) RNN realized in this work outperforms the traditional DNN and GMM-UBM in most mismatched conditions.
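The statistics step described in the abstract can be illustrated with a minimal sketch. It assumes the network's frame-level outputs are already available as posteriors over classes and accumulates the standard zeroth-order statistic (the sum of posteriors per class) and first-order statistic (the posterior-weighted sum of feature vectors per class). The function name `baum_welch_stats` and the toy inputs are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics over one utterance.

    posteriors[t][c]: posterior of class c at frame t (assumed here to be
        the softmax output of the ASR-trained network).
    features[t][d]: acoustic feature vector at frame t.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    if not posteriors:
        raise ValueError("at least one frame is required")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                       # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_classes)]  # F_c = sum_t gamma_t(c) * x_t

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value
    return zeroth, first


# Toy input (made up): 3 frames, 2 classes, 2-dimensional features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[2.0, 4.0], [4.0, 8.0], [6.0, 2.0]]
zeroth, first = baum_welch_stats(posteriors, features)
```

In a GMM-UBM system the same accumulation would use component posteriors from the mixture model; in this sketch only the source of `posteriors` changes, while the accumulation itself stays the same.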
Similar resources
Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition
We examine the use of Deep Neural Networks (DNN) in extracting Baum-Welch statistics for i-vector-based text-independent speaker recognition. Instead of training the universal background model using the standard EM algorithm, the components are predefined and correspond to the set of triphone states, the posterior occupancy probabilities of which are modeled by a DNN. Those assignments are then ...
On the Issue of Calibration in DNN-Based Speaker Recognition Systems
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provi...
Improved i-Vector Representation for Speaker Diarization
This paper proposes using a previously well-trained deep neural network (DNN) to enhance the i-vector representation used for speaker diarization. In effect, we replace the Gaussian Mixture Model (GMM) typically used to train a Universal Background Model (UBM), with a DNN that has been trained using a different large scale dataset. To train the T-matrix we use a supervised UBM obtained from the...
Factor analysis of acoustic features using a mixture of probabilistic principal component analyzers for robust speaker verification
Robustness due to mismatched train/test conditions is one of the biggest challenges facing speaker recognition today, with transmission channel/handset and additive noise distortion being the most prominent factors. One limitation of the recent speaker recognition systems is that they are based on a latent factor analysis modeling of the GMM mean super-vectors alone. Motivated by the covariance...
Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice
A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...