Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition
نویسندگان
چکیده
We propose an algorithm that allows online training of a context dependent DNN model. It designs a state inventory based on DNN features and jointly optimizes the DNN parameters and alignment of the training data. The process allows flat starting a model from scratch and avoids any dependency on a GMM acoustic model to bootstrap the training process. A 15k state model trained with the proposed algorithm reduced the error rate on a mobile speech task by 24% compared to a system bootstrapped from a CI GMM and by 16% compared to a system bootstrapped from a CD GMM system.
منابع مشابه
Gmm-free Dnn Training
While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better ma...
متن کاملAllophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
متن کاملImproved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملContext Dependency in Neural Network Based Acoustic Models
Our recent experiments with Gaussian mixture (GMM) based acoustic models have shown that employing context dependent acoustic models, namely triphones, can greatly improve recognition accuracy [1] in comparison to systems based on context independent units. Significant portion of our research has been aimed at exploring the possibilities of neural networks as acoustic models for speech recognit...
متن کاملTper Hcaeser Pidi Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios
This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) for Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and, (b) for acoustic modeling of underresourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conve...
متن کامل