Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition
Abstract
Exemplar-based sparse representation phone identification features (Spif) have recently shown promising results on large vocabulary speech recognition tasks. However, exemplar-based techniques are computationally expensive. In this paper, we present two methods to speed up the creation of Spif features. First, we explore a technique to quickly select a subset of informative exemplars from among millions of training examples. Second, we approximate the sparse representation computation so that a matrix-matrix multiplication is reduced to a matrix-vector product. We present results on four large vocabulary tasks: Broadcast News, where acoustic models are trained on 50 and 400 hours, and Voice Search, where models are trained on 160 and 1000 hours. Results on all tasks show a speedup by a factor of four relative to the original Spif features, as well as improvements in word error rate (WER) when combined with a baseline HMM system.
Similar Resources
Uncertainty Measures for Improving Exemplar-Based Source Separation
This work studies the use of observation uncertainty measures for improving the speech recognition performance of an exemplar-based source separation based front end. To generate the observation uncertainty estimates for the enhanced features, we propose the use of heuristic methods based on the sparse representation of the noisy signal in the exemplar-based source separation algorithm. The eff...
Sparse representation features for speech recognition
In this paper, we explore the use of exemplar-based sparse representations (SRs) to map test features into the linear span of training examples. We show that the frame classification accuracy with these new features is 1.3% higher than a Gaussian Mixture Model (GMM), showing that not only do SRs move test features closer to training, but also move the features closer to the correct class. Given...
Noise-robust Automatic Speech Recognition with Exemplar-based Sparse Representations Using Multiple Length Adaptive Dictionaries
In this work, we apply our recently proposed sparse-representation-based speech recognition system to the small vocabulary track of the 2nd ‘CHiME’ Speech Separation and Recognition Challenge. This system uses exemplars of different lengths to approximate noisy speech segments as a linear combination of the speech and noise exemplars with sparse weights. The exemplars are labeled speech segments ...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...