Improvements to bucket box intersection algorithm for fast GMM computation in embedded speech recognition systems
Abstract
Real-time performance is a key goal for embedded speech recognition systems, where evaluating Gaussian mixture model (GMM) likelihoods usually dominates the computation of a continuous density hidden Markov model (CDHMM) based system. The Bucket Box Intersection (BBI) algorithm is an optimization technique that uses a K-dimensional binary tree to speed up GMM score computation without significantly hurting recognition accuracy. In this paper, we propose three improvements to the traditional BBI algorithm. First, instead of using the median hyper-plane, we define the optimal dividing hyper-plane as the one that yields the minimum expected number of mixture evaluations, which substantially reduces the size of the BBI tree. Second, we place the dividing plane where it is equidistant, in Mahalanobis distance, from the two mixtures of the closest dividing mixture pair, rather than at the boundary of a Gaussian box, which improves recognition accuracy. Finally, we introduce dividing planes that run across two dimensions, which widens the range of dividing plane candidates and thus yields further speed-ups. We evaluated these techniques using Conversay's speech engine CASSI in two different domains. The experimental results show that the new BBI algorithm significantly outperforms the traditional BBI algorithm. Compared with the baseline system without BBI, we sped up Gaussian computations by 50% with a relative increase in word error rate of less than 5%.
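The first improvement can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: each diagonal Gaussian is bounded by a box of half-width `radius` standard deviations, the probability of a feature vector falling on either side of a plane is estimated from a set of sample vectors, and the tree is stored as nested dictionaries. The names `gaussian_boxes`, `best_split`, `build` and `lookup` are hypothetical. The Mahalanobis refinement and the two-dimensional dividing planes are not covered.

```python
import numpy as np

def gaussian_boxes(means, variances, radius=2.0):
    """Axis-parallel boxes around diagonal Gaussians (radius is an assumption)."""
    half = radius * np.sqrt(variances)
    return means - half, means + half

def best_split(lo, hi, idx, samples):
    """Pick the axis-parallel plane with the lowest expected evaluation count.

    Candidate thresholds are the box boundaries of the Gaussians in idx.
    Returns (cost, dim, threshold); dim is None if no plane beats len(idx).
    """
    best = (float(len(idx)), None, None)
    for d in range(lo.shape[1]):
        candidates = np.unique(np.concatenate([lo[idx, d], hi[idx, d]]))
        for t in candidates:
            n_left = np.sum(lo[idx, d] < t)    # boxes reaching below the plane
            n_right = np.sum(hi[idx, d] > t)   # boxes reaching above the plane
            # Empirical probability of landing on the left side (assumption).
            p_left = np.mean(samples[:, d] < t) if len(samples) else 0.5
            cost = p_left * n_left + (1.0 - p_left) * n_right
            if cost < best[0]:
                best = (cost, d, t)
    return best

def build(lo, hi, idx, samples, depth):
    """Recursively build the tree; leaves hold the Gaussian indices to evaluate."""
    _, d, t = best_split(lo, hi, idx, samples)
    if depth == 0 or d is None:
        return {"leaf": idx}
    go_left = samples[:, d] < t
    return {
        "dim": d,
        "thr": t,
        "left": build(lo, hi, idx[lo[idx, d] < t], samples[go_left], depth - 1),
        "right": build(lo, hi, idx[hi[idx, d] > t], samples[~go_left], depth - 1),
    }

def lookup(node, x):
    """Descend to the leaf containing x and return its Gaussian indices."""
    while "leaf" not in node:
        node = node["left"] if x[node["dim"]] < node["thr"] else node["right"]
    return node["leaf"]

# Toy data: two well-separated 2-D Gaussians and four sample vectors.
means = np.array([[0.0, 0.0], [10.0, 0.0]])
variances = np.ones((2, 2))
samples = np.array([[0.0, 0.0], [0.5, 0.1], [10.0, 0.0], [9.5, -0.1]])
lo, hi = gaussian_boxes(means, variances)
tree = build(lo, hi, np.arange(len(means)), samples, depth=3)
```

At decoding time, only the Gaussians returned by `lookup(tree, x)` would be evaluated for a feature vector `x`. A median split, by contrast, would choose the threshold from the box boundaries alone, without weighting each side by how often feature vectors fall there.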
Similar resources
Fast speaker independent large vocabulary continuous speech recognition
To build useful applications based on large vocabulary continuous speech recognition systems, such systems have to run in real time on common platforms. However, with most research focused on further reducing the recognition error rates, the topic of speed has been neglected in the development of speech recognition algorithms. I will present a speaker independent system that has been designed f...
The bucket box intersection (BBI) algorithm for fast approximative evaluation of diagonal mixture Gaussians
Today, most of the state-of-the-art speech recognizers are based on Hidden Markov modeling. Using semi-continuous or continuous density Hidden Markov Models, the computation of emission probabilities requires the evaluation of mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute th...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Speeding up the score computation of HMM speech recognizers with the bucket voronoi intersection algorithm
With increasing sizes of speech databases, speech recognizers with huge parameter spaces have become trainable. However, the time and memory requirements for high accuracy real-time speaker-independent continuous speech recognition will probably not be met by the available hardware for a reasonable price for the next few years. This paper describes the application of the Bucket Voronoi Intersec...
Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems
Large vocabulary continuous speech recognition systems are known to be computationally intensive. A major bottleneck is the Gaussian mixture model (GMM) computation and various techniques have been proposed to address this problem. We present a systematic study of fast GMM computation techniques. As there are a large number of these and it is impractical to exhaustively evaluate all of them, we...