HMM topology selection for accurate acoustic and duration modeling
Authors
Abstract
In this paper we show that accurate HMMs for connected word recognition can be obtained without context-dependent modeling or discriminative training. To account for different speaking rates, we define two HMMs for each word, both of which must be trained. The two models share the same standard left-to-right topology, with the possibility of skipping one state, but each model has a different, automatically selected number of states. Our simple modeling and training technique has been applied to connected digit recognition using the adult speaker portion of the TI/NIST corpus. The results obtained are comparable with the best ones reported in the literature for models with a larger number of densities.
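To make the topology concrete, here is a minimal Python sketch of the transition structure of a left-to-right HMM in which each state may loop on itself, move to the next state, or skip one state. The function name, the extra exit column, the uniform initial probabilities, and the state counts 5 and 8 are illustrative assumptions; the abstract does not say how the paper initializes transitions or which state counts its automatic selection produces.

```python
def left_to_right_skip_transitions(n_states):
    """Build an initial transition matrix for a left-to-right HMM.

    Rows are the n_states emitting states. Columns are those states plus
    one final exit column (index n_states), which is an assumption made
    here so that every row sums to 1.
    From state i the allowed targets are i (self-loop), i + 1 (next),
    and i + 2 (skip one state), as long as the target exists.
    Probabilities start out uniform over the allowed targets (assumption).
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    matrix = []
    for i in range(n_states):
        targets = [j for j in (i, i + 1, i + 2) if j <= n_states]
        prob = 1.0 / len(targets)
        row = [0.0] * (n_states + 1)
        for j in targets:
            row[j] = prob
        matrix.append(row)
    return matrix


# Two models of the same word with different numbers of states.
# The counts 5 and 8 are hypothetical placeholders, not values from the paper.
short_model = left_to_right_skip_transitions(5)
long_model = left_to_right_skip_transitions(8)

for row in short_model:
    print(" ".join(f"{p:.2f}" for p in row))
```

Both models share the same arc pattern and differ only in size, which matches the abstract's description of two same-topology models per word with different state counts.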
Similar resources
Hidden Markov models (HMMs) isolated word recognizer with the optimization of acoustical analysis and modeling techniques
Most state of the art automatic speech recognition (ASR) systems are typically based on continuous Hidden Markov Models (HMMs) as acoustic modeling technique. It has been shown that the performance of HMM speech recognizers may be affected by a bad choice of the type of acoustic feature parameters in the acoustic front end module. For these reasons, we propose in this paper a dedicated isolated...
Evaluating and correcting phoneme segmentation for unit selection synthesis
As part of improved support for building unit selection voices, the Festival speech synthesis system now includes two algorithms for automatic labeling of wavefile data. The two methods are based on dynamic time warping and HMM-based acoustic modeling. Our experiments show that DTW is more accurate 70% of the time, but is also more prone to gross labeling errors. HMM modeling exhibits a systema...
Progress in automatic meeting transcription
In this paper we report recent developments on the meeting transcription task, a large vocabulary conversational speech recognition task. Previous experiments showed this is a very challenging task, with about 50% word error rate (WER) using existing recognizers. The difficulty mostly comes from the highly disfluent, conversational nature of meetings and the lack of domain-specific training data. For t...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is still a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Information Theoretic Analysis of DNN-HMM Acoustic Modeling
We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal window of speech acoustic features. We cast ASR as a communication channel where the input sub-word probabilities convey the information about ...