A Stack Decoder for Continuous Speech Recognition
Author
Abstract
We describe the structure, preliminary implementation, and performance of an algorithm for continuous speech recognition. The algorithm, known as a stack decoder, proceeds by continually evaluating one-word extensions of the most promising partial transcriptions of an input utterance. The output is a list of candidate complete transcriptions, ordered by likelihood under a stochastic model. The stochastic model in the current implementation is composed solely of an acoustic component; a linguistic component will soon be added. The acoustic models make use of dictionary phonetic spellings together with models for phonemes in context. The linguistic models will be based on digram statistics.
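The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`stack_decode`, `score_extension`, `is_complete`) and the toy per-position word probabilities are assumptions made for illustration, and the toy scores stand in for a real acoustic model. The sketch keeps partial transcriptions in a priority queue (the "stack"), repeatedly pops the most likely one, and pushes its one-word extensions until enough complete transcriptions have been found.

```python
import heapq
import math


def stack_decode(vocab, score_extension, is_complete, n_best=3, max_pops=10000):
    """Best-first search over partial transcriptions (illustrative sketch).

    score_extension(words, w) returns the log-probability of appending word w
    to the partial transcription `words`, or None if the extension is ruled out.
    """
    # heapq is a min-heap, so store negated log-likelihoods.
    heap = [(0.0, ())]
    results = []
    pops = 0
    while heap and len(results) < n_best and pops < max_pops:
        neg_score, words = heapq.heappop(heap)
        pops += 1
        if is_complete(words):
            # Log-probabilities are <= 0, so scores never improve along a path
            # and complete hypotheses are popped in order of likelihood.
            results.append((-neg_score, list(words)))
            continue
        for w in vocab:
            delta = score_extension(words, w)
            if delta is None:
                continue
            heapq.heappush(heap, (neg_score - delta, words + (w,)))
    return results


# Toy stand-in for an acoustic model (hypothetical numbers): each position
# of the utterance assigns a probability to a few candidate words.
frames = [
    {"the": math.log(0.7), "a": math.log(0.3)},
    {"cat": math.log(0.6), "cap": math.log(0.4)},
]
vocab = ["the", "a", "cat", "cap"]


def score_extension(words, w):
    return frames[len(words)].get(w)


def is_complete(words):
    return len(words) == len(frames)


results = stack_decode(vocab, score_extension, is_complete, n_best=3)
for log_likelihood, words in results:
    print(" ".join(words), round(math.exp(log_likelihood), 2))
```

In this toy setting every complete transcription has the same length, so raw log-likelihoods can be compared directly. A real decoder has to compare partial transcriptions that cover different amounts of the input, which typically requires some form of score normalization or a look-ahead estimate.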
Similar resources
「のぞみ」 - a Fast, Memory Efficient One-Pass Stack Decoder
This paper describes features and implementation details of the のぞみ decoder, a fast, memory efficient one-pass stack decoder designed for large vocabulary speech recognition with dictionaries 65536 words. The stack decoder design made it possible to use arbitrary backoff N-gram language models in the first pass. A new on-demand N-gram LM-lookahead for the tree lexicon is introduced. Decoding time w...
Evaluation of a Stack Decoder on a Japanese Newspaper Dictation Task
This paper describes the evaluation of the 「のぞみ」 stack decoder for LVCSR on a 5000 word Japanese newspaper dictation task [3]. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group, it was possible to reach more than 95% word accuracy on the standard test set. ...
An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model
The stack decoder is an attractive algorithm for controlling the acoustic and language model matching in a continuous speech recognizer. A previous paper described a near-optimal admissible Viterbi A* search algorithm for use with noncross-word acoustic models and no-grammar language models [16]. This paper extends this algorithm to include unigram language models and describes a modified versi...
Context-dependent word duration modelling for robust speech recognition
Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a wor...
Algorithms for an Optimal A* Search and Linearizing the Search in the Stack Decoder
The stack decoder is an attractive algorithm for controlling the acoustic and language model matching in a continuous speech recognizer. It implements a best-first tree search of the language to find the best match to both the language model and the observed speech. This paper describes a method for performing the optimal A* search which guarantees to find the most likely path (recognized sente...