Word-based confidence measures as a guide for stack search in speech recognition
Authors
Abstract
The maximum a posteriori hypothesis is treated as the decoded truth in speech recognition. However, since word recognition accuracy is not 100%, for some applications it is desirable to have an independent confidence measure of how good the maximum a posteriori hypothesis is relative to the spoken truth. Efforts are in progress [1, 2, 3] to develop such confidence measures with the intent of applying them to assessment of the confidence of whole utterances [4], rescoring of N-best lists, etc. In this paper, we explore the use of word-based confidence measures to adaptively modify the hypothesis score during search in continuous speech recognition: specifically, the weight of the prediction made by the current sequence of hypothesized words is changed as a function of the confidence in that sequence. Experimental results are described for the ATIS and SwitchBoard tasks. A relative reduction in word error of about 8% is obtained for ATIS.
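The idea of changing the weight of a prediction as a function of confidence can be illustrated with a minimal sketch. The abstract does not give the actual weighting function, so everything here is an assumption made for illustration: the function name `confidence_scaled_score`, the use of the mean word confidence of the history, the linear scaling, and the `floor` parameter are all hypothetical.

```python
def confidence_scaled_score(acoustic_logp, lm_logp, history_conf,
                            base_lm_weight=1.0, floor=0.2):
    """Score one word extension of a partial hypothesis.

    acoustic_logp : acoustic log-probability of the new word
    lm_logp       : log-probability of the new word predicted from the history
    history_conf  : per-word confidences (0..1) of the hypothesized history
    base_lm_weight, floor : hypothetical tuning parameters (assumptions)
    """
    # With no history there is nothing to doubt, so use full confidence.
    if history_conf:
        conf = sum(history_conf) / len(history_conf)
    else:
        conf = 1.0
    # Assumed rule: trust the history's prediction less when the history
    # itself is uncertain, but never drop the weight below `floor`.
    weight = base_lm_weight * max(floor, conf)
    return acoustic_logp + weight * lm_logp


if __name__ == "__main__":
    # Same extension scored after a confident and an uncertain history.
    print(confidence_scaled_score(-10.0, -4.0, [1.0, 1.0]))
    print(confidence_scaled_score(-10.0, -4.0, [0.5, 0.5]))
```

In a stack decoder, a score of this kind would be computed each time a partial hypothesis is extended by one word, so that predictions from low-confidence histories contribute less to the ranking of hypotheses on the stack.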
Similar resources
Dynamic tuning of language model score in speech recognition using a confidence measure
Speech recognition errors limit the capability of language models to predict subsequent words correctly. An effective way to enhance the language model is to use confidence measures. Most current efforts to develop confidence measures for speech recognition focus on applying these measures to the final recognition result. However, using these measures early in the sear...
Improved speech recognition using iterative decoding based on confidence measures
In this paper, a decoding method incorporating word-level confidence measures for improved speech recognition is presented. At first, we focus on the estimation of confidence measures from the word graph and evaluate them in word graph rescoring (2nd-pass in 2-pass search system). Next, we propose the lexical tree search (1st-pass in 2-pass search system) incorporating the word-level confidence meas...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Fast estimation of warping factors in vocal tract length normalization using the score obtained from gender detection modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variation among speakers is the difference in their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...
Time-first search for large vocabulary speech recognition
This paper describes a new search technique for large vocabulary speech recognition based on a stack decoder. Considerable memory savings are achieved with the combination of a tree-based lexicon and a new search technique. The search proceeds time-first, that is, partial path hypotheses are extended into the future in the inner loop and a tree walk over the lexicon is performed as an outer loop...