Improving Speech Recognizer Performance
Abstract
This thesis investigates N-best hypothesis reranking techniques for improving speech recognition accuracy. We focused on improving the accuracy of a speech recognizer used in a dialog system. Our post-processing approach uses a linear regression model to predict the error rate of each hypothesis from hypothesis features, and then outputs the hypothesis with the lowest predicted error rate. We investigated 15 different features sampled from 3 components of a dialog system: a decoder, a parser and a dialog manager. These features are speech recognizer score, acoustic model score, language model score, N-best word rate, N-best homogeneity with speech recognizer score, N-best homogeneity with language model score, N-best homogeneity with acoustic model score, unparsed words, gap number, fragmentation transitions, highest-in-coverage, slot bigram, conditional slot, expected slots and conditional slot bigram. We also used linear rescaling with clipping to normalize feature values and deal with differences in order of magnitude. A search strategy was used to discover the optimal feature set for reordering; three search algorithms were examined: stepwise regression, greedy search and brute force search. To improve reranking accuracy and reduce computation, we examined techniques for selecting utterances likely to benefit from reranking, then applying reranking only to those utterances. Besides the conventional performance metric, word error rate, we also proposed concept error rate as an alternative metric. An experiment with human subjects revealed that concept error rate conforms better to the criteria humans used when they evaluated hypothesis quality. The best-performing reranking model combined 6 features to predict error rate. These 6 features are speech recognizer score, language model score, acoustic model score, slot bigram, N-best homogeneity with speech recognizer score and N-best word rate.
This optimal set of features was obtained using greedy search. This model can improve the word error rate significantly beyond the speech recognizer baseline. The reranked word error rate is 11.14%, which is a 2.71% relative improvement over the baseline. The reranked concept error rate is 9.68%, which is a 1.22% relative improvement over the baseline. Adding an utterance selection module to the reranking process did not improve reranking performance beyond that achieved by reranking every utterance. However, some selection criteria achieved the same overall error rate by reranking just a small fraction (8.37%) of the utterances. When the proposed reranking technique was compared with a human on the same reranking task, the proposed method did as well as a native speaker, suggesting that an automatic reordering process is quite competitive.
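The reranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the two-feature setup, the feature ranges, the weights and the bias are all hypothetical values chosen for illustration. In the thesis the regression weights are fit by linear regression on training data, which this sketch does not do; it assumes they are already available. It shows only linear rescaling with clipping followed by selection of the hypothesis with the lowest predicted error rate.

```python
from typing import List, Sequence, Tuple


def rescale_clip(value: float, lo: float, hi: float) -> float:
    """Linearly map [lo, hi] onto [0, 1], clipping values outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


def predict_error(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float) -> float:
    """Linear model: predicted error rate = bias + sum(w_i * x_i)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def rerank(nbest: List[Tuple[str, List[float]]],
           ranges: List[Tuple[float, float]],
           weights: Sequence[float],
           bias: float) -> Tuple[str, float]:
    """Return (hypothesis, predicted error) with the lowest predicted error."""
    best_text, best_err = "", float("inf")
    for text, raw in nbest:
        # Normalize each raw feature so that scores on very different
        # scales become comparable before the linear model is applied.
        scaled = [rescale_clip(v, lo, hi) for v, (lo, hi) in zip(raw, ranges)]
        err = predict_error(scaled, weights, bias)
        if err < best_err:
            best_text, best_err = text, err
    return best_text, best_err


# Hypothetical 2-best list: (hypothesis, [speech recognizer score, slot bigram]).
NBEST = [
    ("i want to fly to austin", [-1150.0, 0.2]),  # recognizer's top choice
    ("i want to fly to boston", [-1200.0, 0.9]),
]
RANGES = [(-2000.0, -1000.0), (0.0, 1.0)]  # assumed (min, max) per feature
WEIGHTS = [-0.2, -0.5]                     # assumed, not fitted, weights
BIAS = 0.8                                 # assumed intercept

best_hypothesis, best_predicted_error = rerank(NBEST, RANGES, WEIGHTS, BIAS)
print(best_hypothesis, round(best_predicted_error, 2))
```

With these made-up numbers the second hypothesis is selected even though the recognizer scored the first one higher, because its slot bigram feature lowers the predicted error rate more.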
Similar resources
Gender-dependent emotion recognition based on HMMs and SPHMMs
It is well known that emotion recognition performance is not ideal. This research is devoted to improving emotion recognition performance by employing a two-stage recognizer that integrates a gender recognizer and an emotion recognizer into one system. Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) have been used as classifiers in the two-stage ...
Within-Word vs. Across-Word Decoding for Online Speech Recognition
In this paper we describe methods for improving the RWTH German speech recognizer used within the VERBMOBIL project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. The recognizer in the VERBMOBIL project is used in an online environment. We will discuss some incremental methods to reduce the response time of an on-line spe...
Performance analysis of various single channel speech enhancement algorithms for automatic speech recognition
This paper analyzes the performance of various single channel speech enhancement systems when they are applied to automatic speech recognition (ASR) systems as a preprocessor. Until now, research on speech enhancement algorithms has focused on improving the perceptual quality of the speech signal. However, it has not yet been verified whether the improvements of the perceptual quality also in...
Improving Information Extraction by Modeling Errors in Speech Recognizer Output
In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of m...
An HMM-based phoneme recognizer applied to assessment of dysarthric speech
This paper describes work on the development of an HMM-based system for automatic speech assessment, particularly of dysarthric speech. As a first step, we compare recognizer performance on a closed-set, forced choice identification test of dysarthric speech with performance on the same test by untrained listeners. Results indicate that HMM recognition accuracy averaged over all utterances of a...