Estimating speech recognition error rate without acoustic test data
نویسندگان
چکیده
We address the problem of estimating the word error rate (WER) of an automatic speech recognition (ASR) system without using acoustic test data. This is an important problem which is faced by the designers of new applications which use ASR. Quick estimate of WER early in the design cycle can be used to guide the decisions involving dialog strategy and grammar design. Our approach involves estimating the probability distribution of the word hypotheses produced by the underlying ASR system given the text test corpus. A critical component of this system is a phonemic confusion model which seeks to capture the errors made by ASR on the acoustic data at a phonemic level. We use a confusion model composed of probabilistic phoneme sequence conversion rules which are learned from phonemic transcription pairs obtained by leave-one-out decoding of the training set. We show reasonably close estimation of WER when applying the system to test sets from different domains.
منابع مشابه
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
متن کاملFeature space normalization in adverse acoustic conditions
We study the effect of different feature space normalization techniques in adverse acoustic conditions. Recognition tests are reported for cepstral mean and variance normalization, histogram normalization, feature space rotation, and vocal tract length normalization on a German isolated word recognition task with large acoustic mismatch. The training data was recorded in clean office environmen...
متن کاملTowards robustness to fast speech in ASR
Psychoacoustic studies show that human listeners are sensitive to speaking rate variations 10]. Automatic speech recognition (ASR) systems are even more aaected by the changes in rate, as double to quadruple word recognition error rates of average speakers have been observed for fast speakers on many ASR systems 6]. In our earlier work 5], we studied the causes of higher error and concluded tha...
متن کاملRobust i-vector based adaptation of DNN acoustic model for speech recognition
In the past, conventional i-vectors based on a Universal Background Model (UBM) have been successfully used as input features to adapt a Deep Neural Network (DNN) Acoustic Model (AM) for Automatic Speech Recognition (ASR). In contrast, this paper introduces Hidden Markov Model (HMM) based ivectors that use HMM state alignment information from an ASR system for estimating i-vectors. Further, we ...
متن کاملCan continuous speech recognizers handle isolated speech?
Continuous speech is far more natural and ecient than isolated speech for communication. However, for current state-of-the-art automatic speech recognition systems, isolated speech recognition (ISR) is far more accurate than continuous speech recognition (CSR). It is common practice in the speech research community to build CSR systems using only CS data. However, slowing of the speaking rate ...
متن کامل