Method for ASR Performance Prediction Based on Temporal Properties of Speech Signal
Abstract
Extending previous work on predicting phoneme recognition error from unlabelled data corrupted by unpredictable factors, the current work investigates a simple but effective method of estimating ASR performance by computing the Mean Temporal Distance (MTD): the mean distance between speech feature vectors, determined as a function of the temporal distance between the vectors. It is shown that MTD is a function of the signal-to-noise ratio of the speech signal. Comparing the MTD curve derived from the data used to train the classifier with the curve derived from test utterances allows the error on the test data to be predicted. Another interesting observation is that the MTD remains approximately constant once the temporal separation exceeds a certain critical interval (about 200 ms), which corresponds to the extent of coarticulation in speech sounds. This lends further support to the notion that the speech message is coded in overlapping speech sound units, each lasting approximately 200 ms.
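The definition of MTD given in the abstract can be illustrated with a minimal sketch. The abstract does not say which distance measure or feature type is used, so the Euclidean distance, the function name `mean_temporal_distance`, and the `max_lag` parameter below are all assumptions made for illustration only.

```python
import numpy as np


def mean_temporal_distance(features, max_lag):
    """Return the MTD curve for temporal separations of 1..max_lag frames.

    features: array of shape (num_frames, dim), one feature vector per frame.
    The Euclidean distance is an assumption; the abstract does not name
    the distance measure.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    if not 1 <= max_lag < num_frames:
        raise ValueError("max_lag must be between 1 and num_frames - 1")

    mtd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Pair every frame t with frame t + lag and average the distances.
        diffs = features[lag:] - features[:-lag]
        mtd[lag - 1] = np.linalg.norm(diffs, axis=1).mean()
    return mtd


# Toy input: frame t holds the vector (3t, 4t), so frames that are
# k frames apart are exactly 5k apart in Euclidean distance.
toy = np.arange(5, dtype=float)[:, None] * np.array([3.0, 4.0])
print(mean_temporal_distance(toy, 3))  # [ 5. 10. 15.]
```

With a hypothetical 10 ms frame shift, a separation of 20 frames would correspond to the roughly 200 ms interval beyond which the abstract reports the curve flattening. The toy input above grows linearly and is only meant to check the arithmetic, so it does not show that plateau.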
Similar resources
Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space
Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Automatic speech recognition with primarily temporal envelope information
The aim of this study is to devise a computational method to predict cochlear implant (CI) speech recognition. Here, we describe a high-throughput screening system for optimizing CI speech processing strategies using hidden Markov model (HMM)-based automatic speech recognition (ASR). Word accuracy was computed on vocoded CI speech synthesized from primarily multi-channel temporal envelope infor...
Hilbert Envelope Based Features for Far-Field Speech Recognition
Automatic speech recognition (ASR) systems, trained on speech signals from close-talking microphones, generally fail in recognizing far-field speech. In this paper, we present a Hilbert Envelope based feature extraction technique to alleviate the artifacts introduced by room reverberations. The proposed technique is based on modeling temporal envelopes of the speech signal in narrow sub-bands u...
Multi-step linear prediction based speech dereverberation in noisy reverberant environment
A speech signal captured by a distant microphone is generally contaminated by reverberation and background noise, which severely degrade the automatic speech recognition (ASR) performance. In this paper, we first extend a previously proposed single channel dereverberation algorithm to a multi-channel scenario. The method estimates late reflections using multichannel multi-step linear prediction...
Prediction of Epileptic Seizures in Patients with Temporal Lobe Epilepsy (TLE) based on Cepstrum analysis and AR model of EEG signal
Epilepsy is a chronic disorder of brain function caused by abnormal and excessive electrical discharge of neurons in the brain. Seizures cause disturbances in consciousness that occur without prior notice, so the ability to predict them from EEG data can reduce stress and improve quality of life. An epileptic patient's EEG data consist of five parts: Ictal, Inter-Ictal, pre-Ictal, Post-Ictal, an...