Neural "spike rate spectrum" as a noise robust, speaker invariant feature for automatic speech recognition
Abstract
A new feature set for ASR, called Rate-Spectrum (RS), is proposed. RS is a spectral representation obtained from a computational auditory model. The feature is noise robust and considerably speaker invariant. RS matches the smoothed log spectrum in both shape and dynamic range variation. A DCT is applied to reduce dimensionality, giving the RS-DCT feature. The proposed features are compared with MFCC in an isolated word recognition experiment on the TI Digits database, for both clean and noisy speech. For speakers seen during training, RS and RS-DCT outperform MFCC in the noisy case while matching MFCC in the clean case. For unseen speakers, RS does better than MFCC in the clean case, and RS-DCT outperforms MFCC in the noisy case. We have thus shown that the proposed ASR feature is noise robust and speaker invariant.
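The DCT step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which DCT variant is used or how many coefficients are kept, so the orthonormal DCT-II, the helper names `dct_ii` and `reduce_with_dct`, the 8-channel frame, and `n_keep = 4` are all assumptions made for illustration, not the authors' configuration.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (assumed variant, see lead-in)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Scaling that makes the transform orthonormal (energy preserving).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def reduce_with_dct(frame, n_keep):
    """Keep only the first n_keep DCT coefficients of one spectral frame."""
    if not 0 < n_keep <= len(frame):
        raise ValueError("n_keep must be between 1 and len(frame)")
    return dct_ii(frame)[:n_keep]


# Hypothetical 8-channel spectral frame; values are made up for illustration.
frame = [1.0, 1.2, 1.5, 1.9, 1.8, 1.4, 1.1, 0.9]
compact = reduce_with_dct(frame, n_keep=4)
print(len(frame), "->", len(compact))
```

Truncating to the low-order coefficients keeps the smooth spectral envelope and discards fine channel-to-channel detail, which is the usual motivation for this kind of reduction.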
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCC) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the family of missing-feature methods. In these methods, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted then replaced by remained components and statistical ...
Noise-Robust Speech Recognition Through Auditory Feature Detection and Spike Sequence Decoding
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes fr...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Invariant Representations for Noisy Speech Recognition
Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training. We attempt to address this problem by encouraging the ne...