Environmental Noise Embeddings for Robust Speech Recognition
نویسندگان
چکیده
We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system in being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Using simulated acoustic environments we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training and multi-task learning for this task.
منابع مشابه
Improving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...
متن کاملروشی جدید در بازشناسی مقاوم گفتار مبتنی بر دادگان مفقود با استفاده از شبکه عصبی دوسویه
Performance of speech recognition systems is greatly reduced when speech corrupted by noise. One common method for robust speech recognition systems is missing feature methods. In this way, the components in time - frequency representation of signal (Spectrogram) that present low signal to noise ratio (SNR), are tagged as missing and deleted then replaced by remained components and statistical ...
متن کاملAn Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملUsing adaptive signal limiter together with noise-robust techniques for noisy speech recognition
In a speech recognition system, environmental mismatch between speech models and test speech causes serious performance degradation. To solve this environmental mismatch problem, smoothing process is one of the most widely used techniques. In this paper, an adaptive signal limiter (ASL) is developed to smooth speech features so that the undesired spectral variations could be effectively reduced...
متن کاملSpeech Enhancement Employing Variational Noise Model Composition for Robust Speech Recognition in Time-Varying Noisy Environments
This study proposes an effective noise estimation method for robust speech recognition in time-varying noise conditions. The proposed noise estimation scheme employs the Variation Model Composition (VMC) method, where multiple noise models are generated by selectively applying perturbation factors to the mean parameters of a basis noise model. The noise estimate is obtained by using the posteri...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1601.02553 شماره
صفحات -
تاریخ انتشار 2016