Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition
نویسندگان
چکیده
In this paper, we first present a new variant of Gaussian restricted Boltzmann machine (GRBM) called multivariate Gaussian restricted Boltzmann machine (MGRBM), with its definition and learning algorithm. Then we propose using a learned GRBM or MGRBM to extract better features for robust speech recognition. Our experiments on Aurora2 show that both GRBM-extracted and MGRBM-extracted feature performs much better than Mel-frequency cepstral coefficient (MFCC) with either HMM-GMM or hybrid HMM-deep neural network (DNN) acoustic model, and MGRBM-extracted feature is slightly better.
منابع مشابه
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...
متن کاملMulti-task learning deep neural networks for speech feature denoising
Traditional automatic speech recognition (ASR) systems usually get a sharp performance drop when noise presents in speech. To make a robust ASR, we introduce a new model using the multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task in feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBM) and fine-tuned by...
متن کاملA Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images
Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...
متن کاملUnsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection
Speech Synthesis (SS) and Voice Conversion (VC) presents a genuine risk of attacks for Automatic Speaker Verification (ASV) technology. In this paper, we use our recently proposed unsupervised filterbank learning technique using Convolutional Restricted Boltzmann Machine (ConvRBM) as a frontend feature representation. ConvRBM is trained on training subset of ASV spoof 2015 challenge database. A...
متن کاملImproving the performance of MFCC for Persian robust speech recognition
The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1309.6176 شماره
صفحات -
تاریخ انتشار 2013