Statistical Models for Noise-Robust Speech Recognition

Author

  • Rogier Christiaan van Dalen
Abstract

A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech distribution with a diagonal-covariance Gaussian distribution. This thesis looks into improving on this approximation in two ways: firstly, by estimating full-covariance Gaussian distributions; secondly, by approximating corrupted-speech likelihoods without any parameterised distribution.

The first part of this work is about compensating for within-component feature correlations under noise. For this, the covariance matrices of the computed Gaussians should be full instead of diagonal. The estimation of off-diagonal covariance elements turns out to be sensitive to approximations. A popular approximation is the one that state-of-the-art compensation schemes, like VTS compensation, use for dynamic coefficients: the continuous-time approximation. Standard speech recognisers contain both per-time-slice, static, coefficients, and dynamic coefficients, which represent signal changes over time, and are normally computed from a window of static coefficients. To remove the need for the continuous-time approximation, this thesis introduces a new technique. It first compensates a distribution over the window of statics, and then applies the same linear projection that extracts dynamic coefficients. It introduces a number of methods that address the correlation changes that occur in noise within this framework. The next problem is decoding speed with full covariances. This thesis reanalyses the previously-introduced predictive linear transformations, and shows how they can model feature correlations at low and tunable computational cost.

The second part of this work removes the Gaussian assumption completely. It introduces a sampling method that, given speech and noise distributions and a mismatch function, in the limit calculates the corrupted speech likelihood exactly. For this, it transforms the integral in the likelihood expression, and then applies sequential importance resampling. Though it is too slow to use for recognition, it enables a more fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. The KL divergence proves to predict the word error rate well. This technique also makes it possible to evaluate the impact of approximations that standard compensation schemes make.
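The step of applying a linear projection to a distribution over the window of statics can be illustrated with a small sketch. It is not taken from the thesis: it assumes the distribution over the stacked window is a single Gaussian, and it assumes the commonly used linear-regression formula for delta coefficients. The function names `delta_projection`, `static_projection` and `project_gaussian` are hypothetical. If the stacked window has mean mu and covariance Sigma, a linear projection P yields a Gaussian with mean P mu and covariance P Sigma P^T, which is full in general.

```python
import numpy as np


def delta_projection(window: int, dim: int) -> np.ndarray:
    """Delta weights over 2*window+1 stacked static frames.

    Assumption: the usual regression formula
    d_t = sum_tau tau * x_{t+tau} / sum_tau tau^2, tau = -window..window.
    """
    taus = np.arange(-window, window + 1)
    weights = taus / np.sum(taus ** 2)
    # One block row [w_{-window} I, ..., w_{+window} I].
    return np.kron(weights[None, :], np.eye(dim))


def static_projection(window: int, dim: int) -> np.ndarray:
    """Selects the centre frame of the stacked window."""
    selector = np.zeros((1, 2 * window + 1))
    selector[0, window] = 1.0
    return np.kron(selector, np.eye(dim))


def project_gaussian(mean: np.ndarray, cov: np.ndarray, proj: np.ndarray):
    """Mean and covariance of proj @ x when x ~ N(mean, cov)."""
    return proj @ mean, proj @ cov @ proj.T


# Toy example: 2-dimensional statics, window of +/-1 frame (3 frames stacked).
window, dim = 1, 2
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6))
window_mean = np.arange(6, dtype=float)   # frames [0, 1], [2, 3], [4, 5]
window_cov = a @ a.T + 6.0 * np.eye(6)    # arbitrary full covariance

# Stack the static and delta projections into one matrix.
proj = np.vstack([static_projection(window, dim),
                  delta_projection(window, dim)])
feat_mean, feat_cov = project_gaussian(window_mean, window_cov, proj)
print(feat_mean)        # statics followed by deltas
print(feat_cov.shape)   # full covariance, including static-delta correlations
```

Because the covariance is projected exactly, the correlations between statics and deltas follow directly from the window distribution rather than from a separate approximation for the dynamic coefficients.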


Similar resources

Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors, estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...


A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...


Statistical Adaptation of Acoustic M for Robust Speech Re

Noise degrades the performance of Automatic Speech Recognition (ASR) systems working in real conditions. The mismatch between the training and recognition conditions is considered the main factor involved in this degradation, and most methods for robust ASR focus on minimizing it. In this work, we compare robust methods for ASR based on (a) the compensation of the noise effects and (b)...


Hierarchical Classification Tree Modeling of Nonstationary Noise for Robust Speech Recognition

Noise robustness is a key issue in successful deployment of automatic speech recognition systems in demanding environments such as hospital operating rooms. Perhaps the most successful way to overcome the additive noise obstacle is to employ a model adaptation scheme built around a set of dedicated clean speech and noise-only statistical models. Existing recognizer designs generally rely on rel...


Missing feature theory and probabilistic estimation of clean speech components for robust speech recognition

In the framework of Hidden Markov Models (HMMs), this paper presents a new approach towards robust speech recognition in adverse conditions. The approach is based on statistical modeling of noise by Gaussian distributions and an estimation of idealized clean speech directly in the probabilistic domain using a statistical spectral subtraction method and missing feature compensation. The missing ...


Robust Speech Recognition using Model

Maintaining a high level of robustness for Automatic Speech Recognition (ASR) systems is especially challenging when the background noise has a time-varying nature. We have implemented a Model-Based Feature Enhancement (MBFE) technique that not only can easily be embedded in the feature extraction module of a recogniser, but also is intrinsically suited for the removal of non-stationary additiv...



Publication date: 2011