Joint Uncertainty Decoding for Robust Large Vocabulary Speech Recognition

Author

  • H. Liao
Abstract

Standard techniques for increasing the noise robustness of automatic speech recognition typically assume that the recognition models are trained on clean data. This “clean” training data may in fact not be clean at all: it may contain channel variations, varying noise conditions, and different speakers. Hence, rather than being viewed as compensating clean acoustic models for environmental noise, noise robustness techniques may be thought of as reducing the acoustic mismatch between training and test conditions. This report examines the application of vector Taylor series (VTS) model compensation or model-based Joint uncertainty decoding to clean and multistyle trained systems. An EM-based noise estimation procedure is also presented to produce ML VTS or Joint noise models, depending on the form of compensation used. As an alternative to multistyle training, adaptive training with Joint uncertainty transforms, referred to as JAT in this work, provides a better method for handling heterogeneous data. With JAT, the uncertainty bias added to the model variances de-weights observations in proportion to the noise level. In this way, Joint transforms normalise the noise out of the data, allowing the canonical model to represent solely the underlying “clean” acoustic signal. This report presents a novel Joint adaptive training framework, including formulae for estimating the transforms and canonical model parameters. Lastly, large vocabulary systems are often trained on multistyle data sets, such as broadcast news or conversational telephone speech, that cover a variety of noise conditions. However, to date little research has been done on compensating such systems built with non-artificially corrupted data. In this report, experiments are conducted on an artificially corrupted Resource Management database and the large vocabulary Broadcast News corpus of collected broadcast recordings.
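To illustrate the de-weighting effect described in the abstract, here is a minimal Python sketch. It assumes the commonly cited form of the Joint uncertainty decoding likelihood, log|A| + log N(Ay + b; mu, var + var_bias), with diagonal covariances. That form is not stated in the abstract, and the function name, variable names, and toy numbers are all hypothetical.

```python
import numpy as np

def joint_ud_loglik(y, mean, var, A, b, var_bias):
    """Log-likelihood of a noisy observation y under one diagonal Gaussian
    component, ASSUMING the form log|A| + log N(A y + b; mean, var + var_bias).
    A, b and var_bias play the role of a Joint transform (assumed notation)."""
    z = A @ y + b                        # transformed observation
    total_var = var + var_bias           # uncertainty bias inflates the variance
    _, logdet = np.linalg.slogdet(A)     # log|det A|, the Jacobian term
    quad = (z - mean) ** 2 / total_var
    return logdet - 0.5 * np.sum(np.log(2.0 * np.pi * total_var) + quad)

# Toy setup (hypothetical numbers): two components, identity transform.
y = np.zeros(2)
A, b = np.eye(2), np.zeros(2)
var = np.ones(2)
mean_near, mean_far = np.zeros(2), np.full(2, 2.0)

def loglik_gap(var_bias):
    """How strongly the observation discriminates between the two components."""
    near = joint_ud_loglik(y, mean_near, var, A, b, var_bias)
    far = joint_ud_loglik(y, mean_far, var, A, b, var_bias)
    return near - far

gap_low_noise = loglik_gap(np.zeros(2))       # no uncertainty bias
gap_high_noise = loglik_gap(np.full(2, 4.0))  # large uncertainty bias
print(gap_low_noise, gap_high_noise)
```

In this toy case the log-likelihood gap between the two components shrinks as the variance bias grows, so a frame with high uncertainty contributes less to the decision between components. This is one way to read "de-weights observations" under the assumed form.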


Similar resources

Joint uncertainty decoding for noise robust speech recognition

Background noise can have a significant impact on the performance of speech recognition systems. A range of fast feature-space and model-based schemes have been investigated to increase robustness. Model-based approaches typically achieve lower error rates, but at an increased computational load compared to feature-based approaches. This makes their use in many situations impractical. The uncertai...


Uncertainty Decoding for Noise Robust Automatic Speech Recognition

This report presents uncertainty decoding as a method for robust automatic speech recognition for the Noise Robust Automatic Speech Recognition project funded by Toshiba Research Europe Limited. The effects of noise on speech recognition are reviewed and a general framework for noise robust speech recognition introduced. Common and related noise robustness techniques are described in the contex...



A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation

Uncertainty propagation techniques achieve a more robust automatic speech recognition by modeling the information missing after speech enhancement in the short-time Fourier transform (STFT) domain in probabilistic form. This information is then propagated into the feature domain where recognition takes place and combined with observation uncertainty techniques like uncertainty decoding. In this...


Feature versus model based noise robustness

Over the years, the focus in noise robust speech recognition has shifted from noise robust features to model based techniques such as parallel model combination and uncertainty decoding. In this paper, we contrast prime examples of both approaches in the context of large vocabulary recognition systems such as used for automatic audio indexing and transcription. We look at the approximations the...




Publication date: 2006