Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database

Authors

Yadong Wang

Jesse Hansen

Gopi Krishna Allu

Ramdas Kumaresan

Abstract

We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. The output of each filter is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called […]) and anti-analytic (called […]) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. Refined features may also be extracted from the analytic and anti-analytic components, but this is not done in this paper. We evaluated the front end on the Aurora 2 task, where noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 3-mixture HMM back end, our AIF/ALE front end achieves an average improvement of […] with set A, an improvement of […] with set B, and a (negative) ‘improvement’ of […] with set C. The overall improvement in accuracy rates for clean training is […]. Although the improvements are modest, the strengths of this work are the novelty of the front end and its potential for future enhancement.
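The pipeline described in the abstract (band-pass filter bank, then one AIF and one ALE value per filter and frame) can be illustrated with a minimal sketch. This is not the authors' method: the paper derives its components through a log-derivative operation, whereas the sketch below swaps in a plain FFT-based analytic signal and takes unweighted per-frame means. The function names `analytic_band` and `aif_ale`, the rectangular filter shapes, the band edges, and the frame parameters are all assumptions made for illustration.

```python
import numpy as np

def analytic_band(x, fs, f_lo, f_hi):
    """Band-pass x to [f_lo, f_hi) Hz and return the analytic signal of that band.

    Assumption: an ideal (rectangular) FFT-domain filter with f_lo > 0,
    not the filter shapes used in the paper.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    # Keep only positive frequencies inside the band; doubling them gives
    # the analytic signal of the band-passed input.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return np.fft.ifft(2.0 * spectrum * mask)

def aif_ale(x, fs, band_edges, frame_len, hop):
    """Return (aif, ale), each of shape (num_frames, num_bands).

    aif[t, b]: mean instantaneous frequency in Hz of band b over frame t.
    ale[t, b]: mean natural-log envelope of band b over frame t.
    Assumption: plain unweighted averages over each frame.
    """
    x = np.asarray(x, dtype=float)
    num_frames = 1 + (len(x) - frame_len) // hop
    num_bands = len(band_edges) - 1
    aif = np.zeros((num_frames, num_bands))
    ale = np.zeros((num_frames, num_bands))
    for b in range(num_bands):
        z = analytic_band(x, fs, band_edges[b], band_edges[b + 1])
        # Small floor avoids log(0) in bands that carry no energy.
        log_env = np.log(np.abs(z) + 1e-12)
        # Instantaneous frequency = phase derivative / (2*pi).
        phase = np.unwrap(np.angle(z))
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
        for t in range(num_frames):
            s = t * hop
            ale[t, b] = log_env[s:s + frame_len].mean()
            aif[t, b] = inst_freq[s:s + frame_len - 1].mean()
    return aif, ale
```

Called as `aif_ale(x, 8000, [500.0, 1500.0, 2500.0], 200, 100)` on a signal sampled at 8 kHz, this returns two features per frame for each of the two bands. Bands that contain almost no energy produce an ALE near the log floor and an AIF that is essentially noise, so a practical front end would need to handle that case.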

Similar resources

A generalized framework for compensation of mel-filterbank outputs in feature extraction for robust ASR

This paper describes a novel and efficient noise-robust frontend that utilizes a set of Mel-filterbank output compensation methods, together with cumulative distribution mapping of cepstral coefficients, for noisy speech recognition. The proposed compensation framework includes the use of noise spectral subtraction, spectral flooring and log Mel-filterbank output weighting. Recognition experime...

Joint Bayesian predictive classification and parallel model combination with prior scaling for robust ASR

This paper presents a model compensation approach based on Bayesian predictive classification (BPC). In order to obtain effective prior distributions for BPC, our approach uses parallel model combination (PMC) to set the prior mean, and a likelihood ratio to set a scaled frame-specific prior variance. Experiments on the Aurora 2 database show that the proposed approach results in improved avera...

A Log-energy Scaling Normalization Scheme for Robust Speech Recognition

The log-energy parameter, as an auxiliary but influential feature, has been commonly used to augment Mel-frequency cepstral coefficients (MFCCs) to improve the recognition accuracy in automatic speech recognition (ASR). In this paper, a new and effective scaling approach named log-energy scaling normalization (LESN), which utilizes special nonlinear scaling functions on noisy speech data for lo...

On compensating the Mel-frequency cepstral coefficients for noisy speech recognition

This paper describes a novel noise-robust automatic speech recognition (ASR) front-end that employs a combination of Mel-filterbank output compensation and cumulative distribution mapping of cepstral coefficients with truncated Gaussian distribution. Recognition experiments on the Aurora II connected digits database reveal that the proposed front-end achieves an average digit recognition accura...

An MTF-based blind restoration of temporal power envelopes as a front-end processor for automatic speech recognition systems in reverberant environments

To reduce speech degradation in reverberant environments, we previously proposed a modulation transfer function (MTF) based method of speech restoration. The room impulse response (RIR) in this restoration does not need to be measured at any time since we modeled the power envelope of the RIRs as an exponential decay function. Speech is assumed to be temporal modulated with white noise carrier ...

Publication date: 2003
