Enhancement and Recognition of Reverberant and Noisy Speech by Extending Its Coherence

نویسندگان

Scott Wisdom

Thomas Powers

Les E. Atlas

James W. Pitton

چکیده

Most speech enhancement algorithms make use of the short-time Fourier transform (STFT), which is a simple and flexible time-frequency decomposition that estimates the short-time spectrum of a signal. However, the duration of short STFT frames are inherently limited by the nonstationarity of speech signals. The main contribution of this paper is a demonstration of speech enhancement and automatic speech recognition in the presence of reverberation and noise by extending the length of analysis windows. We accomplish this extension by performing enhancement in the short-time fan-chirp transform (STFChT) domain, an overcomplete time-frequency representation that is coherent with speech signals over longer analysis window durations than the STFT. This extended coherence is gained by using a linear model of fundamental frequency variation of voiced speech signals. Our approach centers around using a single-channel minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator proposed by Habets, which scales coefficients in a timefrequency domain to suppress noise and reverberation. In the case of multiple microphones, we preprocess the data with either a minimum variance distortionless response (MVDR) beamformer, or a delay-and-sum beamformer (DSB). We evaluate our algorithm on both speech enhancement and recognition tasks for the REVERB challenge dataset. Compared to the same processing done in the STFT domain, our approach achieves significant improvement in terms of objective enhancement metrics (including PESQ—the ITU-T standard measurement for speech quality). In terms of automatic speech recognition (ASR) performance as measured by word error rate (WER), our experiments indicate that the STFT with a long window is more effective for ASR.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...

متن کامل

Enhancement of Reverberant and Noisy Speech by Extending Its Coherence

We introduce a novel speech enhancement algorithm for removing reverberation and noise from recorded speech data. Our approach centers around using a single-channel minimum mean-square error log-spectral amplitude (MMSELSA) estimator, which applies gain coefficients in a timefrequency domain to suppress noise and reverberation. The main contribution of this paper is that the enhancement is done...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...

متن کامل

Optimized Wavelet-based Speech Enhancement for Speech Recognition in Noisy and Reverberant Conditions

We present an improved speech enhancement method based on Wiener filtering in the wavelet domain for automatic speech recognition (ASR). The wavelet coefficients that are contaminated by the effects of late reflection and background noise are filtered using a Wiener gain. We optimize the wavelet parameters for speech, background noise and late reflection to achieve a better estimate of the Wien...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1509.00533 شماره

صفحات -

تاریخ انتشار 2015

Enhancement and Recognition of Reverberant and Noisy Speech by Extending Its Coherence

نویسندگان

چکیده

منابع مشابه

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Enhancement of Reverberant and Noisy Speech by Extending Its Coherence

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Optimized Wavelet-based Speech Enhancement for Speech Recognition in Noisy and Reverberant Conditions

عنوان ژورنال:

اشتراک گذاری