Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

Abstract

Many studies on deep learning-based speech enhancement (SE) utilizing the computational auditory scene analysis method typically employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a desirable balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask to attain superior SE performance is proposed by introducing an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, the noise, and the clean speech to effectively re-distribute the power ratio of speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is utilized in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regulation, pre-training, and noise-aware training to derive a robust, high-order mapping in enhancement and to improve generalization in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. When compared to a DNN with the ideal ratio mask as its learning target, the DNN-QCM provided an improvement of approximately 6.5% in the short-time objective intelligibility score and 11.0% in the perceptual evaluation of speech quality score. The introduction of the quantization method can reduce the neural network weights to a 5-bit representation from a 32-bit one, while effectively suppressing stationary and non-stationary noise. Timing analyses also showed that, with the techniques incorporated in the proposed DNN-QCM system to increase its compactness, the training and inference time can be reduced by 15.7% and 10.5%, respectively.
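The abstract does not state the correlation-based adjustment factor, so it is not reproduced here. As a rough illustration of the two ingredients it names, the sketch below computes the conventional ideal ratio mask per time-frequency bin and applies plain min-max uniform quantization at 5 bits. The function names, the exponent `beta = 0.5`, the toy values, and the choice of a uniform quantizer are all assumptions made for illustration and may differ from the authors' method.

```python
import math

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Conventional ideal ratio mask per bin: (S / (S + N)) ** beta.

    beta = 0.5 is a common choice, assumed here for illustration.
    """
    mask = []
    for s, n in zip(speech_power, noise_power):
        total = s + n
        mask.append((s / total) ** beta if total > 0 else 0.0)
    return mask

def quantize_uniform(values, num_bits=5):
    """Snap floats onto 2**num_bits evenly spaced levels over [min, max].

    Returns the dequantized floats and the integer codes.
    This is a generic uniform quantizer, not the paper's scheme.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    if hi == lo:
        return list(values), [0] * len(values)
    step = (hi - lo) / levels
    codes = [round((v - lo) / step) for v in values]
    dequantized = [lo + c * step for c in codes]
    return dequantized, codes

# Toy per-bin powers (hypothetical values).
speech = [4.0, 1.0, 0.0, 9.0]
noise = [4.0, 3.0, 2.0, 0.0]
mask = ideal_ratio_mask(speech, noise)

# Toy network weights (hypothetical values) stored with 5-bit codes.
weights = [-0.8, -0.1, 0.0, 0.35, 0.9]
dequantized, codes = quantize_uniform(weights, num_bits=5)
step = (max(weights) - min(weights)) / (2 ** 5 - 1)
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
```

With 5 bits each weight is stored as one of 32 integer codes, and the rounding error per weight is bounded by half the quantization step.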

Similar resources

Integration of DNN based speech enhancement and ASR

Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches to integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we thus exp...

Towards minimum perceptual error training for DNN-based speech synthesis

We propose to use a perceptually-oriented domain to improve the quality of text-to-speech generated by deep neural networks (DNNs). We train a DNN that predicts the parameters required for speech reconstruction but whose cost function is calculated in another domain. In this paper, to represent this perceptual domain we extract an approximated version of the SpectroTemporal Excitation Pattern t...

Speech Enhancement using Adaptive Data-Based Dictionary Learning

In this paper, a speech enhancement method based on sparse representation of data frames has been presented. Speech enhancement is one of the most applicable areas in different signal processing fields. The objective of a speech enhancement system is improvement of either intelligibility or quality of the speech signals. This process is carried out using the speech signal processing techniques ...

DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition

Ever since the deep neural network (DNN) appeared in the speech signal processing society, the recognition performance of automatic speech recognition (ASR) has been greatly improved. Due to this achievement, the demands on various applications in distant-talking environment also have been increased. However, ASR performance in such environments is still far from that in close-talking environme...

Student-Teacher Learning for BLSTM Mask-based Speech Enhancement

Spectral mask estimation using bidirectional long short-term memory (BLSTM) neural networks has been widely used in various speech enhancement applications, and it has achieved great success when it is applied to multichannel enhancement techniques with a mask-based beamformer. However, when these masks are used for single channel speech enhancement they severely distort the speech signal and m...

Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3056711