On training targets for deep learning approaches to clean speech magnitude spectrum estimation

Authors

Abstract

Estimation of the clean speech short-time magnitude spectrum (MS) is key for speech enhancement and separation. Moreover, an automatic speech recognition (ASR) system that employs a front-end relies on clean speech MS estimation to remain robust. Training targets for deep learning approaches to clean speech MS estimation fall into three categories: computational auditory scene analysis (CASA), MS, and minimum mean square error (MMSE) estimator training targets. The choice of training target can have a significant impact on speech enhancement/separation and robust ASR performance. Motivated by this, the training target that produces enhanced/separated speech at the highest quality and intelligibility, and that which is best for an ASR front-end, is found. Three different deep neural network (DNN) types and two datasets, which include real-world nonstationary and coloured noise sources at multiple signal-to-noise ratio (SNR) levels, were used for evaluation. Ten objective measures were employed, including the word error rate of the Deep Speech ASR system. It is found that training targets that estimate the a priori SNR for MMSE estimators produce the highest objective quality scores. Moreover, it is established that the gain of MMSE estimators and the ideal amplitude mask produce the highest objective intelligibility scores and are most suitable for an ASR front-end.
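As a rough illustration of the kinds of training targets the abstract names, the sketch below computes an ideal amplitude mask, an instantaneous a priori SNR, and a Wiener gain for a few spectral bins. The definitions used are common textbook forms and are assumptions here, not necessarily the paper's exact formulations; the function name `training_targets`, the `eps` guard, and the sample magnitudes are hypothetical.

```python
import numpy as np

def training_targets(clean_mag, noise_mag, noisy_mag, eps=1e-12):
    """Illustrative per-bin targets from magnitude spectra (assumed definitions)."""
    # Ideal amplitude mask: clean magnitude divided by noisy magnitude.
    iam = clean_mag / (noisy_mag + eps)
    # Instantaneous a priori SNR: clean power over noise power, also in dB.
    xi = clean_mag ** 2 / (noise_mag ** 2 + eps)
    xi_db = 10.0 * np.log10(xi + eps)
    # Wiener gain, one example of a gain function driven by the a priori SNR.
    wiener_gain = xi / (1.0 + xi)
    return iam, xi_db, wiener_gain

# Hypothetical magnitudes for three frequency bins of one frame.
clean = np.array([1.0, 2.0, 0.5])
noise = np.array([1.0, 1.0, 1.0])
noisy = np.array([1.5, 2.5, 1.0])

iam, xi_db, gain = training_targets(clean, noise, noisy)
```

The mask and the gain are applied multiplicatively to the noisy magnitude, whereas an a priori SNR target is first mapped through a gain function; under these assumed definitions, that is the practical difference between the target categories.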

Similar resources

Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing

George Edward Dahl, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2015. The deep learning approach to machine learning emphasizes high-capacity, scalable models that learn distributed representations of their input. This dissertation demons...

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum

In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using perception aware spectrum as input. Existing studies show that speech under cold has distinct variations of energy distribution on low frequency components compared with the speech under ‘healthy’ condition. This motivates us to use perception aware spectrum as the input to an end-to-end learn...

A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems

A novel multiple-input Kalman filtering (MIKF) framework is presented that estimates the clean speech signal by fusion of outputs from multiple speech enhancement systems. The MIKF framework generates a sample-by-sample minimum mean-square error estimate of the clean speech signal from these outputs. The residual noise in each input to the MIKF is modeled as an autoregressive (AR) process so th...

Experiments on deep learning for speech denoising

In this paper we present some experiments using a deep learning model for speech denoising. We propose a very lightweight procedure that can predict clean speech spectra when presented with noisy speech inputs, and we show how various parameter choices impact the quality of the denoised signal. Through our experiments we conclude that such a structure can perform better than some comparable sin...

Clean speech feature estimation based on soft spectral masking

In this paper, we first analyze the speech and noise contamination process from a noise-masking point of view, and propose a new approach that estimates the degree of the noise masking effect on a clean speech distribution model based on sequential noise estimation. Sequential noise estimation is performed frame-by-frame using the interacting multiple model (IMM) algorithm, so that realtime implementati...

Journal

Journal title: Journal of the Acoustical Society of America

Year: 2021

ISSN: 0001-4966, 1520-9024, 1520-8524

DOI: https://doi.org/10.1121/10.0004823