Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation
نویسندگان
چکیده
In the past few years, deep neural networks (DNNs) have achieved great successes in speech recognition. The deep network model can be viewed as a series of feature transforms followed by a log-linear classifier. For input of speeches from different bandwidths, although the hidden layer transform and log-linear classification can be shared, the input layer transforms should be specially designed respectively. So, training DNNs directly on different bandwidth speeches is intractable. In this paper, we treat the problem of training DNNs on mixed bandwidth data as an domain-adaptation problem. Upon our adaptation approach, DNNs trainied on the rich narrowband speech can be adapted effectively to the target wideband domain, and meanwhile shows good performance on the wideband speech. We evaluate this approach on the wideband clean7k and noise360 speech. Experimental results show that the DNNs adaptation approach can reduce character error rate (CER) range from 5% to 15%, relatively, over the baseline DNNs trained only on the limited wideband data.
منابع مشابه
Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization
The success of deep neural network (DNN) acoustic models is partly owed to large amounts of training data available for different applications. This work investigates ways to improve DNN acoustic models for Bluetooth narrowband mobile applications when relatively small amounts of in-domain training data are available. To address the challenge of limited indomain data, we use cross-bandwidth and...
متن کاملOptimizing DNN Adaptation for Recognition of Enhanced Speech
Speech enhancement directly using deep neural network (DNN) is of major interest due to the capability of DNN to tangibly reduce the impact of noisy conditions in speech recognition tasks. Similarly, DNN based acoustic model adaptation to new environmental conditions is another challenging topic. In this paper we present an analysis of acoustic model adaptation in presence of a disjoint speech ...
متن کاملImproving DNN-Based Automatic Recognition of Non-native Children's Speech with Adult Speech
Acoustic models for state-of-the-art DNN-based speech recognition systems are typically trained using at least several hundred hours of task-specific training data. However, this amount of training data is not always available for some applications. In this paper, we investigate how to use an adult speech corpus to improve DNN-based automatic speech recognition for non-native children's speech....
متن کاملGMM-derived features for effective unsupervised adaptation of deep neural network acoustic models
In this paper we investigate GMM-derived features recently introduced for adaptation of context-dependent deep neural network HMM (CD-DNN-HMM) acoustic models. We improve the previously proposed adaptation algorithm by applying the concept of speaker adaptive training (SAT) to DNNs built on GMM-derived features and by using fMLLR-adapted features for training an auxiliary GMM model. Traditional...
متن کاملWTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network
In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT has been recorded called WTIMIT. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidth-extended...
متن کامل