Deep neural network bottleneck features for generalized variable parameter HMMs

Authors

  • Xurong Xie
  • Rongfeng Su
  • Xunying Liu
  • Lan Wang
Abstract

Recently, deep neural networks (DNNs) have become increasingly popular for acoustic modelling in automatic speech recognition (ASR) systems. As the bottleneck features they produce are inherently discriminative and contain rich hidden factors that influence the surface acoustic realization, the standard approach is to augment the conventional acoustic features with the bottleneck features in a tandem framework. In this paper, an alternative approach to incorporating bottleneck features is investigated. The complex relationship between acoustic features and DNN bottleneck features is modelled using generalized variable parameter HMMs (GVP-HMMs). The optimal GVP-HMM structural configuration and model parameters are learnt automatically. Significant relative error rate reductions of 48% and 8% were obtained over the baseline multi-style HMM and tandem HMM systems, respectively, on Aurora 2.
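
The contrast between the two ways of using bottleneck features can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array shapes, and the elementwise polynomial form of the Gaussian mean are all assumptions made for illustration, and the abstract does not specify the actual GVP-HMM parameterization.

```python
import numpy as np


def tandem_features(acoustic, bottleneck):
    """Tandem framework: append bottleneck features to acoustic features, frame by frame.

    acoustic:   array of shape (num_frames, acoustic_dim)
    bottleneck: array of shape (num_frames, bottleneck_dim)
    """
    return np.concatenate([acoustic, bottleneck], axis=1)


def variable_mean(coeffs, bottleneck_frame):
    """Hypothetical variable-parameter Gaussian mean for one frame.

    Assumption: each mean dimension is a sum of elementwise polynomials in the
    bottleneck vector. coeffs has shape (degree + 1, bottleneck_dim, acoustic_dim),
    and coeffs[p] weights the p-th power of each bottleneck dimension.
    """
    num_powers = coeffs.shape[0]
    # Row p holds the bottleneck vector raised elementwise to the power p.
    powers = np.stack([bottleneck_frame ** p for p in range(num_powers)])
    # Sum over polynomial order (p) and bottleneck dimension (b).
    return np.einsum("pb,pbd->d", powers, coeffs)


# Random stand-in data; these are not features from Aurora 2.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(5, 3))
bottleneck = rng.normal(size=(5, 2))

# Tandem: the model sees one longer feature vector per frame.
tandem = tandem_features(acoustic, bottleneck)

# Variable parameter: the model keeps the acoustic features as observations,
# and the bottleneck vector changes the Gaussian mean at every frame.
coeffs = rng.normal(size=(3, 2, 3))
mean = variable_mean(coeffs, bottleneck[0])
```

In the tandem case the bottleneck features enlarge the observation vector, whereas in this sketch of the variable-parameter case they only condition the model parameters, so the observation dimension stays at `acoustic_dim`.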

Similar resources

Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition

Recently, a new approach to incorporating deep neural network (DNN) bottleneck features into HMM-based acoustic models using generalized variable parameter HMMs (GVPHMMs) was proposed. As Gaussian component level polynomial interpolation is performed for each high dimensional DNN bottleneck feature vector at a frame level, conventional GVPHMMs are computationally expensive to use in recognition t...

Generalized variable parameter HMMs based acoustic-to-articulatory inversion

Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant approaches in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable ...

Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information

In recent years, neural network based acoustic-to-articulatory inversion approaches have achieved state-of-the-art performance. One major issue associated with these approaches is the lack of phone sequence information during inversion. In order to address this issue, this paper proposes an improved architecture hierarchically concatenating phone classification and articulatory inversion co...

Integration of deep bottleneck features for audio-visual speech recognition

Recent interest in “deep learning”, which can be defined as the use of algorithms to model high-level abstractions in data, using models composed of multiple non-linear transformations, has resulted in an increase in the number of studies investigating the use of deep learning with automatic speech recognition (ASR) systems. Some of these studies have found that bottleneck features extracted fr...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Publication date: 2014