2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation
Authors
Abstract
The paper provides an analysis of BUT automatic speech recognition (ASR) systems built for the 2016 IARPA Babel evaluation. The IARPA Babel program concentrates on building ASR systems for many low-resource languages, where only a limited amount of transcribed speech is available for each language. In such a scenario, we found it essential to train the ASR systems in a multilingual fashion. In this work, we report superior results obtained with pre-trained multilingual BLSTM acoustic models, where we used multi-task training with a separate classification layer for each language. The results reported on three Babel Year 4 languages show WER reductions of over 3% absolute obtained from such multilingual pre-training. Experiments with different input features show that the multilingual BLSTM performs best with simple log-Mel filter-bank outputs, which makes our previously successful multilingual stacked bottleneck features with CMLLR adaptation obsolete. Finally, we experiment with different configurations of i-vector based speaker adaptation in the mono- and multi-lingual BLSTM architectures. This results in additional WER reductions of over 1% absolute.
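For illustration only, below is a minimal sketch (in PyTorch, which the paper does not specify) of the kind of architecture the abstract describes: a shared BLSTM stack over log-Mel filter-bank features concatenated with an utterance-level i-vector, topped by a separate classification layer per language for multi-task training. All layer sizes, target counts, and names are assumptions, not taken from the paper.

```python
# Hedged sketch of a multilingual BLSTM acoustic model with per-language
# classification layers and i-vector based adaptation. Sizes and the
# i-vector concatenation point are illustrative assumptions.
import torch
import torch.nn as nn

class MultilingualBLSTM(nn.Module):
    def __init__(self, n_fbank=40, ivector_dim=100, hidden=512,
                 layers=3, targets_per_lang=None):
        super().__init__()
        targets_per_lang = targets_per_lang or {"lang1": 3000, "lang2": 3000}
        # Shared BLSTM stack over log-Mel filter banks concatenated with
        # an utterance-level i-vector (repeated for every frame).
        self.blstm = nn.LSTM(input_size=n_fbank + ivector_dim,
                             hidden_size=hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        # One classification (softmax) layer per language for multi-task
        # training; only the shared layers would be reused when porting
        # the model to a new target language.
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(2 * hidden, n_targets)
             for lang, n_targets in targets_per_lang.items()})

    def forward(self, fbank, ivector, lang):
        # fbank: (batch, frames, n_fbank); ivector: (batch, ivector_dim)
        ivec = ivector.unsqueeze(1).expand(-1, fbank.size(1), -1)
        x = torch.cat([fbank, ivec], dim=-1)
        h, _ = self.blstm(x)
        return self.heads[lang](h)  # per-frame logits for CE training

# Usage: in multi-task training, the language of each mini-batch selects
# the corresponding output layer.
model = MultilingualBLSTM()
logits = model(torch.randn(8, 200, 40), torch.randn(8, 100), "lang1")
```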
Similar resources
Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis
Training a high-quality acoustic model with a limited database and synthesizing a new speaker’s voice with a few utterances have been hot topics in deep neural network (DNN) based statistical parametric speech synthesis (SPSS). To solve these problems, we built a unified framework for speaker adaptive training as well as speaker adaptation on Bidirectional Long Short-Term Memory with Recurrent N...
Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search
This paper proposes an approach to rapidly update a multilingual deep neural network (DNN) acoustic model for low-resource keyword search (KWS). We use submodular data selection to select a small amount of multilingual data which covers diverse acoustic conditions and is acoustically close to a low-resource target language. The selected multilingual data together with a small amount of the targ...
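As an illustration of the data-selection idea in this snippet, here is a hedged sketch of a greedy, facility-location style selection over acoustic embeddings: pick a budget of multilingual utterances that best cover a set of target-language utterances. The embeddings, similarity measure, and budget are assumptions; the authors' exact submodular objective is not specified here.

```python
# Hedged sketch of greedy submodular (facility-location style) data selection.
import numpy as np

def greedy_select(pool, target, budget):
    # pool:   (N, d) embeddings of candidate multilingual utterances
    # target: (M, d) embeddings of target-language utterances
    pool_n = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    targ_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = pool_n @ targ_n.T                   # cosine similarity, (N, M)
    covered = np.zeros(target.shape[0])       # best coverage found so far
    selected = []
    for _ in range(budget):
        # marginal gain in target coverage for each remaining candidate
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[selected] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, sim[best])
    return selected

# Usage with random stand-in embeddings
idx = greedy_select(np.random.randn(1000, 64), np.random.randn(200, 64), budget=50)
```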
A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks
This paper describes the design of an acoustic language recognition system based on BLSTM that can discriminate closely related languages and dialects of the same language. We introduce a Divide-and-Conquer (D&C) method to quickly and successfully train an RNN-based multi-language classifier. Experiments compare this approach to the straightforward training of the same RNN, as well as to two wi...
Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery
Development of state-of-the-art automatic speech recognition (ASR) systems requires acoustic resources (i.e., transcribed speech) as well as lexical resources (i.e., phonetic lexicons). It has been shown that acoustic and lexical resource constraints can be overcome by first training an acoustic model that captures acoustic-to-multilingual phone relationships on language-independent data; and th...
Multi-Channel Speech Recognition: LSTMs All the Way Through
Long Short-Term Memory recurrent neural networks (LSTMs) have demonstrable advantages on a variety of sequential learning tasks. In this paper we demonstrate an LSTM “triple threat” system for speech recognition, where LSTMs drive the three main subsystems: microphone array processing, acoustic modeling, and language modeling. This LSTM trifecta is applied to the CHiME-4 distant recognition cha...