Frame-by-frame language identification in short utterances using deep neural networks

نویسندگان

  • Javier Gonzalez-Dominguez
  • Ignacio Lopez-Moreno
  • Pedro J. Moreno
  • Joaquín González-Rodríguez
چکیده

This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable to perform LID in real-time applications, due to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of the test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google Services. Reported results show relative improvements of DNNs versus the i-vector system of 40% in LRE09 3 second task and 76% in Google 5M LID.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs

This paper aims to enhance spoken language identification methods based on direct discriminative modeling of language labels using deep neural networks (DNNs) and long shortterm memory recurrent neural networks (LSTM-RNNs). In conventional methods, frame-by-frame DNNs or LSTM-RNNs are used for utterance-level classification. Although they have strong frame-level classification performance and r...

متن کامل

Context Memory Networks for Multi-objective Semantic Parsing in Conversational Understanding

The end-to-end multi-domain and multi-task learning of the full semantic frame of user utterances (i.e., domain and intent classes and slots in utterances) have recently emerged as a new paradigm in spoken language understanding. An advantage of the joint optimization of these semantic frames is that the data and feature representations learnt by the model are shared across different tasks (e.g...

متن کامل

Bidirectional Modelling for Short Duration Language Identification

Language identification (LID) systems typically employ ivectors as fixed length representations of utterances. However, it may not be possible to reliably estimate i-vectors from short utterances, which in turn could lead to reduced language identification accuracy. Recently, Long Short Term Memory networks (LSTMs) have been shown to better model short utterances in the context of language iden...

متن کامل

ESTIMATING THE VULNERABILITY OF THE CONCRETE MOMENT RESISTING FRAME STRUCTURES USING ARTIFICIAL NEURAL NETWORKS

Heavy economic losses and human casualties caused by destructive earthquakes around the world clearly show the need for a systematic approach for large scale damage detection of various types of existing structures. That could provide the proper means for the decision makers for any rehabilitation plans. The aim of this study is to present an innovative method for investigating the seismic vuln...

متن کامل

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Long Short Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3s). In this contribution we present an open-source, end-to-end, LSTM RNN system running on limited computational resources...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Neural networks : the official journal of the International Neural Network Society

دوره 64  شماره 

صفحات  -

تاریخ انتشار 2015