Phone Duration Modeling for LVCSR Using Neural Networks

نویسندگان

  • Hossein Hadian
  • Daniel Povey
  • Hossein Sameti
  • Sanjeev Khudanpur
چکیده

We describe our work on incorporating probabilities of phone durations, learned by a neural net, into an ASR system. Phone durations are incorporated via lattice rescoring. The input features are derived from the phone identities of a context window of phones, plus the durations of preceding phones within that window. Unlike some previous work, our network outputs the probability of different durations (in frames) directly, up to a fixed limit. We evaluate this method on several large vocabulary tasks, and while we consistently see improvements in Word Error Rates, the improvements are smaller when the lattices are generated with neural net based acoustic models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Gyroscope Random Drift Modeling, using Neural Networks, Fuzzy Neural and Traditional Time- series Methods

In this paper statistical and time series models are used for determining the random drift of a dynamically Tuned Gyroscope (DTG). This drift is compensated with optimal predictive transfer function. Also nonlinear neural-network and fuzzy-neural models are investigated for prediction and compensation of the random drift. Finally the different models are compared together and their advantages a...

متن کامل

Context-dependent phone mapping for LVCSR of under-resourced languages

This paper presents a context-dependent phone mapping approach for acoustic modeling of large vocabulary speech recognition for under-resourced languages by leveraging on well trained models of other languages. Generally speaking, phone mapping can be considered as a hybrid HMM/MLP (Hidden Markov Model / Multilayer Perceptron) model where the input of the MLP is phone acoustic scores, e.g. like...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Modeling of Texture and Color Froth Characteristics for Evaluation of Flotation Performance in Sarcheshmeh Copper Pilot Plant, Using Image Analysis and Neural Networks

Texture and color appearance of froth is a discreet qualitative tool for evaluating the performance of flotation process. The structure of a froth developed on the flotation cell has a significant effect on the grade and recovery of copper concentrate. In this work, image analysis and neural networks have been implemented to model and control the performance of such a system. The result reveals...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017