Integrating Time Alignment and Neural Networks for High Performance Continuous Speech Recognition

Authors

Patrick Haffner

Michael Franzini

Alex Waibel

Abstract

Successful application of existing connectionist methods to continuous speech recognition requires the use of time-alignment procedures. These procedures, usually based on dynamic programming, provide means for supervising the training of neural networks. This paper describes two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high performance continuous speech recognizers. One system uses the Connectionist Viterbi Training (CVT) procedure, in which a neural network with frame-level outputs is trained using guidance from a time alignment procedure. The other system uses Multi-State Time Delay Neural Networks (MS-TDNNs), in which embedded DP time alignment allows network training with only word-level external supervision. CVT has been described previously [1]; only changes to the system and new results on the TI Digits task are reported here. The newest CVT results on the TI Digits are 99.1% word accuracy and 98.0% string accuracy. MS-TDNNs, introduced in this paper, are described in more detail, with attention focused on their basic architecture, the training procedure, and results of applying MS-TDNNs to continuous speaker-dependent alphabet recognition: for two speakers, word accuracy is 97.5% and 89.7%, respectively.
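To make the idea of DP time alignment concrete, the following is a minimal sketch, not the authors' CVT or MS-TDNN implementation. It assumes a hypothetical table `scores[t][s]` of frame-level log scores (such as a network might produce) and a strictly left-to-right state sequence in which each frame either stays in the current state or advances to the next one. The function name `align_frames` and the toy scores are illustrative assumptions only.

```python
from typing import List


def align_frames(scores: List[List[float]], n_states: int) -> List[int]:
    """Viterbi-style monotonic alignment of T frames to n_states states.

    scores[t][s] is an assumed log score of state s at frame t.
    The path starts in state 0, ends in state n_states - 1, and at each
    frame either stays in the same state or moves to the next one.
    Returns the best state index for every frame.
    """
    n_frames = len(scores)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")

    neg_inf = float("-inf")
    # best[t][s]: score of the best path ending in state s at frame t
    best = [[neg_inf] * n_states for _ in range(n_frames)]
    # back[t][s]: predecessor state on that best path
    back = [[0] * n_states for _ in range(n_frames)]
    best[0][0] = scores[0][0]

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s]
            move = best[t - 1][s - 1] if s > 0 else neg_inf
            if move > stay:
                best[t][s] = move + scores[t][s]
                back[t][s] = s - 1
            else:
                best[t][s] = stay + scores[t][s]
                back[t][s] = s

    # Trace back from the final state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy example: two states, four frames; the scores favor state 0 early
# and state 1 late.
toy_scores = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
frame_labels = align_frames(toy_scores, n_states=2)
```

In a frame-level training scheme of the kind the abstract describes for CVT, per-frame labels like `frame_labels` could serve as training targets for the network; how the original systems actually compute and use the alignment is not specified on this page.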


Similar resources

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...


Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...


Introducing Deep Modular Neural Networks with a Dual Spatio-Temporal Structure to Improve Persian Continuous Speech Recognition

In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to implement the spatio-temporal information of the frame sequences at their input layer as well as their labels at the output layer at the same time. The trained neural network with such double spatio-temporal association structure can learn the phonetic sequence...


Scaly Neural Networks for Speech Recognition Using DTW and Time Alignment Algorithms

Speech recognition has been an active research topic for more than 50 years. Interacting with the computer through speech is an active field of scientific research, particularly for the disabled community, who face a variety of difficulties in using the computer. Such research in Automatic Speech Recognition (ASR) is investigated for different languages because each language has its specific fea...




Publication date: 2004
