A Speech Recognizer Based on Locally Recurrent Neural Networks

Authors

  • Klaus Kasper
  • Herbert Reininger
  • Dietrich Wolf
Abstract

Speech recognition systems (SRS) designed for low-cost products such as telephones, or for systems with energy constraints such as autonomous vehicles, must have low complexity. A small vocabulary consisting of a few command words and the digits is sufficient for most applications, but it has to be recognized robustly. Here we report on investigations into the application of recurrent neural networks to speaker-independent speech recognition. Fully Recurrent Neural Networks (FRNN) are used for feature scoring as well as for compensating variations in the durations of speech segments. Two SRS based on FRNN are discussed. First, a phoneme-based recognizer is investigated in which both feature scoring and time alignment are performed by FRNN. The performance of the FRNN used for feature scoring is compared to that of a Time-Delay Neural Network (TDNN) with optimized delay structure in order to evaluate the capability of FRNN to extract contextual information. The performance of the time-alignment FRNN is compared to that of Viterbi alignment procedures that include different types of phoneme duration modeling. Second, an SRS consisting of a single FRNN is presented which directly classifies feature vector sequences and thus combines feature scoring and time alignment. To enable an efficient hardware implementation of the SRS, we introduce Locally Recurrent Neural Networks (LRNN). LRNN are layered networks which have recurrent connections only between a neuron and its n nearest neighbours. The neurons of the input and output layers have unidirectional, sparse connections to the hidden layer. Thus, in comparison to FRNN, the density of connections is drastically reduced. In particular, long-distance wiring can be avoided in a hardware realization. Our experiments have shown that LRNN with recurrent connections to the 5 nearest neighbours of a neuron in the hidden layer achieve the same recognition performance as FRNN.
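
The local connectivity described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the hidden neurons sit on a one-dimensional line, that "nearest neighbours" means neurons within an index distance `radius` (self-connection included), that the activation is tanh, and that input weights are dense for simplicity. The names `local_mask`, `count_connections`, and `recurrent_step` are hypothetical.

```python
import math


def local_mask(num_hidden, radius):
    """Build a 0/1 matrix; entry [i][j] is 1 if hidden neuron j may feed neuron i.

    Assumption: neurons lie on a 1-D line and are neighbours when their
    index distance is at most `radius` (the self-connection is included).
    """
    return [
        [1 if abs(i - j) <= radius else 0 for j in range(num_hidden)]
        for i in range(num_hidden)
    ]


def count_connections(mask):
    """Number of recurrent connections permitted by the mask."""
    return sum(sum(row) for row in mask)


def recurrent_step(state, inputs, w_rec, w_in, mask):
    """One time step of a masked recurrent layer (tanh is an assumption)."""
    n = len(state)
    new_state = []
    for i in range(n):
        # Only connections allowed by the mask contribute to the recurrent sum.
        rec = sum(mask[i][j] * w_rec[i][j] * state[j] for j in range(n))
        # Dense input weights for simplicity; the abstract describes sparse ones.
        ff = sum(w_in[i][k] * inputs[k] for k in range(len(inputs)))
        new_state.append(math.tanh(rec + ff))
    return new_state


if __name__ == "__main__":
    hidden = 20
    local = local_mask(hidden, radius=2)
    full = local_mask(hidden, radius=hidden - 1)  # every neuron sees every other
    print(count_connections(local), "local vs", count_connections(full), "full")
```

Setting `radius` to `num_hidden - 1` recovers a fully recurrent layer, so the same code can be used to compare connection counts for the two topologies. The gap widens as the hidden layer grows, because the local count scales linearly with the number of neurons while the full count scales quadratically.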


Similar resources

A Speech Recognizer with Low Complexity Based on RNN

Speech recognition systems (SRS) designed for applications in low cost products like telephones or in systems with energetic constraints like autonomous vehicles are faced with the demand for solutions with low complexity. A small vocabulary consisting of a few command words and the digits is sufficient for most of the applications but has to be recognized robustly. Here we report about investiga...


Strategies for reducing the complexity of a RNN based speech recognizer

Recurrent Neural Networks (RNN) provide a solution for low cost Speech Recognition Systems (SRS) in mass products or in products with energetic constraints if their inherent parallelism could be exploited in a hardware realization. Actually, the computational complexity of SRS based on Fully Recurrent Neural Networks (FRNN), e.g. the large number of connections, prevents a hardware realization....


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


Speech Recognition Using Neural Networks

Although speech recognition products are already available in the market at present, their development is mainly based on statistical techniques which work under very specific assumptions. The work presented in this thesis investigates the feasibility of alternative approaches for solving the problem more efficiently. A speech recognizer system comprised of two distinct blocks, a Feature Extrac...


Continuous Speech Phoneme Recognition Using Dynamic Artificial Neural Networks

Phoneme classification and recognition is the first step to large vocabulary continuous speech recognition. This step represents the acoustic modeling part of such a system. In hybrid speech recognition systems phoneme recognition is made by artificial neural networks (ANN’s). The main objective of this paper is the investigation of dynamic ANN’s, namely the Time-Delay Neural Networks (TDNN) an...




Publication date: 1995