Optimal Tailoring of Trajectories, Growing Training Sets and Recurrent Networks for Spoken Word Recognition
Abstract
A novel system that efficiently integrates two types of neural networks for reliable isolated word recognition is described. The recognition system comprises a feature extractor that includes a Self-Organizing Map, which optimally tailors trajectory representations of words in reduced-dimension feature spaces. Experimental results indicate that such lower-dimensional trajectories can provide a reliable representation of spoken words while reducing the training complexity of trajectory recognition. A recurrent neural network performs the trajectory recognition, and it is trained with a method that progressively grows the training set. The optimal tailoring of trajectories and the growing training set are two innovations that result in superior training of the recurrent neural network, which in turn delivers robust word recognition that tolerates wide variations in the speech signal.
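The trajectory idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the map size, feature type, learning schedule, or how repeated nodes are handled, so the 4x4 grid, the decay schedules, the collapsing of consecutive repeats, and the function names `train_som`, `best_matching_unit`, and `word_trajectory` are all assumptions made for illustration. The sketch trains a small Self-Organizing Map on feature frames and then represents an utterance as the sequence of grid positions its frames map to.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to frame x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(frames, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on feature frames (lists of floats).

    Grid size, learning rate, and neighbourhood width are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    # One weight vector per grid node, keyed by its (row, col) position.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(frames, len(frames)):  # shuffled pass over frames
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def word_trajectory(weights, frames):
    """Map each frame to its best-matching grid node.

    Consecutive repeats are dropped (an assumption), so the result is a
    path through the 2-D map rather than one point per frame.
    """
    path = []
    for x in frames:
        node = best_matching_unit(weights, x)
        if not path or path[-1] != node:
            path.append(node)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for 12-dimensional acoustic feature frames.
    utterance = [[rng.random() for _ in range(12)] for _ in range(30)]
    som = train_som(utterance)
    print(word_trajectory(som, utterance))
```

Under these assumptions, each 12-dimensional frame is reduced to a two-dimensional grid coordinate, and the resulting path is the kind of low-dimensional sequence a recurrent network could then be trained to classify.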
Similar resources
Speech Recognition Using Neural Networks
Although speech recognition products are already available in the market at present, their development is mainly based on statistical techniques which work under very specific assumptions. The work presented in this thesis investigates the feasibility of alternative approaches for solving the problem more efficiently. A speech recognizer system comprised of two distinct blocks, a Feature Extrac...
Simple Recurrent Networks and Competition Effects in Spoken Word Recognition
Continuous mapping models of spoken word recognition such as TRACE (McClelland and Elman, 1986) make robust predictions about a wide variety of phenomena. However, most of these models are interactive activation models with preset weights, and do not provide an account of learning. Simple recurrent networks (SRNs, e.g., Elman, 1990) are continuous mapping models that can process sequential patt...
Using word confusion networks for slot filling in spoken language understanding
Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU) because of automatic speech recognition (ASR) errors. To improve the performance of slot filling, a successful approach is to use a statistical model that is trained on ASR one-best hypotheses. The state of the art models for slot filling rely on using discriminative sequence modeling methods, s...
Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking
This paper presents our novel method to encode word confusion networks, which can represent a rich hypothesis space of automatic speech recognition systems, via recurrent neural networks. We demonstrate the utility of our approach for the task of dialog state tracking in spoken dialog systems that relies on automatic speech recognition output. Encoding confusion networks outperforms encoding th...
Simple Recurrent Networks and human spoken word recognition
A crucial problem in cognitive science, especially for speech processing, is sequence encoding. Models of spoken word recognition either ignore the problem (e.g., Norris et al., 2000), posit solutions incapable of representing repeated elements (e.g., Grossberg & Kazerounian, 2011), or "spatialize" time in possibly unrealistic ways (TRACE; McClelland & Elman, 1986). An alternative that has not ...