The Hearsay Speech Understanding System: An Example of the Recognition Process
Abstract
This paper describes the structure and operation of the Hearsay speech understanding system through a specific example that illustrates the various stages of recognition. The system consists of a set of cooperating independent processes, each representing a source of knowledge. The knowledge is used either to predict what may appear in a given context or to verify hypotheses resulting from a prediction. The structure of the system is illustrated by considering its operation in a particular task situation: Voice-Chess. The representation and use of various sources of knowledge are outlined. Preliminary results on the reduction in search resulting from the use of various sources of knowledge are given.
The factors influencing the structure and operation of a speech understanding system are many and complex. The report of Newell et al. (1971) discusses these issues in detail. Our own goals and efforts in this area have been described in several earlier papers (Reddy et al., 1972). The goals for our present effort were outlined in Reddy, Erman, and Neely (1970). The initial structural description of the Hearsay system was given in Reddy (1971). The model and the system that evolved after several design iterations were described in Reddy, Erman, and Neely (1972a).* The main additions to the initially proposed system were in the specification of the interactions among various sources of knowledge. In this paper, we describe the structure and operation of the Hearsay system from a different point of view, i.e., by considering a specific example to illustrate the various stages of the recognition process.
Machine perception of speech differs from many other problems in artificial intelligence in that it is characterized by high data rates, large amounts of data, and the availability of many sources of knowledge. Thus, the techniques that must be employed differ from other problem-solving systems in which weaker and weaker methods are used to solve a problem using less and less information about the actual task. In addition, there is …
* The general framework that evolved for the model is different from some previously proposed models by Liberman et al. (1962) and Halle and Stevens (1962), which imply that perception takes place through the active mediation of motor centers. Our efforts tend to support "sensory" theories advanced by Fant (1964) and others. If one modifies the "synthesis" part of analysis-by-synthesis, then our model is most similar to that of Halle and Stevens.
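The predict-and-verify cooperation among knowledge sources described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the Hearsay implementation: the `KnowledgeSource` interface, the `recognize_next` function, the multiplication of scores, and all numeric values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KnowledgeSource:
    """One independent source of knowledge (hypothetical interface)."""
    name: str
    predict: Callable[[List[str]], List[str]]  # context -> candidate next words
    verify: Callable[[List[str], str], float]  # (context, word) -> score in [0, 1]


def recognize_next(context: List[str], sources: List[KnowledgeSource]) -> str:
    """Pool the predictions of all sources, then let every source rate each one."""
    candidates = {w for s in sources for w in s.predict(context)}
    if not candidates:
        raise ValueError("no source proposed a hypothesis")

    def combined(word: str) -> float:
        # Combining scores by multiplication is an assumption of this sketch.
        score = 1.0
        for s in sources:
            score *= s.verify(context, word)
        return score

    return max(sorted(candidates), key=combined)


# Made-up acoustic evidence for the word following "pawn" in a chess move.
ACOUSTIC = {"to": 0.6, "takes": 0.7, "two": 0.65}

syntax = KnowledgeSource(
    name="syntax",
    predict=lambda ctx: ["to", "takes"] if ctx[-1:] == ["pawn"] else [],
    verify=lambda ctx, w: 1.0 if w in ("to", "takes") else 0.0,
)
semantics = KnowledgeSource(
    name="semantics",
    predict=lambda ctx: [],
    # Assumed board position: no pawn capture is currently legal.
    verify=lambda ctx, w: 0.1 if w == "takes" else 1.0,
)
acoustics = KnowledgeSource(
    name="acoustics",
    predict=lambda ctx: list(ACOUSTIC),
    verify=lambda ctx, w: ACOUSTIC.get(w, 0.0),
)

best = recognize_next(["pawn"], [syntax, semantics, acoustics])
acoustic_only = recognize_next(["pawn"], [acoustics])
print(best, acoustic_only)
```

With acoustic evidence alone the toy picks "takes"; once the syntactic and semantic sources also verify the hypotheses, "to" wins, which mirrors in miniature how additional knowledge narrows the search.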
Similar resources
The head system and its approach to rule based acoustic-phonetic recognition of speech
One approach to ASR is based on the Hearsay II (ref 1) architecture, which utilises a number of phonetic and linguistic knowledge sources in its recognition process. The fundamental ideas behind the Hearsay II architecture have been used in a Danish speech project, which is aiming for industrial applications implemented in a standard microprocessor system to demonstrate a continuous speech, ...
System Organizations for Speech Understanding: Implications of Network and Multiprocessor Computer Architecture for AI
This paper considers various factors affecting system organization for speech understanding research. The structure of the Hearsay system based on a set of cooperating, independent processes using the hypothesize-and-test paradigm is presented. Design considerations for the effective use of multiprocessor and network architectures in speech understanding systems are presented: control of proces...
On the Need for a Theory of Integration of Knowledge Sources for Spoken Language Understanding
In the Spoken Language Understanding (SLU) community we are seeing a renewed interest both in the theoretical issues and practical problems of melding our understanding of human language use with the traditional signal processing emphasis of Speech Recognition (SR). Although the Hearsay II Speech Understanding System demonstrated an early awareness of the potential — and pitfalls — of integrati...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...