Accessible technology for interactive systems: a new approach to spoken language research

Authors

  • Ronald A. Cole
  • Stephen Sutton
  • Yonghong Yan
  • Pieter J. E. Vermeulen
  • Mark A. Fanty
Abstract

In this paper, we argue for a paradigm shift in spoken language technology, from transcription tasks to interactive systems. The current paradigm evaluates speech recognition technology in terms of word recognition accuracy on large-vocabulary transcription tasks, such as telephone conversations or media broadcasts. Systems are evaluated in international competitions, with strict rules for participation and well-defined evaluation metrics. Participation in these competitions is limited to a few elite laboratories that have the resources to develop and field systems. We propose a new, more productive and more accessible paradigm for spoken language research, in which research advances are evaluated in the context of interactive systems that allow people to perform useful tasks, such as accessing information from the World Wide Web while driving a car. These systems are made available for daily use by ordinary citizens through telephone networks or placement in easily accessible kiosks in public institutions. It is argued [1,2,3] that this new paradigm, which focuses on the goal of universal access to information for all people, better serves the needs of the research community, as well as the welfare of our citizens. We discuss the challenges and rewards of an interactive system approach to spoken language research, and describe our initial attempts to stimulate a paradigm shift and engage a large community of researchers through free distribution of the CSLU Toolkit.

1. SPOKEN LANGUAGE SYSTEMS

Spoken language systems allow people to interact with machines using speech to accomplish useful tasks. The essence of a spoken language system is interaction: the dynamic interaction between a person and a machine using speech, and the interaction of the different language technologies within the system. At a minimum, a spoken language system integrates dialogue modeling, speech recognition and speech generation. It can also include natural language understanding, language identification, machine translation, speaker recognition, as well as other multimodal (e.g., handwriting, gesture recognition, speech reading) and multimedia (e.g., facial animation, video) capabilities.

The success of a spoken language system depends upon the manner in which the component technologies interact to produce an effective dialogue that accomplishes the task at hand. An effective system produces prompts that elicit the set of desired responses from the user (and minimizes undesired responses), detects recognition errors and out-of-vocabulary utterances, engages in conversational repair when such errors occur, and responds in an appropriate way when the dialogue breaks down. While the performance of each component technology is important, the manner in which they interact is even more so.

Speech recognition is but one essential component of an integrated system. To use an analogy, it is well understood that there is little gain in increasing the processor speed in a computer when the processor is starved of data. In that case, one should speed up the data access before increases in processor speed will be of benefit. Similarly, in spoken language systems, components other than recognition will at some point mask any improvements in recognition. The interactions among the modules of spoken language systems are usually highly complex and interdependent, and can be studied and understood only by developing and evaluating working systems.

Based on their experiences in developing a spoken language system for taking the U.S. census, Cole et al. [6] conclude: “Taken together, the results of this project showed that the most important component of a spoken dialogue system is the dialogue. A successful system gives instructions efficiently, establishes expectations for the user, asks questions that constrain the possible responses, and proceeds in a straightforward manner to complete the interview.”

2. THE NEED FOR SPOKEN LANGUAGE


Similar resources

Comparison of Spectral Methods for Spoken Language Identification

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...


A New Statistical Model for Evaluation Interactive Question Answering Systems Using Regression

The development of computer systems and extensive use of information technology in the everyday life of people have just made it more and more important for them to make quick access to information that has received great importance. Increasing the volume of information makes it difficult to manage or control. Thus, some instruments need to be provided to use this information. The QA system is ...


A Novel Interactive Possibilistic Mixed Integer Nonlinear Model for Cellular Manufacturing Problem under Uncertainty

Elaborating an appropriate cellular manufacturing system (CMS) could solve many structural and operational issues. Thereby, considering some significant factors as worker skill, machine hardness, and product quality levels could assist the companies in current competitive environment. This paper proposes a novel interactive possibilistic mixed integer nonlinear approach to minimize the total co...


Development and Evaluation of the Spoken Dialogue System Based on the W3C Recommendations

Due to progress in technology of speech recognition and understanding, the Spoken dialogue systems (SDS) have started to emerge as a practical alternative for a conversational computer interface. They are more effective than Interactive Voice Response (IVR) systems since they allow a more free and natural interaction. The Spoken dialogue systems are designed for providing automatic dialogue-bas...


Zanzibar OpenIVR: An Open-Source Framework for Development of Spoken Dialog Systems

The maturity of standards and the availability of open source components for all levels of the MRCP stack provide us with new opportunities for the development of spoken dialog technology. In this paper a standard-based and modular architecture for interactive voice response (IVR) systems is presented together with its implementation – Zanzibar OpenIVR. The architecture, described in terms of c...




Publication date: 1998