Real User Evaluation of Spoken Dialogue Systems Using Amazon Mechanical Turk

Authors

Filip Jurčíček

Simon Keizer

Milica Gasic

François Mairesse

Blaise Thomson

Kai Yu

Steve J. Young

Abstract

This paper describes a framework for the evaluation of spoken dialogue systems. Typically, dialogue systems are evaluated in a controlled test environment with carefully selected and instructed users. However, this approach is very demanding. An alternative is to recruit a large group of users who evaluate the dialogue systems remotely, with virtually no supervision. Crowdsourcing technology, for example Amazon Mechanical Turk (AMT), provides an efficient way of recruiting such subjects. The proposed framework evaluates spoken dialogue systems with AMT users, and the obtained results are compared with a recent trial in which the same systems were tested by locally recruited users. The results suggest that the use of crowdsourcing technology is feasible and can provide reliable results.

Similar Resources

Collecting Voices from the Cloud

The collection and transcription of speech data is typically an expensive and time-consuming task. Voice over IP and cloud computing are poised to greatly reduce this impediment to research on spoken language interfaces in many domains. This paper documents our efforts to deploy speech-enabled web interfaces to large audiences over the Internet via Amazon Mechanical Turk, an online marketplace ...

For a fistful of dollars: using crowd-sourcing to evaluate a spoken language CALL application

We present an evaluation of a Web-deployed spoken language CALL system, carried out using crowd-sourcing methods. The system, “Survival Japanese”, is a crash course in tourist Japanese implemented within the platform CALL-SLT. The evaluation was carried out over one week using the Amazon Mechanical Turk. Although we found a high proportion of attempted scammers, there was a core of 23 subjects ...

The Negochat Corpus of Human-agent Negotiation Dialogues

Annotated in-domain corpora are crucial to the successful development of dialogue systems of automated agents, and in particular for developing natural language understanding (NLU) components of such systems. Unfortunately, such important resources are scarce. In this work, we introduce an annotated natural language human-agent dialogue corpus in the negotiation domain. The corpus was collected...

TURKOISE: a Mechanical Turk-based Tailor-made Metric for Spoken Language Translation Systems in the Medical Domain

In this paper, we will focus on the evaluation of MedSLT, a medium-vocabulary hybrid speech translation system intended to support medical diagnosis dialogues between a physician and a patient who do not share a common language (Bouillon et al., 2005). How can the developers be sure of delivering good translation quality to their users, in a domain where reliability is of the highest importance?...

The user model-based summarize and refine approach improves information presentation in spoken dialog systems

A common task for spoken dialog systems (SDS) is to help users select a suitable option (e.g., flight, hotel, and restaurant) from the set of options available. As the number of options increases, the system must have strategies for generating summaries that enable the user to browse the option space efficiently and successfully. In the user-model based summarize and refine approach (UMSR, Demb...

Publication date: 2011