The DBOX Corpus Collection of Spoken Human-Human and Human-Machine Dialogues
Abstract
The paper describes a project for continuous data collection for a spoken dialogue system engaged in Question-Answering interactions in English. The Wizard-of-Oz method used in the bootstrap phase is presented, and several types of resulting dialogue annotations are described. The resulting corpus will be publicly released.
Related resources
Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors
While human tutors respond to both what a student says and to how the student says it, most tutorial dialogue systems cannot detect the student emotions and attitudes underlying an utterance. We present an empirical study investigating the feasibility of recognizing student state in two corpora of spoken tutoring dialogues, one with a human tutor, and one with a computer tutor. We first annotat...
Towards a large corpus of spoken dialogue in French that will be freely available: the "Parole Publique" project and its first realisations
This paper presents two corpora (OTG and ECOLE_MASSY), the first deliverables of the Parole_Publique (in English: Public Speech) project run by the VALORIA laboratory. The project aims to build a large corpus of spoken French dialogues with orthographic transcription and morpho-syntactic annotation. It is primarily intended for research on man-machine communication and will g...
Towards Emotion Prediction in Spoken Tutoring Dialogues
Human tutors detect and respond to student emotional states, but current machine tutors do not. Our preliminary machine learning experiments involving transcription, emotion annotation and automatic feature extraction from our human-human spoken tutoring corpus indicate that the spoken tutoring system we are developing can be enhanced to automatically predict and adapt to student emotional states.
Evaluating spoken dialogue models under the interactive pattern recognition framework
The new Interactive Pattern Recognition (IPR) framework has been proposed to deal with human-machine interaction. In this context a new formulation has been recently defined to represent a Spoken Dialogue System as an IPR problem. In this work this formulation is applied to define graphical models that deal with Spoken Dialogue Systems. The definition of both a Dialogue Manager and a User Model...
The Swedish NICE Corpus – Spoken and embodied characters in a c
This article describes the collection and analysis of a Swedish database of spontaneous and unconstrained children–machine dialogues. The Swedish NICE corpus consists of spoken dialogues between children aged 8 to 15 and embodied fairytale characters in a computer game scenario. Compared to previously collected corpora of children’s computer-directed speech, the Swedish NICE corpus contains ext...