A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks

نویسندگان

  • Arif Khan
  • Ingmar Steiner
  • Yusuke Sugano
  • Andreas Bulling
  • Ross MacDonald
چکیده

Phonetic segmentation is the process of splitting speech into distinct phonetic units. Human experts routinely perform this task manually by analyzing auditory and visual cues using analysis software, which is an extremely time-consuming process. Methods exist for automatic segmentation, but these are not always accurate enough. In order to improve automatic segmentation, we need to model it as close to the manual segmentation as possible. This corpus is an effort to capture the human segmentation behavior by recording experts performing a segmentation task. We believe that this data will enable us to highlight the important aspects of manual segmentation, which can be used in automatic segmentation to improve its accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Acoustic cues identifying phonetic transitions for speech segmentation

The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place boundaries at phonetic transitions according to a set of conventions. We present some features commonly (and informally) used as aid when performing manual s...

متن کامل

The OFAI Multi-Modal Task Description Corpus

The OFAI Multimodal Task Description Corpus (OFAI-MMTD Corpus) is a collection of dyadic teacher-learner (human-human and human-robot) interactions. The corpus is multimodal and tracks the communication signals exchanged between interlocutors in task-oriented scenarios including speech, gaze and gestures. The focus of interest lies on the communicative signals conveyed by the teacher and which ...

متن کامل

Achieving Multimodal Cohesion during Intercultural Conversations

How do English as a lingua franca (ELF) speakers achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere together during intercultural conversations. The data include approximately 160-minute transcribed video recordings of ELF interactions ...

متن کامل

The Vernissage Corpus: a Multimodal Human-robot-interaction Dataset

We introduce a new multimodal interaction dataset with extensive annotations in a conversational Human-RobotInteraction (HRI) scenario. It has been recorded and annotated to benchmark many relevant perceptual tasks, towards enabling a robot to converse with multiple humans, such as speaker localization, key word spotting, speech recognition in audio domain; tracking, pose estimation, nodding, v...

متن کامل

Atypical Gaze Behavior in Children with High Functioning Autism During an Active Balance Task

Background. Unusual gaze behavior in children with autism spectrum disorder (ASD) was reported very early in the literature. Objectives. The current study examined gaze behavior in children with ASD and typically developing (TD) children while performing an active balance task on the Wii balance board. Methods: 8 children (male) diagnosed with high-functioning ASD and 9 TD children (3 female, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1712.04798  شماره 

صفحات  -

تاریخ انتشار 2017