An Improved Model for Recognizing Disfluencies in Conversational Speech

Authors

  • Mark Johnson
  • Eugene Charniak
  • Matthew Lease
Abstract

This paper presents a novel metadata extraction (MDE) system for automatically detecting edited words, fillers, and self-interruption points in conversational speech. Our edit word detection subsystem combines a Tree Adjoining Grammar (TAG) noisy channel model, a statistical syntactic language model, and a MaxEnt reranker. Hand-built, deterministic rules are used to detect fillers. Self-interruption points are explicitly determined by detected fillers and edited words. We have evaluated our system for these three tasks on two types of input: manually annotated words and automatically recognized speech-to-text tokens. In all six cases, our system improves on the state of the art, as measured in a recent blind evaluation.
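The abstract states that self-interruption points follow deterministically from the detected fillers and edited words, but it does not give the rule. As a rough illustration only, the sketch below assumes one plausible convention: an interruption point is placed at the word boundary where a run of edited words ends and at the boundary where a run of fillers begins. The label names ("fluent", "edit", "filler") and the function `interruption_points` are hypothetical and are not taken from the paper.

```python
from typing import List


def interruption_points(labels: List[str]) -> List[int]:
    """Return word-boundary indices that are assumed interruption points.

    Boundary i lies between token i-1 and token i (boundary 0 precedes the
    first token, boundary len(labels) follows the last one).
    labels[i] is "fluent", "edit", or "filler" (a hypothetical tag set).
    Assumed rule, not the paper's: mark a boundary when an edited run ends
    there or when a filler run starts there.
    """
    points = []
    n = len(labels)
    for i in range(n + 1):
        prev = labels[i - 1] if i > 0 else "fluent"
        curr = labels[i] if i < n else "fluent"
        ends_edit = prev == "edit" and curr != "edit"
        starts_filler = curr == "filler" and prev != "filler"
        if ends_edit or starts_filler:
            points.append(i)
    return points


# Tokens: I want | uh I need a flight  ("I want" edited, "uh" a filler)
example = ["edit", "edit", "filler", "fluent", "fluent", "fluent", "fluent"]
print(interruption_points(example))
```

In this example the end of the edited run and the start of the filler coincide, so a single boundary is reported. A real system would take the labels from the edit detector and the filler rules rather than from a hand-written list.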

Similar resources

Automatic Detection of Sentence Boundaries, Disfluencies, and Conversational Fillers in Spontaneous Speech

Intercoder reliability in annotating complex disfluencies

In previous work, we presented an annotation scheme that can describe complex disfluencies. In this paper, we first show the prevalence of complex disfluencies and illustrate the types of distinctions that our scheme allows. Second, we present an annotation tool that allows the scheme to be easily applied. Third, we present the results of a reliability study in annotating complex disfluencies w...

Automatic Long Audio Alignment and Confidence Scoring for Conversational Arabic Speech

In this paper, a framework for long audio alignment for conversational Arabic speech is proposed. Accurate alignments help in many speech processing tasks such as audio indexing, speech recognizer acoustic model (AM) training, audio summarizing and retrieving, etc. We have collected more than 1,400 hours of conversational Arabic besides the corresponding human generated non-aligned transcriptio...

Pseudo-Syntactic Language Modeling for Disfluent Speech Recognition

Abstract Language models for speech recognition are generally trained on text corpora. Since these corpora do not contain the disfluencies found in natural speech, there is a train/test mismatch when these models are applied to conversational speech. In this work we investigate a language model (LM) designed to model these disfluencies as a syntactic process. By modeling selfcorrections we obta...

Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies

As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question needs to be examined, how different modifications to the synthetic speech interact with each other and how their specific combinations influence perception. This work investigates how the vocal effort of the synthetic speech together with adde...

Publication date: 2004