Combining Punctuation and Disfluency Prediction: An Empirical Study

Authors

  • Xuancong Wang
  • Khe Chai Sim
  • Hwee Tou Ng
Abstract

Punctuation prediction and disfluency prediction can improve downstream natural language processing tasks such as machine translation and information extraction. Combining the two tasks can potentially improve the efficiency of the overall pipeline system and reduce error propagation. In this work, we compare various methods for combining punctuation prediction (PU) and disfluency prediction (DF) on the Switchboard corpus. We compare an isolated prediction approach with a cascade approach, a rescoring approach, and three joint model approaches. For the cascade approach, we show that the soft cascade method is better than the hard cascade method. We also use the cascade models to generate an n-best list, use the bi-directional cascade models to perform rescoring, and compare that with the results of the cascade models. For the joint model approach, we compare mixed-label Linear-chain Conditional Random Field (LCRF), cross-product LCRF and 2-layer Factorial Conditional Random Field (FCRF) with soft-cascade LCRF. Our results show that the various methods linking the two tasks are not significantly different from one another, although they perform better than the isolated prediction method by 0.5–1.5% in the F1 score. Moreover, the clique order of the features also makes a marked difference.
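
As a rough illustration of the cross-product idea mentioned in the abstract, the sketch below merges one punctuation label and one disfluency label per token into a single joint label, so that an ordinary linear-chain tagger could predict both at once. The tag names (`O`, `Period`, `E`, `F`), the `|` separator, and the function names are hypothetical choices made for this example; they are not taken from the paper, and the actual label inventory used by the authors may differ.

```python
from typing import List, Tuple

SEP = "|"  # assumed separator; must not occur inside any individual tag


def to_cross_product(pu: List[str], df: List[str]) -> List[str]:
    """Merge per-token PU and DF tags into one joint tag per token."""
    if len(pu) != len(df):
        raise ValueError("PU and DF sequences must have the same length")
    return [f"{p}{SEP}{d}" for p, d in zip(pu, df)]


def from_cross_product(joint: List[str]) -> Tuple[List[str], List[str]]:
    """Split joint tags back into separate PU and DF sequences."""
    pairs = [label.split(SEP, 1) for label in joint]
    return [p for p, _ in pairs], [d for _, d in pairs]


# Hypothetical example: tokens with illustrative tags only.
tokens = ["i", "uh", "i", "want", "that"]
pu_tags = ["O", "O", "O", "O", "Period"]  # punctuation after each token
df_tags = ["E", "F", "O", "O", "O"]       # E = edit word, F = filler (assumed)

joint_tags = to_cross_product(pu_tags, df_tags)
```

The joint label set grows as the product of the two individual label sets, which is the usual trade-off of this encoding compared with predicting the two tasks in separate passes.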

Similar resources

Combination of NN and CRF models for joint detection of punctuation and disfluencies

Inserting proper punctuation marks and deleting speech disfluencies are two of the most essential tasks in spoken language processing. This challenging task has prompted extensive research using various techniques, such as conditional random fields. Neural networks, however, are relatively under-explored for this task. Combining different modeling techniques with different advantages has the po...

Better Punctuation Prediction with Dynamic Conditional Random Fields

This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence ...

Machine Translation of Multi-party Meetings: Segmentation and Disfluency Removal Strategies

Translating meetings presents a challenge since multispeaker speech shows a variety of disfluencies. In this paper we investigate the importance of transforming speech into well-written input prior to translating multi-party meetings. We first analyze the characteristics of this data and establish oracle scores. Sentence segmentation and punctuation are performed using a language model, turn in...

Development of a Numerical-Semi-Empirical Model for Estimating the Hydrodynamic Derivatives of an AUV

The increased use of Autonomous Underwater Vehicles (AUVs) has increased the sensitivity of their design. Most AUV design studies have focused on drag reduction, ease of handling, and stability. Adequate design and prediction of the behavior of these vehicles requires an accurate estimation of the corresponding hydrodynamic derivative loads. In this study, hydrodynamic derivatives of the...

Revising the annotation of a Broadcast News corpus: a linguistic approach

This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The main focus of the revision process consisted of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resultant...

Publication date: 2014