Backoff Model Training using Partially Observed Data: Application to Dialog Act Tagging

Authors

  • Gang Ji
  • Jeff A. Bilmes
Abstract

Dialog act (DA) tags are useful for many applications in natural language processing and automatic speech recognition. In this work, we introduce hidden backoff models (HBMs) where a large generalized backoff model is trained, using an embedded expectation-maximization (EM) procedure, on data that is partially observed. We use HBMs as word models conditioned on both DAs and (hidden) DA segments. Experimental results on the ICSI meeting recorder dialog act corpus show that our procedure can strictly increase likelihood on training data and can effectively reduce errors on test data. In the best case, test error can be reduced by 6.1% relative to our baseline, an improvement on previously reported models that also use prosody. We also compare with our own prosody-based model, and show that our HBM is competitive even without the use of prosody. We have not yet succeeded, however, in combining the benefits of both prosody and the HBM.

Similar resources

Training a prosody-based dialog act tagger from unlabeled data

Dialog act tagging is an important step toward speech understanding, yet training such taggers usually requires large amounts of data labeled by linguistic experts. Here we investigate the use of unlabeled data for training HMM-based dialog act taggers. Three techniques are shown to be effective for bootstrapping a tagger from very small amounts of labeled data: iterative relabeling and retrain...

Automatic Dialog Act Labeling with Minimal Supervision

For many natural language applications it is desirable to be able to automatically tag utterances according to their discourse function (dialog act), such as statement, question or acknowledgment. We investigate the problem of automatically tagging dialog acts when hand-labeled training data is scarce. The tagging paradigm employed is a hidden Markov model in which dialog acts are stat...

Cascaded model adaptation for dialog act segmentation and tagging

There are many speech and language processing problems which require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: In cases where a sequence of classification tasks is employed, how importan...

Analyzing and Predicting Patterns of DAMSL Utterance Tags

We have been annotating TRAINS dialogs with dialog acts in order to produce training data for a dialog act predictor, and to study how language is used in these dialogs. We are using DAMSL dialog acts which consist of 15 independent attributes. For the purposes of this paper, infrequent attributes such as Unintelligible and Self-Talk were set aside to concentrate on the eight major DAMSL tag se...

Automatic Dialog Act Corpus Creation from Web Pages

This work presents two complementary tools dedicated to the task of textual corpus creation for linguistic research. The chosen application domain is automatic dialog act recognition, but the proposed tools might also be applied to any other research area that is concerned with dialog processing. The first software captures relevant dialogs from freely available resources on the World Wide ...

Publication date: 2006