Contrasting the Interaction Structure of an Email and a Telephone Corpus: A Machine Learning Approach to Annotation of Dialogue Function Units
Abstract
We present a dialogue annotation scheme for both spoken and written interaction and apply it to a telephone transaction corpus and an email corpus. We train classifiers, comparing a regular SVM and a structured SVM against a heuristic baseline. We also provide a novel application of structured SVM to predicting relations between instance pairs.
Related articles
Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
This article reports initial results from collaborative work on converting the SWBD-DAMSL annotation scheme used in the Switchboard Dialogue Act Corpus to the ISO DA annotation framework, as part of our ongoing research on the interoperability of standardized linguistic annotations. A qualitative assessment of the conversion between the two annotation schemes was performed to verify the appli...
Hidden Softmax Sequence Model for Dialogue Structure Analysis
We propose a new unsupervised learning model, the hidden softmax sequence model (HSSM), based on the Boltzmann machine, for dialogue structure analysis. The model employs three types of units in the hidden layer to discover latent dialogue structures: softmax units which represent latent states of utterances; binary units which represent latent topics specified by dialogues; and a binary unit that repr...
Active Learning for Dialogue Act Classification
Active learning techniques were employed for classification of dialogue acts over two dialogue corpora, the English human-human Switchboard corpus and the Spanish human-machine Dihana corpus. It is shown clearly that active learning improves on a baseline obtained through a passive learning approach to tagging the same data sets. An error reduction of 7% was obtained on Switchboard, while a fact...
The Effect of CMC in Business Emails in Lingua Franca: Discourse Features and Misunderstandings
The paper argues that the everyday exchange of business emails develops the work-group relationship, which in turn makes new communication styles possible and acceptable as users become habituated to computer-mediated forms, even in unbalanced professional exchanges. The focus is on the (spoken) discourse features of email messages in a self-compiled corpus of selected computer-mediated...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...