Issues in the addition of ISO standard annotations to the Switchboard corpus
Authors
Abstract
This paper analyzes the issues that arise when adding annotations according to ISO standard 24617-2 to the dialogues in the Switchboard corpus, exploiting the existing SWBD-DAMSL annotations. These issues relate to differences between the two tag sets; to the highly multidimensional view that underlies the ISO standard; to differences in segmenting the dialogues into functional units; to the use of in-line markup for certain phenomena in Switchboard; and to the use of intra-dialogue dependence relations as defined in the ISO standard. The analysis is supplemented by a discussion of how the existing annotations may help to semi-automatically create a fully fledged ISO standard annotation alongside the existing SWBD-DAMSL annotation.
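A semi-automatic conversion of this kind can be pictured as a tag-mapping pass over the existing SWBD-DAMSL labels. The Python below is a minimal sketch under stated assumptions, not the paper's procedure: the mapping table `SWBD_TO_ISO` covers only a handful of SWBD-DAMSL tags and is hypothetical, as are the names `Segment`, `convert`, and `find_antecedent`. It assumes each SWBD-DAMSL utterance already corresponds to one ISO functional segment (precisely the segmentation question the abstract raises), and it attaches functional and feedback dependence relations using a naive "nearest earlier segment by the other speaker" heuristic. Tags without a mapping are flagged for a human annotator rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Optional

# HYPOTHETICAL, partial mapping from SWBD-DAMSL tags to ISO 24617-2
# (dimension, communicative function) pairs; illustrative only.
SWBD_TO_ISO: dict[str, list[tuple[str, str]]] = {
    "sd": [("task", "inform")],                 # statement-non-opinion
    "sv": [("task", "inform")],                 # statement-opinion
    "qy": [("task", "propositionalQuestion")],  # yes-no question
    "qw": [("task", "setQuestion")],            # wh-question
    "ny": [("task", "confirm")],                # yes answer
    "nn": [("task", "disconfirm")],             # no answer
    "aa": [("task", "agreement")],              # agree/accept
    "b": [("autoFeedback", "autoPositive")],    # acknowledge (backchannel)
    "ft": [("socialObligationsManagement", "thanking")],
}

# Assumed antecedent functions for a few responsive functions.
ANTECEDENTS: dict[str, set[str]] = {
    "confirm": {"propositionalQuestion"},
    "disconfirm": {"propositionalQuestion"},
    "agreement": {"inform"},
}


@dataclass
class Segment:
    seg_id: str
    speaker: str
    swbd_tag: str
    functions: list[tuple[str, str]] = field(default_factory=list)
    links: list[tuple[str, str]] = field(default_factory=list)  # (relation, target id)
    needs_review: bool = False


def find_antecedent(segments: list[Segment], i: int,
                    wanted: Optional[set[str]]) -> Optional[str]:
    """Return the id of the nearest earlier segment by another speaker.

    If `wanted` is given, that segment must carry one of those functions.
    """
    speaker = segments[i].speaker
    for prev in reversed(segments[:i]):
        if prev.speaker == speaker:
            continue
        if wanted is None or any(f in wanted for _, f in prev.functions):
            return prev.seg_id
    return None


def convert(segments: list[Segment]) -> list[Segment]:
    """Pre-annotate segments in place; unmapped tags are left for manual review."""
    for i, seg in enumerate(segments):
        mapped = SWBD_TO_ISO.get(seg.swbd_tag)
        if mapped is None:
            seg.needs_review = True
            continue
        seg.functions = list(mapped)
        for dim, func in seg.functions:
            if func in ANTECEDENTS:
                # Functional dependence: responsive act -> its antecedent act.
                target = find_antecedent(segments, i, ANTECEDENTS[func])
                if target is not None:
                    seg.links.append(("functional", target))
            elif dim == "autoFeedback":
                # Feedback dependence: feedback act -> the segment it reacts to.
                target = find_antecedent(segments, i, None)
                if target is not None:
                    seg.links.append(("feedback", target))
    return segments


dialogue = convert([
    Segment("s1", "A", "qy"),
    Segment("s2", "B", "ny"),
    Segment("s3", "B", "sd"),
    Segment("s4", "A", "b"),
    Segment("s5", "A", "%"),  # abandoned/uninterpretable: not in the mapping
])
for seg in dialogue:
    print(seg.seg_id, seg.functions, seg.links, seg.needs_review)
```

Each tag maps to a list of (dimension, function) pairs so that the multidimensional ISO view can be represented, where one segment may carry functions in several dimensions. In this sketch every tag yields a single pair, so any additional dimensions would have to be supplied during the manual review step.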
Similar resources
Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
This article reports some initial results from the collaborative work on converting the SWBD-DAMSL annotation scheme used in the Switchboard Dialogue Act Corpus to the ISO DA annotation framework, as part of our on-going research on the interoperability of standardized linguistic annotations. A qualitative assessment of the conversion between the two annotation schemes was performed to verify the appli...
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the Switchboard Corpus of telephone conversations and the TDT2 corpus of broadcast news. Switchboard has undergone two independent transcriptions and various types of additional annotation, all carried out ...
The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue
This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al., 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial...
The DialogBank
This paper presents the DialogBank, a new language resource consisting of dialogues with gold standard annotations according to the ISO 24617-2 standard. Some of these dialogues have been taken from existing corpora and have been re-annotated according to the ISO standard; others have been annotated directly according to the standard. The ISO 24617-2 annotations have been designed according to ...