Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus

نویسندگان

  • Kepa Joseba Rodríguez
  • Stefanie Dipper
  • Michael Götze
  • Massimo Poesio
  • Giuseppe Riccardi
  • Christian Raymond
  • Joanna Rabiega-Wisniewska
چکیده

The LUNA corpus is a multi-lingual, multidomain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the annotation at each of these levels. In order to synchronize these multiple layers of annotation, the PAULA standoff exchange format will be used. In this paper, we present the corpus and its PAULA-based architecture.1

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MMAX: A Tool for the Annotation of Multi-modal Corpora

We present a tool for the annotation of XMLencoded multi-modal language corpora. Nonhierarchical data is supported by means of standoff annotation. We define base level and suprabase level elements and theory-independent markables for multi-modal annotation and apply them to a cospecification annotation scheme. We also describe how arbitrary annotation schemes can be represented in terms of the...

متن کامل

MUP - The UIC Standoff Markup Tool

Recently developed markup tools for dialogue work are quite sophisticated and require considerable knowledge and overhead, but older tools do not support XML standoff markup, the current annotation style of choice. For the DIAG-NLP project we have created a “lightweight” but modern markup tool that can be configured and used by the working NLP researcher.

متن کامل

EAGLE: an Error-Annotated Corpus of Beginning Learner German

This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic propertie...

متن کامل

Iula2Standoff: a tool for creating standoff documents for the IULACT

Due to the increase in the number and depth of analyses required over the text, like entity recognition, POS tagging, syntactic analysis, etc. the annotation in-line has become unpractical. In Natural Language Processing (NLP) some emphasis has been placed in finding an annotation method to solve this problem. A possibility is the standoff annotation. With this annotation style it is possible t...

متن کامل

A corpus for studying addressing behavior in multi-party dialogues

This paper describes a multi-modal corpus of hand-annotated meeting dialogues that was designed for studying addressing behavior in face-to-face conversations. The corpus contains annotated dialogue acts, addressees, adjacency pairs and gaze direction. First, we describe the corpus design where we present the annotation schema, annotation tools and annotation process itself. Then, we analyze th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007