Annotation Curricula to Implicitly Train Non-Expert Annotators
Authors
Abstract
Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain. This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations; this is especially true in citizen science or crowdsourcing scenarios where domain expertise is not required. To alleviate these issues, this work proposes annotation curricula, a novel approach to implicitly train annotators. The goal is to gradually introduce annotators into the task by ordering the instances to be annotated according to a learning curriculum. To do so, we first formalize annotation curricula for sentence- and paragraph-level annotation tasks, define an ordering strategy, and identify well-performing heuristics and interactively trained models on three existing English datasets. Finally, we provide a proof of concept in a carefully designed user study with 40 voluntary participants who are asked to identify the most fitting misconception for tweets about the Covid-19 pandemic. Our results indicate that using a simple heuristic to order instances can already significantly reduce the total annotation time while preserving a high annotation quality. Annotation curricula thus constitute a promising research direction to improve data collection. To facilitate future research—for instance, to adapt annotation curricula to specific tasks and expert scenarios—all code and data from our user study, consisting of 2,400 annotations, are made available.
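The core idea above—ordering instances from easy to hard so that annotators are implicitly trained as they work—can be illustrated with a minimal sketch. The difficulty heuristic here (token count) is an assumption for illustration only; the paper evaluates its own heuristics and interactively trained models, which are not reproduced here.

```python
def order_by_curriculum(instances, difficulty=lambda text: len(text.split())):
    """Return annotation instances sorted from easiest to hardest.

    `difficulty` maps an instance to a numeric score (lower = easier).
    Presenting low-difficulty instances first eases annotators into the
    task before harder cases appear.
    """
    return sorted(instances, key=difficulty)


# Hypothetical example tweets (not from the paper's dataset).
tweets = [
    "5G towers spread the virus through radio waves.",
    "Masks cause CO2 poisoning.",
    "Vaccines alter DNA.",
]

# Shortest (here treated as easiest) tweets come first.
ordered = order_by_curriculum(tweets)
```

Any monotone difficulty estimate—readability scores, model uncertainty, or similarity to already-annotated items—could be plugged in as the `difficulty` callable.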
Similar Resources
Creating a linguistic plausibility dataset with non-expert annotators
We describe the creation of a linguistic plausibility dataset that contains annotated examples of language judged to be linguistically plausible, implausible, and everything in between. To create the dataset we randomly generate sentences and have them annotated by crowdsourcing via Amazon Mechanical Turk. Obtaining inter-annotator agreement is a difficult problem because linguistic plau...
Annotation: neurofeedback - train your brain to train behaviour.
BACKGROUND Neurofeedback (NF) is a form of behavioural training aimed at developing skills for self-regulation of brain activity. Within the past decade, several NF studies have been published that tend to overcome the methodological shortcomings of earlier studies. This annotation describes the methodical basis of NF and reviews the evidence base for its clinical efficacy and effectiveness in ...
Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation
In the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and researc...
Word Sense Annotation of Polysemous Words by Multiple Annotators
We describe results of a word sense annotation task using WordNet, involving half a dozen well-trained annotators on ten polysemous words for three parts of speech. One hundred sentences for each word were annotated. Annotators had the same level of training and experience, but interannotator agreement (IA) varied across words. There was some effect of part of speech, with higher agreement on n...
Evaluating Dialogue Act Tagging with Naive and Expert Annotators
In this paper the dialogue act annotations of naive and expert annotators, both annotating the same data, are compared in order to characterise the insights that annotations made by different kinds of annotators may provide for evaluating dialogue act tagsets. It is argued that the agreement among naive annotators provides insight into the clarity of the tagset, whereas agreement among expert annotators...
Journal
Journal title: Computational Linguistics
Year: 2022
ISSN: 1530-9312, 0891-2017
DOI: https://doi.org/10.1162/coli_a_00436