Rater reliability

Interrater reliability in large-scale assessments – Can teachers score national tests reliably without external controls?

2015

Anna Lind

The Norwegian Dependency Treebank

2014

Per Erik Solberg, Arne Skjærholt, Lilja Øvrelid, Kristin Hagen, Janne Bondi Johannessen

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles we...


The multidimensional nature of pathologic vocal quality.

Journal: The Journal of the Acoustical Society of America, 1994

J Kreiman, B R Gerratt, G S Berke

Although the terms "breathy" and "rough" are frequently applied to pathological voices, widely accepted definitions are not available and the relationship between these qualities is not understood. To investigate these matters, expert listeners judged the dissimilarity of pathological voices with respect to breathiness and roughness. A second group of listeners rated the voices on unidimensiona...


Manual Corpus Annotation: Giving Meaning to the Evaluation Metrics

2012

Yann Mathet, Antoine Widlöcher, Karën Fort, Claire François, Olivier Galibert, Cyril Grouin, Juliette Kahn, Sophie Rosset, Pierre Zweigenbaum

Computing inter-annotator agreement measures on a manually annotated corpus is necessary to evaluate the reliability of its annotation. However, the interpretation of the obtained results is recognized as highly arbitrary. We describe in this article a method and a tool that we developed which “shuffles” a reference annotation according to different error paradigms, thereby creating artificial ...


Annotations for Opinion Mining Evaluation in the Industrial Context of the DOXA project

2010

Patrick Paroubek, Alexander Pak, Djamel Mostefa

After presenting the state of the art in opinion and sentiment analysis and the DOXA project, we review the few evaluation campaigns that have dealt with opinion mining in the past. Then we present the two-level opinion and sentiment model that we will use for evaluation in the DOXA project and the annotation interface we use for hand-annotating a reference corpus. We then present the corpus which wil...


Smatch: an Evaluation Metric for Semantic Feature Structures

2013

Shu Cai, Kevin Knight

The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely used metric to evaluate whole-sentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute th...

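The abstract describes smatch as measuring the overlap between two semantic feature structures. The sketch below is a rough illustration only and rests on several assumptions. Each structure is written as a set of (relation, source, target) triples, and ("instance", var, concept) triples introduce variables. The score is taken to be precision, recall and F1 over the triples that match under the best one-to-one variable mapping. The function names, the triple convention and the example graphs are all hypothetical. Where the paper gives an efficient algorithm, this sketch swaps in a brute-force search over every mapping. That search is exact but exponential, so it only suits toy graphs.

```python
from itertools import permutations


def variables(triples):
    """Hypothetical convention: variables are sources of ("instance", var, concept) triples."""
    return sorted({src for rel, src, _ in triples if rel == "instance"})


def count_matches(set1, set2, mapping):
    """Count graph-1 triples found in graph 2 after renaming graph-1 variables.

    Variables mapped to None never match; concepts and constants are left unchanged.
    """
    return sum(
        (rel, mapping.get(src, src), mapping.get(tgt, tgt)) in set2
        for rel, src, tgt in set1
    )


def smatch_scores(triples1, triples2):
    """Precision, recall and F1 under the best one-to-one variable mapping (brute force)."""
    set1, set2 = set(triples1), set(triples2)
    vars1, vars2 = variables(set1), variables(set2)
    # If graph 1 has more variables than graph 2, the surplus must stay unmapped (None).
    candidates = vars2 + [None] * max(0, len(vars1) - len(vars2))
    best = 0
    for image in permutations(candidates, len(vars1)):
        best = max(best, count_matches(set1, set2, dict(zip(vars1, image))))
    precision = best / len(set1) if set1 else 0.0
    recall = best / len(set2) if set2 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: the second graph uses other variable names and one wrong role.
gold = [
    ("instance", "a", "want-01"), ("instance", "b", "boy"), ("instance", "c", "go-01"),
    ("ARG0", "a", "b"), ("ARG1", "a", "c"), ("ARG0", "c", "b"),
]
parsed = [
    ("instance", "x", "want-01"), ("instance", "y", "boy"), ("instance", "z", "go-01"),
    ("ARG0", "x", "y"), ("ARG1", "x", "z"), ("ARG1", "z", "y"),
]
precision, recall, f1 = smatch_scores(gold, parsed)  # each is 5/6
```

The best mapping here is a→x, b→y, c→z. It matches five of the six triples in each graph, so precision, recall and F1 all equal 5/6.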

Towards Building Parallel Dependency Treebanks: Intra-Chunk Expansion and Alignment for English Dependency Treebank

2013

Debanka Nandi, Maaz Nomani, Himanshu Sharma, Himani Chaudhary, Sambhav Jain, Dipti Misra Sharma

The paper presents our work on the annotation of intra-chunk dependencies on an English treebank that was previously annotated with inter-chunk dependencies, and for which there exists a fully expanded parallel Hindi dependency treebank. This provides fully parsed dependency trees for the English treebank. We also report an analysis of the inter-annotator agreement for this chunk expansion task...


Rorschach Comprehensive System data for 100 nonpatient children from the United States in two age groups.

Journal: Journal of Personality Assessment, 2007

Mel Hamel, Thomas W Shaffer

Building on our previously published study (Hamel, Shaffer, & Erdberg, 2000), which provided data on 100 nonpatient children aged 6 to 12 from the United States, we here provide reference data for two more homogeneous age subgroups: 6 to 9 (N = 50) and 10 to 12 (N = 50). Inclusion criteria are described, and expanded interrater reliability statistics at the response level are presented along wi...


Annotation of WordNet Verbs with TimeML Event Classes

2008

Georgiana Puscasu, Verginica Barbu Mititelu

This paper reports on the annotation of all English verbs included in WordNet 2.0 with TimeML event classes. Two annotators assign each verb present in WordNet the most relevant event class capturing most of that verb’s meanings. At the end of the annotation process, inter-annotator agreement is measured using kappa statistics, yielding a kappa value of 0.87. The cases of disagreement between t...

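The abstract reports a single kappa value for two annotators. The excerpt does not name the variant, so the sketch below assumes Cohen's kappa, the usual two-rater form. It is computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label proportions. The function name and the six-verb example are hypothetical and are not the paper's data. The event-class strings merely stand in for TimeML classes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each labelled the same items once."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of the two annotators' label proportions, summed over labels.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators gave every item the same single label: kappa is undefined,
        # and this sketch simply reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical event-class labels for six verbs (not data from the paper).
annotator_1 = ["STATE", "OCCURRENCE", "STATE", "REPORTING", "STATE", "OCCURRENCE"]
annotator_2 = ["STATE", "OCCURRENCE", "REPORTING", "REPORTING", "STATE", "STATE"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.478
```

In this toy example the annotators agree on 4 of 6 verbs, so p_o = 24/36. Chance agreement is p_e = (3·3 + 2·1 + 1·2)/36 = 13/36, which gives kappa = 11/23 ≈ 0.478. Kappa therefore falls well below the raw 67% agreement once chance is discounted.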

Screening of the spine in adolescents: inter- and intra-rater reliability and measurement error of commonly used clinical tests

2014

Ellen Aartun, Anna Degerfalk, Linn Kentsdotter, Lise Hestbaek

BACKGROUND Evidence on the reliability of clinical tests used for the spinal screening of children and adolescents is currently lacking. The aim of this study was to determine the inter- and intra-rater reliability and measurement error of clinical tests commonly used when screening young spines. METHODS Two experienced chiropractors independently assessed 111 adolescents aged 12-14 years who...
