Inter-rater reliability

Can training improve the quality of inferences made by raters in competency modeling? A quasi-experiment.

Journal: The Journal of Applied Psychology, 2007

Filip Lievens, Juan I Sanchez

A quasi-experiment was conducted to investigate the effects of frame-of-reference training on the quality of competency modeling ratings made by consultants. Human resources consultants from a large consulting firm were randomly assigned to either a training or a control condition. The discriminant validity, interrater reliability, and accuracy of the competency ratings were significantly highe...

Automatic large-scale oral language proficiency assessment

2007

Febe de Wet, Christa van der Walt, Thomas Niesler

We describe first results obtained during the development of an automatic system for the assessment of spoken English proficiency of university students. The ultimate aim of this system is to allow fast, consistent and objective assessment of oral proficiency for the purpose of placing students in courses appropriate to their language skills. Rate of speech (ROS) was chosen as an indicator of f...

ONYX: A System for the Semantic Analysis of Clinical Text

2009

Lee M. Christensen, Henk Harkema, Peter J. Haug, Jeannie Yuhaniak Irwin, Wendy W. Chapman

This paper introduces ONYX, a sentence-level text analyzer that implements a number of innovative ideas in syntactic and semantic analysis. ONYX is being developed as part of a project that seeks to translate spoken dental examinations directly into chartable findings. ONYX integrates syntax and semantics to a high degree. It interprets sentences using a combination of probabilistic classifiers,...

Towards the Orwellian Nightmare: Separation of Business and Personal Emails

2006

Sanaz Jabbari, Ben Allison, David Guthrie, Louise Guthrie

This paper describes the largest scale annotation project involving the Enron email corpus to date. Over 12,500 emails were classified, by humans, into the categories “Business” and “Personal”, and then subcategorised by type within these categories. The paper quantifies how well humans perform on this task (evaluated by inter-annotator agreement). It presents the problems experienced with the ...
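
The abstract above does not say which agreement statistic the authors used. As one common choice for two annotators and categorical labels, Cohen's kappa can be sketched as follows. The function name `cohen_kappa` and the label lists are hypothetical and are not taken from the paper or the Enron annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six emails (illustration only, not real data).
annotator_1 = ["Business", "Business", "Personal", "Business", "Personal", "Personal"]
annotator_2 = ["Business", "Personal", "Personal", "Business", "Personal", "Business"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when one category (for example "Business") dominates the corpus.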

Universidade de Aveiro's voice evaluation protocol

2009

Luis M. T. Jesus, Anna Barney, Ricardo Santos, Janine Caetano, Juliana Jorge, Pedro Sá-Couto

This paper presents Universidade de Aveiro’s Voice Evaluation Protocol for European Portuguese (EP), and a preliminary inter-rater reliability study. Ten patients with vocal pathology were assessed by two Speech and Language Therapists (SLTs). Protocol parameters such as overall severity, roughness, breathiness, change of loudness (CAPEV), grade, breathiness and strain (GRBAS), glottal attack,...

The BECauSE Corpus 2.0: Annotating Causality and Overlapping Relations

2017

Jesse Dunietz, Lori S. Levin, Jaime G. Carbonell

Language of cause and effect captures an essential component of the semantics of a text. However, causal language is also intertwined with other semantic relations, such as temporal precedence and correlation. This makes it difficult to determine when causation is the primary intended meaning. This paper presents BECauSE 2.0, a new version of the BECauSE corpus with exhaustively annotated expre...

Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

2016

Oded Avraham, Yoav Goldberg

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.

Concept mapping assessment in a problem-based medical curriculum.

Journal: Medical Teacher, 2010

Salah Eldin Kassab, Shereen Hussain

BACKGROUND: In the problem-based learning (PBL) medical curriculum at the Arabian Gulf University in Bahrain, students construct concept maps related to each case they study in PBL tutorials. AIM: To evaluate the interrater reliability and predictive validity of concept map scores using a structured assessment tool. METHODS: We examined concept maps of the same cohort of students at the beginn...

A Semantically Compositional Annotation Scheme for Time Normalization

2016

Steven Bethard, Jonathan Parker

We present a new annotation scheme for normalizing time expressions, such as three days ago, to computer-readable forms, such as 2016-03-07. The annotation scheme addresses several weaknesses of the existing TimeML standard, allowing the representation of time expressions that align to more than one calendar unit (e.g., the past three summers), that are defined relative to events (e.g., three w...

Assessing dimensions of competency to stand trial: construct validation of the ECST-R.

Journal: Assessment, 2003

Richard Rogers, Rebecca L Jackson, Kenneth W Sewell, Chad E Tillbrook, Mary A Martin

Four decades of forensic research have left unanswered a fundamental issue regarding the best conceptualization of competency to stand trial vis-à-vis the Dusky standard. The current study investigated three competing models (discrete abilities, domains, and cognitive complexity) on combined data (N = 411) from six forensic and correctional samples. Using the Evaluation of Competency to Stand T...
