Inter-rater reliability

A review and update of the Health of the Nation Outcome Scales (HoNOS).

Journal: BJPsych Bulletin, 2018

Mick James, Jon Painter, Bill Buckingham, Malcolm W Stewart

Aims and method: The Health of the Nation Outcome Scales (HoNOS) and its older adults' version (HoNOS 65+) have been used widely for 20 years, but their glossaries have not been revised to reflect clinicians' experiences or changes in service delivery. The Royal College of Psychiatrists convened an international advisory board, with UK, Australian and New Zealand expertise, to identify desirable...

Collecting Reliable Human Judgements on Machine-Generated Language: The Case of the QG-STEC Data

2016

Keith Godwin, Paul Piwek

Question generation (QG) is the problem of automatically generating questions from inputs such as declarative sentences. The Shared Evaluation Task Challenge (QG-STEC) Task B that took place in 2010 evaluated several state-of-the-art QG systems. However, analysis of the evaluation results was affected by low inter-rater reliability. We adapted Nonaka & Takeuchi’s knowledge creation cycle to the...

Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation

2016

Borja Navarro-Colorado, María Ribes-Lafoz, Noelia Sánchez

In order to analyze metrical and semantic aspects of poetry in Spanish with computational techniques, we have developed a large corpus annotated with metrical information. In this paper we will present and discuss the development of this corpus: the formal representation of metrical patterns, the semi-automatic annotation process based on a new automatic scansion system, the main annotation pr...

Annotating dropped pronouns in Chinese newswire text

2012

Elizabeth Baran, Yaqin Yang, Nianwen Xue

We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words, but that are recognized conceptually by native Chinese speakers. These abstract pronouns are identified as “unspecified”, “pleonastic”, “event”, and “existen...

360° Ratings: an Analysis of Assumptions and a Research Agenda for Evaluating Their Validity

2001

Walter C. Borman

This article argues that assumptions surrounding 360° ratings should be examined; most notably, the assumptions that different rating sources have relatively unique perspectives on performance and multiple rating sources provide incremental validity over the individual sources. Studies generally support the first assumption, although reasons for interrater disagreement across different organiza...

Interrater reliability in large-scale assessments – Can teachers score national tests reliably without external controls?

2015

Anna Lind

Extending effect annotation with lexical decomposition

2015

Josef Ruppenhofer, Jasper Brandes

In this contribution, we report on an effort to annotate German data with information relevant to opinion inference. Such information has previously been referred to as effect or couched in terms of event-evaluation functors. We extend the theory and present an extensive scheme that combines both approaches and thus extends the set of inference-relevant predicates. Using these guidelines to anno...

Normative and Distinctive Accuracy in Situation Perceptions: Magnitude and Personality Correlates

2016

John F. Rauthmann, Ryne A. Sherman

To what extent do people achieve accuracy in judging others’ situations? Based on interpersonal perception models, we propose that ex-situ raters may attain accuracy by judging the psychological characteristics of a situation that in-situ raters have experienced according to a normative and distinctive characteristics profile. Biesanz’ Social Accuracy Model (SAM) provides a flexible crossed-eff...

Agreement, the f-measure, and reliability in information retrieval.

Journal: Journal of the American Medical Informatics Association (JAMIA), 2005

George Hripcsak, Adam S Rothschild

Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the kappa statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can...
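As an illustration of why the missing negative count matters (a sketch that is not taken from the paper), the two functions below compute the F-measure and Cohen's kappa from a 2x2 agreement table between two raters. The function names and the cell labels `a`, `b`, `c`, `d` are assumptions chosen for this example. The F-measure uses only the cells that contain at least one positive judgement, whereas kappa also needs `d`, the number of items both raters marked negative.

```python
def f_measure(a, b, c):
    """F-measure (positive specific agreement) between two raters.

    a: items both raters marked positive
    b, c: items only one of the two raters marked positive
    The number of jointly negative items is not needed.
    """
    return 2 * a / (2 * a + b + c)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa; d is the number of items both raters marked negative."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from each rater's marginal positive/negative rates.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a, b, c = 40, 5, 5  # hypothetical counts, for illustration only
    print("F-measure:", f_measure(a, b, c))
    for d in (50, 1_000, 1_000_000):
        print("d =", d, "kappa =", cohen_kappa(a, b, c, d))
```

With these formulas, kappa moves toward the F-measure as `d` grows with the other cells held fixed, so when the negative count is undefined but effectively very large, the F-measure can serve as a stand-in for kappa.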

Reliability, responsiveness, and validity of the Kansas University Standing Balance Scale.

Journal: Journal of Geriatric Physical Therapy, 2006

Patricia Kluding, Bonnie Swafford, Perri Cagle, Byron Gajewski

PURPOSE: The purpose of this research was to investigate the reliability, responsiveness, and concurrent validity of the Kansas University Standing Balance Scale (KUSBS). METHODS: For the reliability study, the KUSBS was used twice on 2 separate days with 23 inpatient rehabilitation patients. To assess responsiveness and concurrent validity, a retrospective chart review of 25 patients was perfo...
