Inter-rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm
Authors
Abstract
The PLATO machine translation (MT) evaluation (MTE) research program aims to systematically develop a predictive relationship between discrete, well-defined MTE metrics and the specific information processing tasks that can be reliably performed with MT output. At its core are traditional measures of quality informed by the International Standards for Language Engineering (ISLE), namely clarity, coherence, morphology, syntax, general and domain-specific lexical robustness, and named-entity translation, as well as a DARPA-inspired measure of adequacy. For robust validation, which is indispensable for refining tests and guidelines, we measure inter-rater reliability on the assessments. Here we report our results, focusing on the PLATO Clarity and Coherence assessments, and discuss our method for iteratively refining both the linguistic metrics and the guidelines for applying them within the PLATO evaluation paradigm. Finally, we discuss reasons why kappa might not be the best measure of inter-rater agreement for our purposes, and suggest directions for future investigation.
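To make the agreement statistic discussed above concrete, the following is a minimal sketch of Cohen's kappa for two raters, computed from scratch: observed agreement corrected for the agreement expected by chance given each rater's label marginals. The ratings shown are hypothetical illustrations, not data from the PLATO study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label rates.
    ca, cb = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((ca[lab] / n) * (cb[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical clarity scores (1-3 scale) from two judges.
judge_1 = [3, 2, 3, 1, 2, 3, 2, 1]
judge_2 = [3, 2, 2, 1, 2, 3, 3, 1]
print(round(cohens_kappa(judge_1, judge_2), 3))  # 0.619
```

Note that kappa's chance correction depends on the marginal distributions, which is one source of the known paradoxes (e.g. high raw agreement but low kappa when one category dominates) that motivate considering alternative agreement measures.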
Similar Resources
Test-Retest and Inter-Rater Reliability Study of the Schedule for Oral-Motor Assessment in Persian Children
Objectives: Reliable and valid clinical tools to screen, diagnose, and describe eating functions and dysphagia in children are highly warranted. Today most specialists are aware of the role of assessment scales in the treatment of affected individuals. However, the problem is that the clinical tools used might be nonstandard, and worldwide, there is no integrated assessment performed to assess ...
Functional Movement Screen in Elite Boy Basketball Players: A Reliability Study
Purpose: To investigate the reliability of Functional Movement Screen (FMS) in basketball players. A few studies have compared the reliability of FMS between raters with different experience in athletes. The purpose of this study was to compare the FMS scoring between the beginners and expert raters using video records. Methods: This is a cross-sectional study. The study subjects compris...
Nurse-Physician Agreement on Triage Category: A Reliability Analysis of Emergency Severity Index
Background and Objectives: The Emergency Severity Index (ESI) triage is commonly used in clinical settings to determine the patients’ emergency severity. However, the reliability of this index is not sufficiently explored. The present study examines the inter-rater reliability of ESI by comparing triage ratings as performed by nurses and physicians. Methods: This prospective cross-sectional st...
Formal v. Informal: Register-Differentiated Arabic MT Evaluation in the PLATO Paradigm
Tasks performed on machine translation (MT) output are associated with input text types such as genre and topic. Predictive Linguistic Assessments of Translation Output, or PLATO, MT Evaluation (MTE) explores a predictive relationship between linguistic metrics and the information processing tasks reliably performable on output. PLATO assigns a linguistic signature, which cuts across the task-b...
Quiz-Based Evaluation of Machine Translation
This paper proposes a new method of manual evaluation for statistical machine translation, the so-called quiz-based evaluation, estimating whether people are able to extract information from machine-translated texts reliably. We apply the method to two commercial and two experimental MT systems that participated in WMT 2010 in English-to-Czech translation. We report inter-annotator agreement for...