Beyond Majority Voting: Generating Evaluation Scales using Item Response Theory
Authors
Abstract
We introduce Item Response Theory (IRT) from psychometrics as an alternative to majority voting to create an IRT gold standard (GSIRT). IRT describes characteristics of individual items in GSIRT, namely their difficulty and discriminating power, and can account for these characteristics when estimating human intelligence or ability for an NLP task. In this paper, we evaluated how well an IRT model fits a majority-vote gold standard designed for Recognizing Textual Entailment (RTE), denoted GSRTE. By collecting human responses and fitting our IRT model, we found that up to 31% of the items in GSRTE were not useful in building GSIRT for RTE. In addition, we found low inter-annotator agreement for some items in GSRTE, suggesting that more work is needed to create intelligent gold standards.
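As a rough illustration of what "difficulty" and "discriminating power" mean for an item, the sketch below evaluates a two-parameter logistic (2PL) item characteristic curve. The abstract does not say which IRT model the authors fit, so the 2PL form, the function name `prob_correct`, and the two example items are assumptions made purely for illustration.

```python
import math


def prob_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a 2PL IRT model (assumed here).

    theta: latent ability of the respondent
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items, not taken from the paper.
easy_sharp = {"a": 2.0, "b": -1.0}  # easy and highly discriminating
hard_flat = {"a": 0.5, "b": 1.5}    # hard and weakly discriminating

for theta in (-2.0, 0.0, 2.0):
    p1 = prob_correct(theta, **easy_sharp)
    p2 = prob_correct(theta, **hard_flat)
    print(f"ability={theta:+.1f}  easy/sharp={p1:.3f}  hard/flat={p2:.3f}")
```

An item with low discrimination gives nearly the same probability to weak and strong respondents, which is one way an item can turn out to be of little use in an IRT-based scale even though a majority vote assigned it a label.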
Similar resources
Psychometric Properties of State Level Subjective Vitality Scale based on classical test theory and Item-response theory
The purpose of the present study was to investigate the factor structure and item-response parameters of the State Level Subjective Vitality Scale. The research design was correlational, and the statistical population consisted of students of Shahid Beheshti University in Tehran. A sample of 240 students was selected through multi-stage sampling and completed Subjective Vitality ...
Growth Scales as an Alternative to Vertical Scales - Practical Assessment, Research & Evaluation
Student growth models depend on comparing assessments of individual students over time. Vertical scales (cf. Kolen and Brennan, 2004) are among several options that exist for development of scales that allow these comparisons. Briefly, vertical scales are created through administering an embedded subset of items to different students at two educational levels, typically one year apart, and lin...
Evaluation Psychometric Characteristics of the Persian Version of the Colorado Learning Attitudes about Science Survey Using polytomous Item Response Model
Goal: Researchers in the field of science education believe that people's attitudes about learning will have a significant impact on their future learning and what they learn from science will not be unrelated to their views and attitudes. Accordingly, most questionnaires have been developed to measure attitudes toward science, especially about physics learning attitudes...
Sample Size Requirements for Estimation of Item Parameters in the Multidimensional Graded Response Model
Likert types of rating scales in which a respondent chooses a response from an ordered set of response options are used to measure a wide variety of psychological, educational, and medical outcome variables. The most appropriate item response theory model for analyzing and scoring these instruments when they provide scores on multiple scales is the multidimensional graded response model (MGRM) ...
Measurement of teen dating violence attitudes: an item response theory evaluation of differential item functioning according to gender.
Accurate assessment of attitudes about intimate partner violence is important for evaluation of prevention and early intervention programs. Assessment of attitudes about cross-gender interactions is particularly susceptible to bias because it requires specifying the gender of the perpetrator and the victim. As it is likely that respondents will tend to identify with the same-gender actor, items...
Journal title:
- CoRR
Volume abs/1605.08889, Issue
Pages -
Publication date: 2016