Analytic Scoring of TOEFL® CBT Essays: Scores From Humans and E-rater
Abstract
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and e-rater essay feature variables in the context of the TOEFL computer-based test (CBT) writing assessment. The data analyzed were analytic and holistic essay scores provided by human raters and essay feature variable scores computed by e-rater (version 2.0) for two TOEFL CBT writing prompts. It was found that (a) all six analytic scores were correlated not only among themselves but also with the holistic scores, (b) the high correlations among holistic and analytic scores were largely attributable to the impact of essay length on both types of scoring, (c) there may be some potential for profile scoring based on analytic scores, and (d) strong associations were confirmed between several e-rater variables and analytic ratings. Implications are discussed for improving the analytic scoring of essays, validating automated scores, and refining e-rater essay feature variables.
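As a rough illustration of finding (b), the sketch below computes a partial correlation between one analytic rating and the holistic score while controlling for essay length. The variable names and the toy data are hypothetical, and the report's actual analysis may have used a different procedure; this is only a minimal sketch of the general technique.

```python
# Sketch: checking how much of an analytic-holistic correlation is
# explained by essay length. Names and data are hypothetical; the
# report's actual analysis may differ.
import numpy as np

def pearson(x, y):
    """Plain Pearson correlation between two 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.corrcoef(x, y)[0, 1]

def partial_corr(x, y, z):
    """Correlation of x and y after regressing essay length (z) out of both."""
    z = np.column_stack([np.ones(len(z)), z])           # intercept + length
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]   # residuals of x on length
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]   # residuals of y on length
    return pearson(rx, ry)

# Hypothetical usage: one analytic rating, holistic scores, and word
# counts for the same set of essays (toy synthetic data for illustration).
rng = np.random.default_rng(0)
length = rng.integers(150, 500, size=200).astype(float)
holistic = 1 + 0.01 * length + rng.normal(0, 0.5, 200)
organization = 1 + 0.01 * length + rng.normal(0, 0.5, 200)

print("zero-order r:", round(pearson(organization, holistic), 2))
print("r controlling for length:", round(partial_corr(organization, holistic, length), 2))
```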
Similar resources
Construct Validity of e-rater® in Scoring TOEFL® Essays
This study examined the construct validity of the e-rater automated essay scoring engine as an alternative to human scoring in the context of TOEFL essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two e-rater scores were investigated in this study, the first based on optimally predicting the human essay score and the second based on e...
A Differential Word Use Measure for Content Analysis in Automated Essay Scoring
This paper proposes an alternative content measure for essay scoring, based on the difference in the relative frequency of a word ...
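The snippet above is truncated, so the exact definition of the measure is not shown. As a loose illustration of the general idea, the sketch below weights each word by the difference in its relative frequency between high- and low-scoring essays; the names and the form of the measure are assumptions, not the report's actual formulation.

```python
# Rough illustration only: one way a "differential word use" content
# measure could be built from relative word frequencies.
from collections import Counter

def rel_freqs(essays):
    """Relative frequency of each word across a list of essays."""
    counts = Counter(w.lower() for e in essays for w in e.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def differential_word_weights(high_essays, low_essays):
    """Weight each word by how much more often it appears in high- vs
    low-scoring essays (positive values mark words typical of better essays)."""
    hi, lo = rel_freqs(high_essays), rel_freqs(low_essays)
    vocab = set(hi) | set(lo)
    return {w: hi.get(w, 0.0) - lo.get(w, 0.0) for w in vocab}

def score_essay(essay, weights):
    """Sum the differential weights of the words in a new essay."""
    return sum(weights.get(w.lower(), 0.0) for w in essay.split())

# Hypothetical usage with toy essays grouped by human score.
weights = differential_word_weights(
    high_essays=["the argument is well supported by concrete evidence"],
    low_essays=["i think it is good because it is good"],
)
print(score_essay("the evidence supports the argument", weights))
```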
Automated Evaluation of Essays and Short Answers
Essay questions designed to measure writing ability, along with open-ended questions requiring short answers, are highly valued components of effective assessment programs, but the expense and logistics of scoring them reliably often present a barrier to their use. Extensive research and development efforts at Educational Testing Service (ETS) over the past several years (see http://www.ets.org...
Stumping e-rater: challenging the validity of automated essay scoring
This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board. Researchers are encouraged to express freely their professional judgment. Therefore, points of view or opinions stated in Graduate Record Examinations Board Reports do not necessarily represent official Graduate Record Examinations Board position or poli...
Enriching Automated Essay Scoring Using Discourse Marking
Electronic Essay Rater (e-rater) is a prototype automated essay scoring system built at Educational Testing Service (ETS) that uses discourse marking, in addition to syntactic information and topical content vector analyses, to automatically assign essay scores. This paper gives a general description of e-rater as a whole, but its emphasis is on the importance of discourse marking and argument pa...
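As a minimal, hypothetical sketch of the "topical content vector analysis" component named here, the code below compares a new essay's word-frequency vector with the vocabulary of essays already scored at each level and picks the closest level; the function names and the scoring rule are illustrative assumptions, not e-rater's actual implementation.

```python
# Illustrative sketch of content vector analysis: cosine similarity
# between word-frequency vectors. Not e-rater's actual implementation.
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words frequency vector for one essay."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def content_score(essay, scored_examples):
    """Assign the score level whose example essays the new essay most resembles."""
    prototypes = {level: bow(" ".join(texts)) for level, texts in scored_examples.items()}
    return max(prototypes, key=lambda level: cosine(bow(essay), prototypes[level]))

# Hypothetical usage: example essays grouped by human score level.
examples = {
    6: ["the author develops a coherent argument supported by evidence"],
    2: ["it is good i like it very much"],
}
print(content_score("a coherent and well supported argument", examples))
```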