Automated Assessment of Paragraph Quality: Introduction, Body, and Conclusion Paragraphs
Abstract
Natural language processing and statistical methods were used to identify linguistic features associated with the quality of student-generated paragraphs. Linguistic features were assessed using Coh-Metrix. The resulting computational models demonstrated small to medium effect sizes for predicting paragraph quality: introduction quality r² = .25, body quality r² = .10, and conclusion quality r² = .11. Although the variance explained was somewhat low, the linguistic features identified were consistent with the rhetorical goals of the paragraph types. Avenues for bolstering this approach by accounting for individual writing styles and techniques are discussed.

Writing Practice and Assessment

Effective writing is a critical skill related to academic and professional success (Geiser & Studley, 2001; Kellogg & Raulerson, 2007), yet large-scale assessments often show that writing proficiency is elusive for many students (National Commission on Writing, NCW, 2003). Strategy instruction, writing practice, and individualized feedback are needed to improve students' writing skills (Graham & Perin, 2007; Kellogg & Raulerson, 2007). Students must be taught strategies for enacting the writing process – prewriting, drafting, and revision – along with the knowledge needed to employ those strategies. Students must also have opportunities to practice these developing strategies and to receive timely, individualized feedback throughout the learning process. Practice and feedback are key for students to reflect on their writing and understand how their use of writing strategies affects writing quality. Although highly effective, strategy instruction with ample practice and feedback requires significant time and effort. Classroom instructors are constrained in their ability to give personal, detailed feedback on student writing by available instructional time, increasing class sizes, and a focus on standardized tests (NCW, 2003).

Automated Essay Scoring

Automated essay scoring (AES) – the use of computers to grade student essays – allows students to practice writing and receive feedback without adding to teachers' burdens (Dikli, 2006). Writing can be assessed via combinations of statistical modeling, natural language processing (NLP), Latent Semantic Analysis (LSA), artificial intelligence (AI) and machine learning, and other methods. Systems such as e-rater (Burstein, Chodorow, & Leacock, 2004) and IntelliMetric (Rudner, Garcia, & Welch, 2006) rely primarily on NLP and AI. First, a corpus of essays is annotated to identify target essay elements (e.g., topic sentences). Essays are then automatically analyzed along many linguistic dimensions, and statistical analyses extract features that discriminate between higher- and lower-quality essays. Finally, weighted statistical models combine the extracted linguistic properties into algorithms that assign grades to student essays.

The Intelligent Essay Assessor (IEA; Landauer, Laham, & Foltz, 2003) uses LSA to assess essays. LSA assumes that word meanings are largely determined by their co-occurrence with other words. Texts are represented in a word-by-context matrix, where a context is a sentence, paragraph, or whole text. Singular value decomposition then reduces the number of dimensions to capture the semantic structure. Using LSA, student essays are compared to a benchmark corpus of pre-scored essays to assess semantic similarity, and essay scores are based on the overlap between student essays and the benchmarks.
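To make this pipeline concrete, the following is a minimal sketch of LSA-based scoring in Python, assuming scikit-learn is available. The benchmark texts, their scores, and the student essay are hypothetical placeholders, and the similarity-weighted scoring rule is illustrative rather than IEA's actual algorithm.

```python
# Minimal LSA essay-scoring sketch (hypothetical data; assumes scikit-learn).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Pre-scored benchmark essays (placeholders for a real corpus).
benchmarks = [
    "The thesis is stated clearly and supported by three arguments ...",
    "Evidence in this paragraph supports the claim that ...",
    "In conclusion, the arguments above demonstrate ...",
]
benchmark_scores = np.array([5.0, 4.0, 3.0])  # human ratings

# 1. Build a word-by-context matrix (contexts = whole texts here).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(benchmarks)

# 2. Singular value decomposition to a reduced semantic space.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)

# 3. Project a student essay into the same space and compare.
student_essay = ["My essay argues that the evidence demonstrates ..."]
student_vec = svd.transform(vectorizer.transform(student_essay))
sims = cosine_similarity(student_vec, X_reduced).ravel()

# 4. Score = similarity-weighted average of benchmark scores
#    (clipped to avoid division by zero on this toy corpus).
weights = np.clip(sims, 1e-9, None)
score = float(weights @ benchmark_scores / weights.sum())
print(f"Predicted score: {score:.2f}")
```

Operational systems train on hundreds of pre-scored essays and retain on the order of a few hundred semantic dimensions; the two dimensions here simply keep the toy corpus tractable.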
LSA does not require annotation, model-building, human ratings, or syntactic parsing; essentially, the benchmark corpus is the model. AES systems have successfully increased opportunities for student writing with feedback. Researchers also report positive correlations and high percent agreement with human raters (Dikli, 2006). Two main objections to AES are that it lacks humanist sensitivity and that detection is limited by the available algorithms (Hearst, 2002). Automated essay scorers, with their reliance on statistical regularities, may not capture writers' style, voice, or other individual expressive differences. Thus, despite progress, automated scoring systems are still under development, with opportunities to expand in many areas.

Assessing Paragraph Quality

In this project, we contribute to AES research by assessing the quality of the canonical components of the five-paragraph essay: introduction, body, and conclusion paragraphs (Albertson, 2007; Johnson, Smagorinsky, Thompson, & Fry, 2007). In five-paragraph essays, students first state their thesis and arguments in an engaging introduction. Subsequently, each argument forms the topic sentence of a body paragraph, in which evidence is offered to support that claim. Finally, the author's thesis and claims are summarized in a conclusion paragraph that demonstrates the unity and significance of the ideas. Detractors have argued that the five-paragraph essay stifles creativity and leads to formulaic writing (Albertson, 2007; Dean, 2000). However, for new and struggling writers, the structure can provide an objective schema for organizing and communicating one's ideas. Moreover, for better or worse, the five-paragraph essay is an important aspect of standardized testing, such as the SAT Reasoning Test (SAT).

Prior research has sought to automatically detect introduction, body, and conclusion paragraph types by combining linguistic and LSA methods (Crossley, Dempsey, & McNamara, under review) using Coh-Metrix (Graesser et al., 2004; McNamara, Crossley, & McCarthy, 2010). In the Crossley et al. study, initial paragraphs (versus middle and final paragraphs) were shorter, contained less word overlap and fewer positive logical connectives (e.g., also, then), and contained more specific, meaningful, and imageable words. The directness of these paragraphs, combined with evocative and meaningful word choices, was consistent with the goals of introduction paragraphs: concisely stating one's position and arguments in a way that grabs the reader's attention. Middle paragraphs were longer, contained more given information (maintaining a common thread of ideas), and used less imageable and less familiar words. The greater length and consistency of these paragraphs might have been necessary for developing the evidence and examples that support a single, coherent topic sentence. In addition, the use of less imageable or familiar words may have indicated the authors' elaboration on specific or abstract principles. Lastly, final paragraphs were shorter and used words that were less meaningful and specific but more familiar. Conclusions also displayed less given information, more content word overlap, and more positive logical connectives. These linguistic features were consistent with the rhetorical goal of providing a concise and accessible summary of one's position and arguments without adding new evidence or examples. These paragraph features were used to develop a model capable of detecting paragraph type in a corpus of student writing; a sketch of this kind of classifier appears below.
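As an illustration of feature-based paragraph-type detection, the following is a minimal sketch in Python, again assuming scikit-learn. The four features echo the indices described above, but the values are invented, and logistic regression stands in for whatever modeling procedure Crossley et al. actually used.

```python
# Sketch of a paragraph-type classifier built on Coh-Metrix-style
# features (hypothetical values; logistic regression is illustrative,
# not the model used by Crossley et al.).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Columns: paragraph length (words), content-word overlap,
# positive logical connectives per 100 words, mean word imageability.
X = np.array([
    [ 80, 0.10, 1.2, 450],   # introduction-like: short, specific, imageable
    [ 75, 0.12, 1.0, 460],
    [160, 0.35, 2.5, 380],   # body-like: long, high overlap, less imageable
    [150, 0.30, 2.2, 390],
    [ 70, 0.20, 3.0, 430],   # conclusion-like: short, more connectives
    [ 65, 0.22, 3.2, 440],
])
y = np.array(["intro", "intro", "body", "body", "concl", "concl"])

# Scale the features, since the raw indices live on very different ranges.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Quick 2-fold check on this toy corpus, then classify a new paragraph.
print(cross_val_score(clf, X, y, cv=2).mean())
clf.fit(X, y)
print(clf.predict([[90, 0.15, 1.1, 455]]))  # expected "intro" on this toy data
```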
The reported model performed well above chance, and its accuracy (65%) was nearly identical to that of human judges (66%). Overall, these results suggested that meaningful properties of introduction, body, and conclusion paragraphs can be detected through automated assessment methods. An important question is how such properties relate to paragraph quality. Crossley et al. (under review) reported post-hoc analyses showing an interaction between detection accuracy and quality: paragraphs that were rated more highly by humans were easier to classify, by both humans and the model, than were poorly rated paragraphs. Higher quality paragraphs may have been more likely to enact the appropriate rhetorical forms (e.g., stating a clear thesis in the introduction), which aided detection. The remainder of this paper reports linguistic analyses and the development of models to assess paragraph quality more directly. Our goal is to examine whether there are linguistic properties that can discriminate between well-written and poorly written introduction, body, and conclusion paragraphs; the sketch below illustrates the general form of such a model.
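The following is a minimal sketch, with wholly synthetic data, of how a quality model of this form can be fit and evaluated: a linear regression of human ratings on linguistic indices, with r² computed as the squared correlation between cross-validated predictions and observed ratings, one common way to express effect sizes like those in the abstract.

```python
# Sketch of a paragraph-quality model: linear regression of human
# ratings on linguistic features (all data here is synthetic).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 60

# Hypothetical Coh-Metrix-style indices for 60 introduction paragraphs
# (e.g., length, word overlap, imageability), already standardized.
X = rng.normal(size=(n, 3))
true_w = np.array([0.6, -0.3, 0.4])
# Simulated human ratings; a real study would use a 1-6 rating scale.
ratings = X @ true_w + rng.normal(scale=1.0, size=n)

# Fit and predict under 10-fold cross-validation.
model = LinearRegression()
predicted = cross_val_predict(model, X, ratings, cv=10)

# r² as the squared Pearson correlation of predicted vs. observed ratings.
r = np.corrcoef(predicted, ratings)[0, 1]
print(f"r^2 = {r**2:.2f}")
```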