Surprisal as a Predictor of Essay Quality
نویسندگان
چکیده
Modern automated essay scoring systems rely on identifying linguistically-relevant features to estimate essay quality. This paper attempts to bridge work in psycholinguistics and natural language processing by proposing sentence processing complexity as a feature for automated essay scoring, in the context of English as a Foreign Language (EFL). To quantify processing complexity we used a psycholinguistic model called surprisal theory. First, we investigated whether essays’ average surprisal values decrease with EFL training. Preliminary results seem to support this idea. Second, we investigated whether surprisal can be effective as a predictor of essay quality. The results indicate an inverse correlation between surprisal and essay scores. Overall, the results are promising and warrant further investigation on the usability of surprisal for essay scoring.
منابع مشابه
Lexical surprisal as a general predictor of reading time
Probabilistic accounts of language processing can be psychologically tested by comparing word-reading times (RT) to the conditional word probabilities estimated by language models. Using surprisal as a linking function, a significant correlation between unlexicalized surprisal and RT has been reported (e.g., Demberg and Keller, 2008), but success using lexicalized models has been limited. In th...
متن کاملWord surprisal predicts N400 amplitude during reading
We investigated the effect of word surprisal on the EEG signal during sentence reading. On each word of 205 experimental sentences, surprisal was estimated by three types of language model: Markov models, probabilistic phrasestructure grammars, and recurrent neural networks. Four event-related potential components were extracted from the EEG of 24 readers of the same sentences. Surprisal estima...
متن کاملParsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus
The surprisal of a word on a probabilistic grammar constitutes a promising complexity metric for human sentence comprehension difficulty. Using two different grammar types, surprisal is shown to have an effect on fixation durations and regression probabilities in a sample of German readers’ eye movements, the Potsdam Sentence Corpus. A linear mixed-effects model was used to quantify the effect ...
متن کاملPredictive power of word surprisal for reading times is a linear function of language model quality
Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using informationtheoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models derived from n-grams, neural networks, and a combina...
متن کاملData from eye-tracking corpora as evidence for theories of syntactic processing complexity.
We evaluate the predictions of two theories of syntactic processing complexity, dependency locality theory (DLT) and surprisal, against the Dundee Corpus, which contains the eye-tracking record of 10 participants reading 51,000 words of newspaper text. Our results show that DLT integration cost is not a significant predictor of reading times for arbitrary words in the corpus. However, DLT succe...
متن کامل