Using the crowd for readability prediction

Authors

  • Orphée De Clercq
  • Véronique Hoste
  • Bart Desmet
  • Philip van Oosten
  • Martine De Cock
  • Lieve Macken
Abstract

Inspired by previous work on crowdsourcing, we investigate two different methodologies for assessing the readability of a wide variety of text material, implemented as two assessment tools: a lightweight crowdsourcing tool that invites users to provide pairwise comparisons, and a more advanced version in which experts rank a batch of texts by readability. To validate this approach, readability assessments were gathered for a corpus of written Dutch generic texts. By collecting multiple assessments per text, we explicitly wanted to level out the reader’s background knowledge and attitude. Our findings show that the assessments collected through both annotation tools are highly consistent and that crowdsourcing is a viable alternative to expert labeling. By performing a set of basic machine learning experiments, we further illustrate how the collected data can be used to perform text comparisons or to assign an absolute readability score to an individual text. To account for the latter case, we defined a readability measure which is easy to estimate from the data. We do not focus on optimizing the algorithms to achieve the best possible results for the learning tasks, but only carry them out to illustrate the various possibilities of our data sets. Nevertheless, we show that for each of the tasks there is a machine learning algorithm that outperforms the classical readability formulas. We conclude that readability assessment by comparing texts is a polyvalent methodology, which can be adapted to specific domains and target audiences if required.
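
The readability measure itself is described in the abstract only as "easy to estimate from the data". As a minimal sketch of one statistic in that spirit (the proportion of pairwise comparisons a text wins; a hypothetical stand-in, not necessarily the authors' definition), the following Python snippet turns pairwise judgments into per-text scores:

```python
from collections import defaultdict

def win_proportion_scores(comparisons):
    """Turn pairwise readability judgments into per-text scores.

    `comparisons` is an iterable of (winner, loser) text-id pairs, the
    winner being the text judged more readable. A text's score is the
    fraction of its comparisons it won. This is an illustrative,
    easy-to-estimate statistic, not the paper's exact measure.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        totals[winner] += 1
        totals[loser] += 1
    return {text: wins[text] / totals[text] for text in totals}

# Toy usage: A is preferred over B twice, C is preferred over A once.
print(win_proportion_scores([("A", "B"), ("A", "B"), ("C", "A")]))
# -> {'A': 0.666..., 'B': 0.0, 'C': 1.0}
```

Scores of this kind make comparison-based annotations usable for both learning tasks mentioned in the abstract: ranking texts against each other and predicting an absolute value for an individual text.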


Related articles

Readability Annotation: Replacing the Expert by the Crowd

This paper investigates two strategies for collecting readability assessments: an Expert Readers application intended to collect fine-grained readability assessments from language experts, and a Sort by Readability application designed to be intuitive and open to anyone with internet access. We show that the data sets resulting from both annotation strategies are very similar. We conclude t...


Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features

This paper presents a classifier of text readability based on information-theoretic features. The classifier was developed based on a linguistic approach to readability that explores lexical, syntactic and semantic features. For this evaluation we extracted a corpus of 645 articles from Wikipedia together with their quality judgments. We show that information-theoretic features perform as well ...
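
The excerpt does not detail the paper's feature set, but a typical information-theoretic feature in this line of work is the Shannon entropy of a text's word distribution. A minimal sketch, assuming naive lowercase whitespace tokenization (a simplification):

```python
import math
from collections import Counter

def word_entropy(text):
    """Shannon entropy (in bits) of a text's word frequency distribution.

    One example of an information-theoretic readability feature; the
    exact features used in the paper may differ. Tokenization here is a
    naive lowercase whitespace split.
    """
    words = text.lower().split()
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(word_entropy("the cat sat on the mat"))  # ~2.25 bits
```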


A Data-driven Method for Crowd Simulation using a Holonification Model

In this paper, we present a data-driven method for crowd simulation using a holonification model. This extra module increases the accuracy of the simulation and generates more realistic agent behaviors. First, we show how to use the concept of holon in crowd simulation and how effective it is. For this purpose, we use simple rules for holonification. Using real-world data, we model the...


Workshop Predicting and Improving Readability

Scott Crossley, Crowdsourcing text complexity models: The current study builds on work by De Clercq et al. (2014) and Crossley et al. (2017) by using crowdsourcing techniques to collect human ratings of text comprehension, processing, and familiarity across a large corpus comprising a diverse variety of topic domains (science, technology, and history). Pairwise comparisons among the ratings wer...
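
The excerpt does not say how those pairwise comparisons were analyzed; a standard statistical model for such data is Bradley-Terry, sketched here with simple minorization-maximization updates (an illustrative choice, not necessarily the study's method):

```python
def bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs using
    minorization-maximization updates. A higher strength means the text
    tends to win its comparisons (here: be judged more readable).
    Illustrative sketch only; items need at least one win for a stable
    estimate."""
    items = {t for pair in comparisons for t in pair}
    wins = {t: 0 for t in items}
    opponents = {t: [] for t in items}
    for winner, loser in comparisons:
        wins[winner] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)
    strength = {t: 1.0 for t in items}
    for _ in range(n_iter):
        new = {}
        for t in items:
            denom = sum(1.0 / (strength[t] + strength[o]) for o in opponents[t])
            new[t] = max(wins[t], 1e-9) / denom  # floor avoids all-zero strengths
        total = sum(new.values())
        strength = {t: v / total for t, v in new.items()}
    return strength

# Toy usage: A beats B and C; B and C split their comparisons.
print(bradley_terry([("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]))
```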


All Mixed Up? Finding the Optimal Feature Set for General Readability Prediction and Its Application to English and Dutch

Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, although NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close de...
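
For context, the "classical" baselines that feature-rich approaches like this compete with are formulas over shallow counts. A minimal sketch of Flesch Reading Ease follows; the syllable counter is a rough vowel-group heuristic, and work on Dutch typically uses recalibrated variants such as Flesch-Douma:

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences)
    - 84.6*(syllables/words). Higher scores mean easier text. The
    syllable count below is a crude vowel-group heuristic, not a
    proper hyphenation algorithm."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

print(flesch_reading_ease("The cat sat on the mat. It was happy."))
```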




Journal:
  • Natural Language Engineering

Volume 20, Issue -

Pages -

Publication year: 2014