Learning to Simplify Sentences Using Wikipedia

Authors

  • William Coster
  • David Kauchak
Abstract

In this paper we examine the sentence simplification problem as an English-to-English translation problem, utilizing a corpus of 137K aligned sentence pairs extracted by aligning English Wikipedia and Simple English Wikipedia. This data set contains the full range of transformation operations, including rewording, reordering, insertion and deletion. We introduce a new translation model for text simplification that extends a phrase-based machine translation approach to include phrasal deletion. Evaluated on three metrics that compare against a human reference (BLEU, word-F1 and SSA), our new approach performs significantly better than two text compression techniques (including T3) and the phrase-based translation system without deletion.
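To make one of the reference-based metrics above concrete, here is a minimal sketch of a word-level F1 score between a system output and a human reference. It assumes lowercasing, whitespace tokenization, and bag-of-words overlap; the function name `word_f1` and these preprocessing choices are illustrative assumptions, not the paper's exact definition.

```python
from collections import Counter


def word_f1(candidate: str, reference: str) -> float:
    """Bag-of-words F1 between a candidate and a reference sentence.

    Assumption: tokens are lowercased and split on whitespace; the
    paper's own tokenization and matching rules may differ.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection counts each shared word at most as often
    # as it appears in both sentences.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a simplification that only deletes words from the reference keeps precision at 1.0 and loses recall, so the score falls as more reference words are dropped.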


Similar resources

Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally captur...


Document Summarization using Wikipedia

Krishnan Ramanathan, Yogesh Sankarasubramaniam, Nidhi Mathur, Ajay Gupta. HP Laboratories, HPL-2009-39. Keywords: Single Document Summarization, Wikipedia, ROUGE. Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summa...


Experimental evaluation of learning performance for exploring the shortest paths in hyperlink network of Wikipedia

In a 9-hour experiment we evaluated learning performance based on exploring the shortest paths in the hyperlink network of the Wikipedia online encyclopedia. Relying on a network of 35688 unique hyperlinks, in three separate learning sessions of 20 minutes students read a series of 62 sentences built using 22 unique hyperlinks that form the eleven shortest paths and answered pre-test and post-test multip...


Learning to Identify Definitions using Syntactic Features

This paper describes an approach to learning concept definitions which operates on fully parsed text. A subcorpus of the Dutch version of Wikipedia was searched for sentences which have the syntactic properties of definitions. Next, we experimented with various text classification techniques to distinguish actual definitions from other sentences. A maximum entropy classifier which incorporates ...


Grammar frequency and simplification: when intuition fails

We investigate whether a medical writer can simplify text by only changing the grammatical structure. Based on a user study, we find that while the sentences look simpler after simplification, they are not easier to understand. For grammatical simplification, better tools are needed to provide more concrete guidance and feedback. Introduction Providing text to patients and health information co...




Publication date: 2011