Sentence simplification, compression, and disaggregation for summarization of sophisticated documents

نویسندگان

  • Catherine Finegan-Dollak
  • Dragomir R. Radev
چکیده

Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences— potentially adding hundreds of unnecessary words to the summary—or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests 2 causes: Altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems. Introduction

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

UNIVERSITY OF BIRMINGHAM Automatic Documents Summarization Using Ontology based Methodologies

In this paper, we emphasize the need for conservingspace within sentences by introducing a Sentences SimplificationModule (SSM). The module is aimed to shorten the length ofsentences via either splitting or compression. We describe howthe module is integrated in a Wikipedia-based summarizationframework. We highlight the performance differences obtainedfrom introducing su...

متن کامل

SIMBA: An Extractive Multi-document Summarization System for Portuguese

This is a proposal for demonstration of simba in PROPOR 2012. simba is an extractive multi-document summarization system that aims at producing generic summaries guided by a compression rate defined by the user. It uses a double-clustering approach to find the relevant information in a set of texts. In addition, simba uses a sentence simplification procedure as a mean to ensure summary compress...

متن کامل

Using First-Order Logic to Compress Sentences

Sentence compression is one of the most challenging tasks in natural language processing, which may be of increasing interest to many applications such as abstractive summarization and text simplification for mobile devices. In this paper, we present a novel sentence compression model based on first-order logic, using Markov Logic Network. Sentence compression is formulated as a word/phrase del...

متن کامل

A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first one is the only one we know in this language. It contains 20 topics with 20 documents each. A first set of 10 documents per topic is summarized and then the second set is used to produce an update summarization (new info...

متن کامل

Summarization with a Joint Model for Sentence Extraction and Compression

Text summarization is one of the oldest problems in natural language processing. Popular approaches rely on extracting relevant sentences from the original documents. As a side effect, sentences that are too long but partly relevant are doomed to either not appear in the final summary, or prevent inclusion of other relevant sentences. Sentence compression is a recent framework that aims to sele...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JASIST

دوره 67  شماره 

صفحات  -

تاریخ انتشار 2016