MUSE – A Multilingual Sentence Extractor

Authors

  • Marina Litvak
  • Mark Last
  • Menahem Friedman
  • Slava Kisilevich
Abstract

MUltilingual Sentence Extractor (MUSE) is aimed at multilingual single-document summarization. MUSE implements a supervised, language-independent summarization approach based on optimizing a combination of multiple statistical sentence ranking methods. The MUSE tool consists of two main modules: a training module, run offline, and an online summarization module. The training module can be provided with a corpus of summarized texts in any language. It then learns the best linear combination of user-specified sentence ranking measures by applying a Genetic Algorithm to the given training data. The summarization module performs real-time sentence extraction by computing sentence rankings according to the weighted model induced in the training phase. The main advantage of MUSE is its language independence: it can be applied to any language for which gold-standard summaries are available. In our previous work, MUSE was found to perform significantly better than the best-known state-of-the-art extractive summarization approaches and tools in three languages: English, Hebrew, and Arabic. Moreover, our experimental results in a cross-lingual domain suggest that MUSE does not need to be retrained for each new language, and that the same weighting model can be used across several languages.
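The two modules described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the two toy ranking measures (sentence position and sentence length), the crude word-overlap fitness, and the Genetic Algorithm settings are hypothetical stand-ins, not the measures, metric, or parameters used by the MUSE authors.

```python
import random

def features(sentences):
    """Two toy ranking measures per sentence (hypothetical, not MUSE's own)."""
    total = len(sentences)
    longest = max(len(s.split()) for s in sentences)
    return [
        [1.0 - i / total,              # position: earlier sentences score higher
         len(s.split()) / longest]     # length: longer sentences score higher
        for i, s in enumerate(sentences)
    ]

def extract(sentences, weights, k):
    """Summarization step: rank by the weighted linear combination, keep top k."""
    scores = [sum(w * f for w, f in zip(weights, row))
              for row in features(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order

def overlap(summary, gold):
    """Word-overlap recall against the gold summary (a crude stand-in metric)."""
    gold_words = set(" ".join(gold).lower().split())
    summary_words = set(" ".join(summary).lower().split())
    return len(gold_words & summary_words) / len(gold_words) if gold_words else 0.0

def train(corpus, n_features=2, k=1, pop_size=20, generations=30, seed=0):
    """Training step: a simple Genetic Algorithm over weight vectors.

    corpus is a list of (sentences, gold_summary_sentences) pairs.
    """
    rng = random.Random(seed)

    def fitness(weights):
        return sum(overlap(extract(doc, weights, k), gold)
                   for doc, gold in corpus) / len(corpus)

    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [g + rng.gauss(0, 0.1) for g in child]     # Gaussian mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

In this sketch, `train` would be run once offline on a corpus of documents paired with gold-standard summaries, and the returned weight vector would then be passed to `extract` for each new document. Because both toy measures are purely statistical, nothing in the code depends on the language of the input beyond sentence and word splitting.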

Similar resources

Multilingual Single-Document Summarization with MUSE

MUltilingual Sentence Extractor (MUSE) is aimed at multilingual single-document summarization. MUSE implements a supervised language-independent summarization approach based on optimization of multiple sentence ranking methods using a Genetic Algorithm. The main advantage of MUSE is its language independence: it uses statistical sentence features, which can be calculated for sentences in a...

Language-independent Techniques for Automated Text Summarization

Text summarization is the process of distilling the most important information from source/sources to produce an abridged version for a particular user/users and task/tasks. Automatically generated summaries can significantly reduce the information overload on intelligence analysts in their daily work. Moreover, automated text summarization can be utilized for automated classification and filte...

A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

Automated summarization methods can be defined as “language-independent,” if they are not based on any language-specific knowledge. Such methods can be used for multilingual summarization defined by Mani (2001) as “processing several languages, with summary in the same language as input.” In this paper, we introduce MUSE, a language-independent approach for extractive summarization based on the l...

Opinion Analysis for NTCIR8 at POSTECH

We describe an opinion analysis system developed for a Multilingual Opinion Analysis Task at NTCIR8. Given a topic and relevant newspaper articles, our system determines whether a sentence in the articles has an opinion. If so, we then extract the holder of the opinion. In the opinion judgment task, we constructed a phrase-level opinion expression extractor from sentence-level annotated corpus....

MUSEEC: A Multilingual Text Summarization Tool

The MUSEEC (MUltilingual SEntence Extraction and Compression) summarization tool implements several extractive summarization techniques – at the level of complete and compressed sentences – that can be applied, with some minor adaptations, to documents in multiple languages. The current version of MUSEEC provides the following summarization methods: (1) MUSE – a supervised summarizer, based on ...

Publication date: 2010