Cross-Language Document Summarization Based on Machine Translation Quality Prediction
نویسندگان
چکیده
Cross-language document summarization is a task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.
منابع مشابه
A Graph-based Approach to Cross-language Multi-document Summarization
Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Resul...
متن کاملExperiments in Cross Language Query Focused Multi-Document Summarization
The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual information robustly and efficiently, with as high quality performance as possible. Previous research activities on multilingual information access systems have studied cross-language information retrieval (CLIR), information ...
متن کاملPhrase-based Compressive Cross-Language Summarization
The task of cross-language document summarization is to create a summary in a target language from documents in a different source language. Previous methods only involve direct extraction of automatically translated sentences from the original documents. Inspired by phrasebased machine translation, we propose a phrase-based model to simultaneously perform sentence scoring, extraction and compr...
متن کاملGetting Information from Documents You Cannot Read: An Interactive Cross-Language Text Retrieval and Summarization System
In this paper we discuss research designed to investigate the ability of users to find information in texts written in languages unknown to them. One study shows how document thumbnail visualizations can be used effectively to choose potentially relevant documents. Another study shows how a user of a cross-language text retrieval system who has no foreign language knowledge can never-the-less c...
متن کاملHeadline Generation Based on Statistical Translation
Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more practical approach to this problem results in the use of an approximation: viewing summarization as a probl...
متن کامل