Japanese News Simplification: Task Design, Data Set Construction, and Analysis of Simplified Text
Authors
Abstract
This paper explores a Japanese news simplification task: we designed the task, constructed a data set for it, and analyzed the manual simplification process. The task focuses on sentence-level simplification, which is one part of the process of manually simplifying Japanese news for non-native speakers. The data set consists of Japanese news sentences paired with their corresponding simplified Japanese news sentences, and we verified its effectiveness for automatic simplification in preliminary experiments using phrase-based statistical machine translation. To reveal the processes behind manual simplification, such as simplification associated with word order (syntactic structure), we analyzed manually simplified Japanese news sentences.
Similar resources
An Unsupervised Alignment Algorithm for Text Simplification Corpus Construction
We present a method for the sentence-level alignment of short simplified text to the original text from which they were adapted. Our goal is to align a medium-sized corpus of parallel text, consisting of short news texts in Spanish with their simplified counterpart. No training data is available for this task, so we have to rely on unsupervised learning. In contrast to bilingual sentence alignm...
Simplifying metaphorical language for young readers: A corpus study on news text
The paper presents first results of an ongoing project on text simplification focusing on linguistic metaphors. Based on an analysis of a parallel corpus of news text professionally simplified for different grade levels, we identify six types of simplification choices falling into two broad categories: preserving metaphors or dropping them. An annotation study on almost 300 source sentences wit...
Optimizing Statistical Machine Translation for Text Simplification
Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Improving Text Simplification Language Modeling Using Unsimplified Text Data
In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of t...