Discovering Multilingual Text Reuse in Literary Texts
Abstract
We present a method for automatically discovering several classes of text reuse across different languages, from the most similar (translations) to the most oblique (literary allusions). Allusions are an important subclass of reuse because they involve the appropriation of isolated words and phrases within otherwise unrelated sentences, so that traditional methods of identifying reuse, including topical similarity and translation models, do not apply. To evaluate this work we have created (and publicly released) a test set of literary allusions between John Milton’s Paradise Lost and Vergil’s Aeneid; we find that while the baseline discovery of translations (55.0% F-measure) far surpasses the discovery of allusions (4.8%), the method's ability to expedite the traditional work of humanities scholars makes it a line of research strongly worth pursuing.
Similar resources
A Computational Model of Text Reuse in Ancient Literary Texts
We propose a computational model of text reuse tailored for ancient literary texts, available to us often only in small and noisy samples. The model takes into account source alternation patterns, so as to be able to align even sentences with low surface similarity. We demonstrate its ability to characterize text reuse in the Greek New Testament.
The Compilation of Urbanism Texts by Using the Iranian's-Valuable Texts (With Emphasis on the Islamic Ethics)
It is clear that each community should have its own urbanism science. Localizing science is an obvious need. This need has motivated Iranian researchers in the field of urbanism to naturalize the urbanism science that has been imported to Iran. One method for producing or indigenizing urbanism texts in Iran, especially in recent years, is the utilization of valuable Iranian texts. There are high...
The Effect of Authentic and Simplified Literary Texts on the Reading Comprehension of Iranian Advanced EFL Learners
The present quasi-experimental study mainly investigates the role of literature as input for reading comprehension in Iranian EFL classrooms. More precisely, it investigates the effects of authentic and simplified literary texts on the reading comprehension of Iranian advanced EFL learners. The participants were 35 male and female Iranian EFL learners at the advanced level, studying in a...
The MULINCO corpus and corpus platform
The MULINCO project (MUltiLINgual Corpus of the University of Copenhagen) started in early 2005. The purpose of this cross-disciplinary project is to create a corpus platform for education and research in monolingual and translation studies. The project covers two main types of corpus texts: literary and non-literary. The platform is being developed using available tools as far as possible, and int...
Modelling the Interpretation of Literary Allusion with Machine Learning Techniques
A Computational Perspective on Allusion. Most literary allusion, the deliberate evocation by one text of a passage in another, is based upon text reuse. Yet most instances of textual similarity are not meaningful literary allusions. The goal of the Tesserae project (http://tesserae.caset.buffalo.edu) is to automatically detect allusion in a corpus of literary texts, primarily Classical Latin poe...