Temporal Text Classification for Romanian Novels set in the Past
Abstract
In this paper we address a task in historical linguistics and the study of language development: identifying the time when a text was written. The novelty is that we evaluate our classifier and our selected features on literary texts whose action is set in the past and which are written to evoke the respective epoch. We investigate several types of features and ultimately settle on a very simple set of 10 features that very accurately classifies the texts by the century in which they were actually written. We use random forests to obtain high performance.
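As a rough illustration of the approach the abstract names (a small numeric feature vector per text, classified by century with a random forest), here is a minimal from-scratch sketch in Python. It is not the authors' implementation: the abstract does not list the 10 features, so the four feature columns and the toy values below are purely hypothetical, and the forest is a simplified version (bootstrap samples, a random feature subset per split, majority vote).

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return (impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, n_sub, rng, depth=0, max_depth=5):
    """Grow a tree; internal nodes are tuples, leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    # Random forests consider only a random subset of features at each split.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_sub, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_sub, rng, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training texts."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical toy data: each row is one text, each column one made-up
# feature (e.g. a relative frequency); labels are centuries of writing.
X = [[0.10, 0.20, 0.10, 0.15], [0.20, 0.10, 0.15, 0.10], [0.15, 0.15, 0.20, 0.20],
     [0.90, 0.80, 0.85, 0.90], [0.80, 0.90, 0.90, 0.85], [0.85, 0.85, 0.80, 0.80]]
y = [19, 19, 19, 20, 20, 20]

forest = fit_forest(X, y)
```

Calling `predict_forest(forest, features)` on the feature vector of an unseen text returns the century that most trees vote for. In practice a library implementation and real, carefully chosen features would replace both the toy data and this simplified forest.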
Similar resources
On the Flying Carpet of Orientalism: Reading Anita Amirrezvani’s The Blood of Flowers
This article draws attention to the ways in which Anita Amirrezvani’s The Blood of Flowers (2007), a historical novel set in 17th-century Iran, can be placed within the neo-orientalist discourse which informs many of the post-9/11 memoirs and novels set in contemporary Iran by women of the Iranian diaspora in the United States. Besides being a novel on Islam and Islamic rule—which makes it much...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Temporal Text Ranking and Automatic Dating of Texts
This paper presents a novel approach to the task of temporal text classification combining text ranking and probability for the automatic dating of historical texts. The method was applied to three historical corpora: an English, a Portuguese and a Romanian corpus. It obtained performance ranging from 83% to 93% accuracy, using a fully automated approach with very basic features.
Expanding Middle School Students’ Literacy Skills using the Journey Motif in Three Middle Grade Novels and Short Stories
This paper analyzed three middle grade novels and a short story from two cultures using the journey motif as the vehicle for the analysis. The three novels are Bud Not Buddy and The Watsons Go to Birmingham both by Paul Curtis and Journey to Jo’burg by Naidoo (1986). The short story is My Two Dads by M. Lee. The three novels and a short story were chosen because these no...
A System for Recognizing and Classifying Names in Persian Texts
Named entity recognition (NER) is a system that can identify one or more kinds of names in a text and classify them into specified categories. These categories can be names of people, organizations, companies, places (country, city, street, etc.), time expressions (date and time), financial values, percentages, etc. Although during the past decade a lot of research has been done on NER in ...