Temporal Text Classification for Romanian Novels set in the Past

Authors

  • Alina Maria Ciobanu
  • Liviu P. Dinu
  • Octavia-Maria Sulea
  • Anca Dinu
  • Vlad Niculae
Abstract

In this paper we look at a task in historical linguistics and the study of language development, namely identifying the time when a text was written. The novelty is that we evaluate our classifier and our selected features on literary texts whose action is set in the past and which are written so as to give the impression of the respective epoch. We investigate several types of features and ultimately settle on a very simple set of 10 features, which very accurately classifies the texts by the century in which they were actually written. We use random forests to obtain high performance.

Similar resources

On the Flying Carpet of Orientalism: Reading Anita Amirrezvani’s The Blood of Flowers

This article draws attention to the ways in which Anita Amirrezvani’s The Blood of Flowers (2007), a historical novel set in 17th-century Iran, can be placed within the neo-orientalist discourse which informs many of the post-9/11 memoirs and novels set in contemporary Iran by women of the Iranian diaspora in the United States. Besides being a novel on Islam and Islamic rule—which makes it much...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Temporal Text Ranking and Automatic Dating of Texts

This paper presents a novel approach to the task of temporal text classification combining text ranking and probability for the automatic dating of historical texts. The method was applied to three historical corpora: an English, a Portuguese and a Romanian corpus. It obtained performance ranging from 83% to 93% accuracy, using a fully automated approach with very basic features.

Expanding Middle School Students’ Literacy Skills using the Journey Motif in Three Middle Grade Novels and Short Stories

This paper analyzed three middle grade novels and a short story from two cultures using the journey motif as the vehicle for the analysis. The three novels are Bud Not Buddy and The Watsons Go to Birmingham both by Paul Curtis and Journey to Jo’burg by Naidoo (1986). The short story is My Two Dads by M. Lee. The three novels and a short story were chosen because these no...

A System for Identifying and Classifying Names in Persian Texts

Named entity recognition (NER) is a system that can identify one or more kinds of names in a text and classify them into specified categories. These categories can be names of people, organizations, companies, places (country, city, street, etc.), time related to names (date and time), financial values, percentages, etc. Although during the past decade a lot of research has been done on NER in ...

Publication date: 2013