Authorship Identification in Bengali Literature: a Comparative Analysis

Author

  • Tanmoy Chakraborty
Abstract

COLING 2012, Mumbai, December 2012. Department of Computer Science & Engineering, Indian Institute of Technology, Kharagpur, India. its_tanmoy@cse.iitkgp.ernet.in

Stylometry is the study of the unique linguistic styles and writing behaviors of individuals. It belongs to the core tasks of text categorization, such as authorship identification and plagiarism detection. Though a reasonable number of studies have been conducted in English, no major work has been done so far in Bengali. In this work, we present a demonstration of authorship identification for documents written in Bengali. We adopt a set of fine-grained stylistic features for the analysis of the text and use them to develop two different models: a statistical similarity model consisting of three measures and their combination, and a machine learning model with Decision Tree, Neural Network and SVM. Experimental results show that SVM outperforms other state-of-the-art methods after 10-fold cross-validation. We also validate the relative importance of each stylistic feature to show that some of them remain consistently significant in every model used in this experiment.
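The statistical similarity approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract names neither the stylistic features nor the three similarity measures, so everything below is an assumption for illustration: the five surface features, the function names `stylistic_features`, `cosine_similarity` and `attribute`, and the choice of cosine similarity as the measure. It is not the paper's method.

```python
import math
from collections import Counter
import re

# Punctuation considered here, including the danda (U+0964) used in Bengali.
PUNCT = ".,;:!?\u0964"


def stylistic_features(text):
    """Return five illustrative style features (hypothetical feature set)."""
    # Whitespace tokenisation with punctuation stripped: a simplification.
    words = [w.strip(PUNCT) for w in text.split()]
    words = [w for w in words if w]
    sentences = [s for s in re.split("[.!?\u0964]+", text) if s.strip()]
    if not words or not sentences:
        return [0.0] * 5
    counts = Counter(words)
    hapax = sum(1 for c in counts.values() if c == 1)
    punct = sum(1 for ch in text if ch in PUNCT)
    return [
        sum(len(w) for w in words) / len(words),  # mean word length
        len(words) / len(sentences),              # mean sentence length
        len(counts) / len(words),                 # type-token ratio
        hapax / len(words),                       # share of words used once
        punct / len(words),                       # punctuation marks per word
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(profiles, text):
    """Return the author whose profile vector is most similar to the text."""
    vec = stylistic_features(text)
    return max(profiles, key=lambda author: cosine_similarity(profiles[author], vec))
```

In this sketch, `profiles` maps each author name to a feature vector built from that author's known writing. The features are left unscaled, so the larger-valued ones (word and sentence length) dominate the cosine; a real system would likely normalise them first.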

Similar resources

Authorship Attribution in Bengali Language

We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...

A Supervised Authorship Attribution Framework for Bengali Language

Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this paper, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features, and investigated the...

Comparative study of Authorship Identification Techniques for Cyber Forensics Analysis

Authorship Identification techniques are used to identify the most likely author from a group of potential suspects of online messages and to find evidence to support the conclusion. Cybercriminals misuse online communication to send blackmail or spam email and then attempt to hide their true identities to avoid detection. Authorship Identification of online messages is the contemp...

Inference of Fine-grained Attributes of Bengali Corpus for Stylometry Detection

Stylometry, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and belongs to the core task of text categorization that involves authorship identification, plagiarism detection, forensic investigation, computer security, copyright and estate disputes etc. In this work, we present a strategy for stylometry det...

An Effective Approach for Compression of Bengali Text

In this paper, we propose an effective and efficient approach for compressing Bengali text. This paper focuses on a methodical study of Bengali text compression techniques. The main target of this research is to provide a framework for Bengali text compression that ensures a simple, computationally inexpensive and effective scheme for Bengali text compression. The proposed Bengali text compres...

Publication date: 2012