E-mail authorship attribution using customized associative classification

نویسندگان

  • Michael R. Schmid
  • Farkhund Iqbal
  • Benjamin C. M. Fung
چکیده

E-mail communication is often abused for conducting social engineering attacks including spamming, phishing, identity theft and for distributing malware. This is largely attributed to the problem of anonymity inherent in the standard electronic mail protocol. In the literature, authorship attribution is studied as a text categorization problem where the writing styles of individuals are modeled based on their previously written sample documents. The developed model is employed to identify the most plausible writer of the text. Unfortunately, most existing studies focus solely on improving predictive accuracy and not on the inherent value of the evidence collected. In this study, we propose a customized associative classification technique, a popular data mining method, to address the authorship attribution problem. Our approach models the unique writing style features of a person, measures the associativity of these features and produces an intuitive classifier. The results obtained by conducting experiments on a real dataset reveal that the presented method is very effective. © 2015 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Authorship Identification of E-mail as a Multi-Class Task - Notebook for PAN at CLEF 2011

In this paper, we describe a multi-class text categorization approach to authorship attribution and test it on sets of e-mail collections. The PAN 2011 competition data consists of e-mails of variable length, written by various candidate authors, with some represented by significantly longer or more e-mails than others. Rather than construct a classifier for each separate author to discriminate...

متن کامل

A Novel Approach of Mining Write-Prints for Authorship Attribution in E-mail Forensics

There is an alarming increase in the number of cybercrime incidents through anonymous e-mails. The problem of e-mail authorship attribution is to identify the most plausible author of an anonymous e-mail from a group of potential suspects. Most previous contributions employed a traditional classification approach, such as decision tree and Support Vector Machine (SVM), to identify the author an...

متن کامل

A survey of modern authorship attribution methods

Authorship attribution supported by statistical or computational methods has a long history starting from 19th century and marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed Federalist Papers. During the last decade, this scientific field has been developed substantially taking advantage of research advances in areas such as machine learning, informati...

متن کامل

The effect of author set size and data size in authorship attribution

Applications of authorship attribution ‘in the wild’ [Koppel, M., Schler, J., and Argamon, S. (2010). Authorship attribution in the wild. Language Resources and Evaluation. Advanced Access published January 12, 2010:10.1007/ s10579-009-9111-2], for instance in social networks, will likely involve large sets of candidate authors and only limited data per author. In this article, we present the r...

متن کامل

A Survey on Authorship Analysis

The paper discusses about the problem of Authorship analysis, different types of authorship analysis’s such as authorship attribution, authorship identification, authorship profiling, plagiarism detection. It also addresses the issues in Indian language text. Keywords— Authorship attribution, authorship profiling, plagiarism detection, text classification.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Digital Investigation

دوره 14  شماره 

صفحات  -

تاریخ انتشار 2015