Authorship Attribution using Unsupervised Clustering Algorithms on English C50 News Articles
Similar resources
Unsupervised authorship attribution
We describe a technique for attributing parts of a written text to a set of unknown authors. Nothing is assumed to be known a priori about the writing styles of potential authors. We use multiple independent clusterings of an input text to identify parts that are similar and dissimilar to one another. We describe algorithms necessary to combine the multiple clusterings into a meaningful output....
Efficient Unsupervised Authorship Clustering Using Impostor Similarity
Some real-world authorship analysis applications require techniques that scale to thousands of documents with little or no a priori information about the number of candidate authors. While there is extensive research on identifying authors given a small set of candidates and ample training data, almost none is based on real-world applications of clustering documents by authorship, independent o...
W-kmeans: Clustering News Articles Using WordNet
Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed; however, they suffer from problems such as synonymy, ambiguity, and the lack of a descriptive content marking of the generated clusters. We propose enhancing the standard k-means algorithm using the external knowledge fro...
Style based Authorship Attribution on English Editorial Documents
The aim of authorship attribution is to identify the author(s) of unknown document(s). Every author has a unique writing style. The present paper identifies the unique style of an author using lexical stylometric features. The lexical feature vectors of various authors are used in supervised machine learning algorithms to predict the author of an unknown document. The highest ...
Experiments on authorship attribution by intertextual distance in English
How can it be said that texts are "near" or "distant" from one another? Are different texts by a single author more similar than texts by different authors? To answer these questions, a method is proposed by combination of the calculus of intertextual distance with automatic clustering and tree-classification. A blind test and some additional experiments show that this method offers an interest...
Journal
Journal title: IARJSET
Year: 2017
ISSN: 2394-1588,2393-8021
DOI: 10.17148/iarjset.2017.4747