Syntactic Stylometry: Using Sentence Structure for Authorship Attribution
Authors
Abstract
Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those features are used to train a classifier to identify texts by author. While the method described does not outperform existing methods on most tasks, it does demonstrate that purely syntactic features carry information which could be useful for stylometric analysis.
Index words: stylometry, authorship attribution, dependency grammar, machine learning
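The pipeline outlined in the abstract (parse the text, extract syntactic features, train a classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the parses are hand-written toy data rather than the output of a real dependency parser, the features are relative frequencies of dependency labels and head-to-child label pairs, and the "classifier" is a simple nearest-profile comparison by cosine similarity.

```python
from collections import Counter
from math import sqrt

# A parsed sentence is a list of (token, head_index, dep_label) triples.
# head_index is the 0-based position of the token's head, or -1 for the root.
# All parses below are hand-written toy data, not real parser output.

def syntactic_features(sentences):
    """Relative frequencies of dependency labels and head>child label pairs."""
    counts = Counter()
    for sent in sentences:
        for _token, head, label in sent:
            counts[label] += 1
            if head >= 0:
                # Pair the head's own label with the child's label.
                counts[sent[head][2] + ">" + label] += 1
    total = sum(counts.values())
    return {feat: n / total for feat, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def attribute(profiles, features):
    """Return the author whose profile is most similar to the features."""
    return max(profiles, key=lambda author: cosine(profiles[author], features))

# Hypothetical training texts: author_a favours modified noun subjects,
# author_b favours bare pronoun subjects and adverbs.
training = {
    "author_a": [
        [("The", 2, "det"), ("old", 2, "amod"), ("dog", 3, "nsubj"), ("slept", -1, "root")],
        [("A", 2, "det"), ("red", 2, "amod"), ("bird", 3, "nsubj"), ("sang", -1, "root")],
    ],
    "author_b": [
        [("She", 1, "nsubj"), ("ran", -1, "root")],
        [("He", 1, "nsubj"), ("slept", -1, "root"), ("quickly", 1, "advmod")],
    ],
}
profiles = {author: syntactic_features(sents) for author, sents in training.items()}

# An unseen sentence with no words in common with author_a's training text.
unknown = [
    [("That", 2, "det"), ("small", 2, "amod"), ("cat", 3, "nsubj"), ("purred", -1, "root")],
]
unknown_features = syntactic_features(unknown)
predicted = attribute(profiles, unknown_features)
print(predicted)
```

The unseen sentence shares no vocabulary with either training set, so any attribution in this toy setting rests on structure alone, which is the kind of signal the abstract argues syntactic features can carry.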
Similar resources
Characterizing Stylistic Elements in Syntactic Structure
Many of the writing styles recognized in rhetorical and composition theories involve deep syntactic elements. However, most previous research on computational stylometric analysis has relied on shallow lexico-syntactic patterns. Some very recent work has shown that PCFG models can detect distributional differences in syntactic styles, but without offering much insight into exactly what constit...
A Deep Context Grammatical Model For Authorship Attribution
We define a variable-order Markov model, representing a Probabilistic Context Free Grammar, built from the sentence-level, delexicalized parse of source texts generated by a standard lexicalized parser, which we apply to the authorship attribution task. First, we motivate this model in the context of previous research on syntactic features in the area, outlining some of the general strengths an...
Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs
We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation allows integrating different levels of language description into a single structure. We extract textual patterns based on features obtained from shortest path walks over integrated syntactic graphs and apply them to determine the authors of docume...
Lost in Translation: Authorship Attribution using Frame Semantics
We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attri...
Deep Sentence-Level Authorship Attribution
We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasibility of deep models for multi-author attribution at the sentence level. Though we attempt to formulate models which learn stylometric features base...