Author Gender Analysis
Author
Abstract
Given an English paragraph of sufficient length, I set out to determine the gender of its author with sufficiently high accuracy. I wrote a Naïve Bayes classifier with the help of the NLTK toolkit and trained it using frequent words as the main features. Adding frequent bigrams, trigrams, and part-of-speech tags slightly increased its accuracy. Some indicators are obvious, such as relationship phrases like “my husband” or “my wife”, or topic-related words like “teaspoon” or “hardware”. However, my goal was to build a classifier general enough not to rely on them. With those salient but biased features excluded, my classifier achieved a modest accuracy of 69%. This suggests that topic detection is crucial to author gender analysis. Nevertheless, I still found several weakly gender-indicative features, which may shed light on future research.
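The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes simple regex tokenization, and the blocklist `BLOCKED` is hypothetical, built only from the example words the abstract says were excluded. The helper names (`tokenize`, `top_features`, `featurize`, `BernoulliNB`) are likewise illustrative. The classifier is a from-scratch Bernoulli Naïve Bayes with add-one smoothing standing in for NLTK's classifier. The `(featureset, label)` pairs it consumes have the same shape that `nltk.NaiveBayesClassifier.train` accepts, so they could be passed there instead. Part-of-speech features are omitted because tagging requires a trained tagger model.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical blocklist of topic- or relationship-revealing tokens; the
# examples come from the abstract, the real exclusion list is unknown.
BLOCKED = {"husband", "wife", "teaspoon", "hardware"}


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_features(docs, k=200):
    """The k most frequent unigrams, bigrams and trigrams in the corpus,
    skipping any n-gram that contains a blocked token."""
    counts = Counter()
    for text in docs:
        toks = tokenize(text)
        for n in (1, 2, 3):
            for g in ngrams(toks, n):
                if not BLOCKED.intersection(g):
                    counts[g] += 1
    return [g for g, _ in counts.most_common(k)]


def featurize(text, vocab):
    """NLTK-style featureset: {feature_name: bool} for each vocabulary n-gram."""
    toks = tokenize(text)
    present = set()
    for n in (1, 2, 3):
        present.update(ngrams(toks, n))
    return {"has(" + " ".join(g) + ")": g in present for g in vocab}


class BernoulliNB:
    """Minimal Naive Bayes over boolean features with add-one smoothing."""

    def train(self, labeled):
        self.label_counts = Counter(label for _, label in labeled)
        # label -> feature name -> number of training docs where it is True
        self.true_counts = defaultdict(Counter)
        self.features = set()
        for feats, label in labeled:
            for name, value in feats.items():
                self.features.add(name)
                if value:
                    self.true_counts[label][name] += 1
        self.total = sum(self.label_counts.values())
        return self

    def log_scores(self, feats):
        """Log prior plus log likelihood of every feature's value, per label."""
        scores = {}
        for label, n_label in self.label_counts.items():
            score = math.log(n_label / self.total)
            for name in self.features:
                p_true = (self.true_counts[label][name] + 1) / (n_label + 2)
                score += math.log(p_true if feats.get(name) else 1 - p_true)
            scores[label] = score
        return scores

    def classify(self, feats):
        """Return the label with the highest posterior log score."""
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)
```

On a labeled corpus of `(text, gender)` pairs, training would look like `BernoulliNB().train([(featurize(t, vocab), g) for t, g in corpus])` with `vocab = top_features(texts)`. Filtering blocked tokens at vocabulary-selection time, rather than after training, keeps topic and relationship cues out of the model entirely, which matches the abstract's goal of a topic-independent classifier.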
Similar resources
Author gender identification from text using Bayesian Random Forest
Nowadays, the heavy use of virtual environments and users' connections via social networks like Facebook, Instagram, and Twitter show, more than ever, the necessity of finding shared subjects in this environment. There are several applications that benefit from reliable methods for inferring the age and gender of users in social media. Such applications exist across a wide area of fields,...
Complementing Gender Analysis Methods.
The existing gender analysis frameworks start with a premise that men and women are equal and should be treated equally. These frameworks give emphasis on equal distribution of resources between men and women and believe that this will bring equality which is not always true. Despite equal distribution of resources, women tend to suffer and experience discrimination in many areas of their lives...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffer from problems like high dimensionality of features and fail to capture th...
Reader and author gender and genre in Goodreads
There are known gender differences in book preferences in terms of both genre and author gender but their extent and causes are not well understood. It is unclear whether reader preferences for author genders occur within any or all genres and whether readers evaluate books differently based on author genders within specific genres. This article exploits a major source of informal book reviews,...
Automatic Categorization of Author Gender via N-Gram Analysis
We present a method for automatic categorization of author gender via n-gram analysis. Using a corpus of British student essays, experiments using character-level, word-level, and part-of-speech n-grams are performed. The peak accuracy for all methods is roughly equal, reaching a maximum of 81%. These results are on par with other, established techniques, while retaining the simplicity and ease-...
Gender Representation of Emotions in the Novel A Hero of Our Time by Mikhail Lermontov
The article deals with emotions represented through images of the characters of M.Y. Lermontov’s novel A Hero of Our Time. The author consecutively analyzes elements of the text, in which emotions of male and female characters are nominated, directly expressed and described. The number of lexical units and text elements involved in the representation of a particular emotion is recorded in...