Learning Age and Gender of Blogger from Stylistic Variation
Abstract
We report results on stylistic differences in blogging across gender and age group. The results are based on two mutually independent features. The first feature is the use of slang words, a new concept we propose for the stylistic study of bloggers. For the second feature, we analyze the variation in average sentence length across age groups and gender. These features are augmented with results of previous studies on stylistic analysis reported in the literature. The combined feature list enhances the accuracy of age and gender prediction by a remarkable extent. These machine learning experiments were done on two separate demographically tagged blog corpora. Gender determination is more accurate than age group detection over data spread across all ages, but the accuracy of age prediction increases if we sample data with a remarkable age difference.
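As an illustration of the two features named in the abstract, the following minimal Python sketch computes an average sentence length and a slang-word rate for one blog post. It is not the authors' implementation: the slang list `SLANG_WORDS`, the naive sentence splitter, and the function names are all assumptions made for this example, and the abstract does not say how the paper defines or counts slang.

```python
import re

# Hypothetical slang list for illustration only; the paper's lexicon is not given here.
SLANG_WORDS = {"lol", "omg", "gonna", "wanna", "dude"}


def split_sentences(text):
    """Naive splitter: break on runs of . ! ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return [p for p in parts if p.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def stylistic_features(text):
    """Return average sentence length (in tokens) and the share of slang tokens."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    avg_len = (
        sum(len(tokenize(s)) for s in sentences) / len(sentences)
        if sentences
        else 0.0
    )
    slang_rate = (
        sum(t in SLANG_WORDS for t in tokens) / len(tokens) if tokens else 0.0
    )
    return {"avg_sentence_length": avg_len, "slang_rate": slang_rate}


if __name__ == "__main__":
    print(stylistic_features("omg that was fun. I am gonna go again!"))
```

Feature values of this kind, computed per blogger, could then be passed to any standard classifier along with other stylistic features; the abstract does not specify which learner was used.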
Similar resources
Gender and Genre Variation in Weblogs
A relationship among language, gender, and discourse genre has previously been observed in informal, spoken interaction and formal, written texts. This study investigates the language/gender/genre relationship in weblogs, a popular new mode of computer-mediated communication (CMC). Taking as the dependent variables stylistic features identified in machine learning research and popularized in a ...
Modeling of Stylistic Variation in Social Media with Stretchy Patterns
In this paper we describe a novel feature discovery technique that can be used to model stylistic variation in sociolects. While structural features offer much in terms of expressive power over simpler features used more frequently in machine learning approaches to modeling linguistic variation, they frequently come at an excessive cost in terms of feature space size expansion. We propose a nov...
Learning Age and Gender Using Co-occurrence of Non-dictionary Words from Stylistic Variations
This work attempts to report the stylistic differences in blogging for gender and age group variations using slang word co-occurrences. We have mainly focused on co-occurrence of non dictionary words across bloggers of different gender and age groups. For this analysis, we have focused on the feature use of slang words to study the stylistic variations of bloggers across various age groups and ...
A Stylistic and Proficiency-based Approach to EFL Learners’ Performance Inconsistency
Performance deficiencies and inconsistencies among SLA or FL learners can be attributed to variety of sources including both systemic (i.e., language issues) and individual variables. Contrary to a rich background, the literature still suffers from a gap as far as delving into the issue from language proficiency and learning style is concerned. To fill the gap, this study addressed EFL learner...
Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016
Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...