Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment

Authors

  • Dong Nguyen
  • Dolf Trieschnigg
  • A. Seza Doğruöz
  • Rilana Gravel
  • Mariët Theune
  • Theo Meder
  • Franciska de Jong

Abstract

There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. It is also shown that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.

Related resources

Analyzing Biases in Human Perception of User Age and Gender from Text

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possibl...

Controlling Human Perception of Basic User Traits

Much of our online communication is text-mediated and, lately, more common with automated agents. Unlike interacting with humans, these agents currently do not tailor their language to the type of person they are communicating with. In this pilot study, we measure the extent to which human perception of basic user trait information – gender and age – is controllable through text. Using automatic m...

Rebirth of a city: lessons learned from post-disaster reconstruction; the case study: Rofayye'

After disasters, one of the main challenges confronting authorities is site selection for reconstructing damaged structures. Experience indicates that appropriate site-selection policies can greatly influence reconstruction success and residents' satisfaction. Meanwhile, in the literature on post-disaster reconstruction, avoiding the relocation of settlements is generally emphasize...

Aging, Pensions and Long-term Care: What, Why, Who, How?; Comment on “Financing Long-term Care: Lessons From Japan”

Japan has been aging faster than other industrialized nations, and its experience offers useful lessons to others. Japan has been willing to expand its welfare state with a long-term care (LTC) insurance to finance home care and nursing home care for frail elderly. As Ikegami shows, it created new facilities and expanded specialized staffing for home care, developed a c...

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches to author profiling suffer from problems like high dimensionality of features and fail to capture th...

Publication date: 2014