Freshman or Fresher? Quantifying the Geographic Variation of Language in Online Social Media

Authors

  • Vivek Kulkarni
  • Bryan Perozzi
  • Steven Skiena
Abstract

In this paper we present a new computational technique to detect and analyze statistically significant geographic variation in language. While previous approaches have primarily focused on lexical variation between regions, our method identifies words that demonstrate semantic and syntactic variation as well. Our meta-analysis approach captures statistical properties of word usage across geographical regions and uses statistical methods to identify significant changes specific to regions. We extend recently developed techniques for neural language models to learn word representations which capture differing semantics across geographical regions. In order to quantify this variation and ensure robust detection of true regional differences, we formulate a null model to determine whether observed changes are statistically significant. Our method is the first such approach to explicitly account for random variation due to chance while detecting regional variation in word meaning. To validate our model, we study and analyze two different massive online data sets: millions of tweets from Twitter spanning not only four different countries but also fifty states, as well as millions of phrases contained in the Google Book Ngrams. Our analysis reveals interesting facets of language change at multiple scales of geographic resolution – from neighboring states to distant continents. Finally, using our model, we propose a measure of semantic distance between languages. Our analysis of British and American English over a period of 100 years reveals that semantic variation between these dialects is shrinking.
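The abstract's idea of a null model, a baseline for how much regional difference chance alone would produce, can be illustrated with a generic permutation test. The sketch below is not the authors' method (the abstract does not give its details) and works on plain word frequencies rather than learned word representations. The function names `usage_gap` and `permutation_p_value`, the two-region setup, and the toy data are all assumptions made for illustration.

```python
import random


def usage_gap(labels, uses_word):
    """Absolute difference in a word's relative frequency between regions 'A' and 'B'."""
    counts = {"A": [0, 0], "B": [0, 0]}  # region -> [posts containing the word, total posts]
    for region, hit in zip(labels, uses_word):
        counts[region][0] += hit
        counts[region][1] += 1
    freq_a = counts["A"][0] / counts["A"][1]
    freq_b = counts["B"][0] / counts["B"][1]
    return abs(freq_a - freq_b)


def permutation_p_value(labels, uses_word, n_shuffles=2000, seed=0):
    """Empirical p-value of the observed gap under a shuffled-label null model.

    Shuffling the region labels breaks any real link between region and usage,
    so the shuffled gaps show what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = usage_gap(labels, uses_word)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if usage_gap(shuffled, uses_word) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical toy data: 200 posts per region, 1 = post contains the word.
labels = ["A"] * 200 + ["B"] * 200
regional_word = [1] * 120 + [0] * 80 + [1] * 20 + [0] * 180  # 60% in A, 10% in B
common_word = ([1] * 50 + [0] * 150) * 2                      # 25% in both regions

p_regional = permutation_p_value(labels, regional_word)
p_common = permutation_p_value(labels, common_word)
print(p_regional, p_common)
```

A small p-value means the observed gap is rarely matched once the region labels are shuffled. A word used equally in both regions gets a p-value near 1 and would not be flagged as regional.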

Similar resources

Freshman or Fresher? Quantifying the Geographic Variation of Internet Language

We present a new computational technique to detect and analyze statistically significant geographic variation in language. Our meta-analysis approach captures statistical properties of word usage across geographical regions and uses statistical methods to identify significant changes specific to regions. While previous approaches have primarily focused on lexical variation between regions, our ...


Social Media Writing and Social Class: A Correlational Analysis of Adolescent CMC and Social Background

In a large social media corpus (2.9 million tokens), we analyze Flemish adolescents’ non-standard writing practices and look for correlations with the teenagers’ social class. Three different aspects of adolescents’ social background are included: educational track, parental profession, and home language. Since the data reveal that these parameters are highly correlated, we combine them into on...


The Effect of Online Learning Tools on L2 Reading Comprehension and Vocabulary Learning

The aim of this study was to investigate the effects of various online techniques (word reference, media, and vocabulary games) on reading comprehension as well as vocabulary comprehension and production. For this purpose, 60 language learners were selected and divided into three groups, and each group was randomly assigned to one of the treatment conditions. In the first session of tre...


Considering the Future of Pharmaceutical Promotions in Social Media; Comment on “Trouble Spots in Online Direct-to-Consumer Prescription Drug Promotion: A Content Analysis of FDA Warning Letters”

This commentary explores the implications of increased social media marketing by drug manufacturers, based on findings in Hyosun Kim’s article of the major themes in recent Food and Drug Administration (FDA) warning letters and notices of violation regarding online direct-to-consumer promotions of pharmaceuticals. Kim’s rigorous analysis of FDA letters over a 10-year span highlights a relative ...


Social Campaigns on Online Platforms as a New Form of Public Sphere in Digital Era: A Critical Review

Nowadays with the ever-increasing growth in social media platforms and the creation of different forms of online activism, the word known as “Campaign” has become a familiar and useful term in people’s everyday lives. Campaigns with all kinds of social aims especially using Hashtags are run on social media platforms by individuals, charities, NGOs, governments, municipalities and brand companie...


Publication date: 2016