A Corpus of Online Discussions for Research into Linguistic Memes

Authors

  • Dayne Freitag
  • Ed Chow
  • Paul Kalmar
  • Tulay Muezzinoglu
  • John Niekrasz
Abstract

We describe a 460-million-word corpus of online discussions. The data are collected from public news websites and community-of-interest Internet forums, and are designed to support research on the propagation of socially relevant ideas, a.k.a. “memes.” A structural and statistical description of the corpus is given, and the employed methods of website monitoring, collection, and extraction are described. We also present preliminary linguistic research on the corpus. We show that the corpus represents language from a wide variety of social and psychological communities, that discussion structure and popularity can be predicted in large part from lexical analysis, and that standard epidemiological models provide a good fit for diachronic patterns of population-level lexical adoption.
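The abstract reports that standard epidemiological models fit population-level adoption of words over time, but it does not name the model or the fitting procedure. Purely as an illustration, here is a minimal sketch assuming the classic SIR (susceptible, infected, recovered) model. In this sketch, s is the share of the population that has not yet used a term, i is the share currently using it, and r is the share that has stopped. The names `simulate_sir` and `fit_sir`, the forward-Euler integration, and the grid-search fit are hypothetical choices. They are not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SIRState:
    s: float  # fraction of speakers who have not yet used the term
    i: float  # fraction actively using (and "spreading") the term
    r: float  # fraction who used the term and have since stopped


def simulate_sir(beta: float, gamma: float, i0: float,
                 steps: int, dt: float = 1.0) -> List[SIRState]:
    """Forward-Euler integration of the classic SIR equations:
    ds/dt = -beta*s*i,  di/dt = beta*s*i - gamma*i,  dr/dt = gamma*i.
    Assumes dt is small enough that beta*dt and gamma*dt stay below 1."""
    if not 0.0 < i0 < 1.0:
        raise ValueError("i0 must be strictly between 0 and 1")
    state = SIRState(s=1.0 - i0, i=i0, r=0.0)
    history = [state]
    for _ in range(steps):
        adopted = beta * state.s * state.i * dt  # new adopters this step
        dropped = gamma * state.i * dt           # users who stop this step
        state = SIRState(s=state.s - adopted,
                         i=state.i + adopted - dropped,
                         r=state.r + dropped)
        history.append(state)
    return history


def fit_sir(observed: Sequence[float], i0: float,
            betas: Sequence[float],
            gammas: Sequence[float]) -> Tuple[float, float, float]:
    """Exhaustive grid search for the (beta, gamma) pair whose simulated
    cumulative adoption curve, 1 - s(t), has the smallest sum of squared
    errors against an observed curve sampled at t = 0, 1, ..., len-1."""
    steps = len(observed) - 1
    best = (float("nan"), float("nan"), float("inf"))
    for beta in betas:
        for gamma in gammas:
            sim = simulate_sir(beta, gamma, i0, steps)
            sse = sum((1.0 - st.s - obs) ** 2 for st, obs in zip(sim, observed))
            if sse < best[2]:
                best = (beta, gamma, sse)
    return best


if __name__ == "__main__":
    # Synthetic curve generated from known parameters (not real corpus data).
    truth = simulate_sir(beta=0.5, gamma=0.1, i0=0.01, steps=60)
    curve = [1.0 - st.s for st in truth]
    grid_b = [0.1 * k for k in range(1, 11)]
    grid_g = [0.05 * k for k in range(1, 7)]
    beta, gamma, sse = fit_sir(curve, 0.01, grid_b, grid_g)
    print(f"beta={beta:.2f} gamma={gamma:.2f} sse={sse:.2e}")
```

The demo fits a synthetic curve generated from known parameters, so it should recover the generating values of beta and gamma. To fit real adoption counts, you would first need to normalize them to population fractions. A continuous optimizer would usually replace the coarse grid search.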

Similar resources

Dynamic Categorization of Semantics of Fashion Language: A Memetic Approach

Categories are not invariant. This paper attempts to explore the dynamic nature of semantic category, in particular, that of fashion language, based on the cognitive theory of Dawkins’ memetics, a new theory of cultural evolution. Semantic attributes of linguistic memes decrease or proliferate in replication and spreading, which involves a dynamic development of semantic category. More specific...

Identification and Extraction of Memes represented as Semantic Networks from Free Text Online Forums

Memes have recently come into vogue in the context of 'viral' transmission of basic information units in online social networks. However, from their original general definition in a sociological context, there is still much work to be done from an information technology viewpoint. This includes such issues as how to process memes from real text corpus, formal definitions for knowledge represent...

Effects of Receiving Corrective Feedback through Online Chats and Class Discussions on Iranian EFL Learners' Writing Quality

Giving corrective feedback (CF) is an essential part of the teaching and learning process, and the way it should beneficially be done has been the focus of attention for numerous researchers especially when traditional ways of CF provision are not possible, particularly in rare situations such as outbreaks of diseases. This study investigated how different ways of giving feedback; namely, throu...

A Linguistic Analysis of the Online Debate on Vaccines and Use of Fora as Information Stations and Confirmation Niche

This study looks at the communication between users concerning health risks, with the aim of exploring their use of fora and assessing whether participants establish a niche with like-minded users during these exchanges. By integrating a corpus linguistic approach with content analysis and multiple studies on computer mediated health discourse, this study analyses the intense attention paid to ...

The Assessment of Pragmatic Knowledge in the Online General IELTS-Practice Resources: A Corpus Analysis of Writing Tasks

Motivated by the concept of Communicative Language Ability and the eminence of the IELTS exam, this study intended to scrutinize the representation of functional knowledge (FK) and socio-linguistic knowledge (SK) as sub-components of pragmatic knowledge in the writing performances of both tasks of the online General IELTS-practice resources across three band scores. This quantitative inter-scor...

Publication date: 2012