A Model for Mining Public Health Topics from Twitter
Abstract
We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments, and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health-related topics. ATAM isolates more coherent ailments, such as influenza, infections, and obesity, than standard topic models do. Furthermore, ATAM matches the influenza tracking results produced by Google Flu Trends and by previous influenza-specialized Twitter models when compared with government public health data.

1 Twitter and Public Health

Public health researchers dedicate considerable resources to population surveillance, which requires clinical encounters with health professionals. We propose a low-cost alternative source for tracking public health trends: Twitter. Several studies have considered using Twitter for tracking various trends, including news tracking (Lerman and Ghosh, 2010; Petrović et al., 2010), earthquake monitoring (Sakaki et al., 2010), sentiment (Barbosa and Feng, 2010), and political opinions (Tumasjan et al., 2010; O’Connor et al., 2010). Similarly, tweets mention health-related topics, such as “i got fever 102.5 i got flu i got sore eyes my throat hurts taking tylenol”. This tweet indicates that the user has an ailment (flu), along with the associated symptoms (fever, etc.) and treatments (tylenol). Health self-reporting across millions of users can provide extensive real-time information about population health.

In this work, we introduce a new method for extracting general public health information from millions of health-related tweets. Previous work in this area has focused specifically on influenza: evaluating influenza surveillance (Lampos and Cristianini, 2010; Culotta, 2010b), analyzing tweets from the H1N1 pandemic (Quincey and Kostkova, 2010), and combining prediction markets and Twitter to predict H1N1 (Ritterman et al., 2009).
These results all arise from supervised models built for specific applications (e.g., monitoring the flu). We present a more general approach that discovers many different ailments and learns symptom and treatment associations from tweets. Our first contribution is a data set of 1.6 million health-related tweets (beyond just influenza). To create structured information from these data, we develop a new topic model that organizes health terms into ailments, including associated symptoms and treatments. Our model uses explicit knowledge of symptoms and treatments to separate coherent ailment groups from more general topics. We show that our model 1) discovers a larger number of more coherent ailments than LDA, 2) produces more detailed ailment information (symptoms/treatments), and 3) tracks disease rates consistent with published government statistics (influenza surveillance) despite the lack of supervised influenza training data.

2 A Twitter Health Corpus

We start with a collection of over 2 billion tweets from May 2009 to October 2010 (O’Connor et al., 2010). We first identify which of these messages contain health information. A first, high-recall keyword filter used a list of 20,000 keyphrases related to illnesses/diseases, symptoms, and treatments. We removed retweets (marked with the “RT” tag) and tweets containing URLs; they were almost always false positives (e.g., news articles about the flu rather than messages about a user’s health). The resulting set contained 11.7 million tweets.

Keyword filtering is insufficient since health keywords can be used in many contexts, e.g., “I’m sick of this” and “justin beber ur so cool and i have beber fever” (Culotta, 2010b). Instead, we obtain training data for a supervised classifier using Mechanical Turk (MTurk) (Callison-Burch and Dredze, 2010). We created a 5,128-tweet corpus labeled as related or unrelated to health. Turkers labeled tweets
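
As a rough illustration of the filtering step described in Section 2, the sketch below applies a keyphrase match and drops retweets and tweets that contain URLs. It is not the authors' code. The small HEALTH_KEYPHRASES set is a hypothetical stand-in for the 20,000-keyphrase list, and the regular expressions used to detect retweets and URLs, as well as all function names, are assumptions made for this example.

```python
import re

# Hypothetical stand-in for the paper's list of 20,000 health keyphrases.
HEALTH_KEYPHRASES = {"flu", "fever", "sick", "sore throat", "tylenol"}

# Assumed patterns: the paper only says retweets carry the "RT" tag
# and that tweets containing URLs were removed.
RETWEET_PATTERN = re.compile(r"\bRT\b")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def compile_keyphrases(phrases):
    """Build one case-insensitive, whole-word pattern from the keyphrases."""
    # Longer phrases first so multi-word phrases are tried before their parts.
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


KEYPHRASE_PATTERN = compile_keyphrases(HEALTH_KEYPHRASES)


def is_candidate_health_tweet(text):
    """Return True if the tweet passes the high-recall keyword filter."""
    if RETWEET_PATTERN.search(text):
        return False  # retweet
    if URL_PATTERN.search(text):
        return False  # contains a URL, likely a news link
    return KEYPHRASE_PATTERN.search(text) is not None


def filter_tweets(tweets):
    """Keep only tweets that pass the keyword, retweet, and URL checks."""
    return [t for t in tweets if is_candidate_health_tweet(t)]
```

With this sketch, a tweet such as “I’m sick of this” still passes the filter because it contains a health keyword, which is the kind of false positive that motivates the supervised classifier trained on the MTurk labels.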
Similar resources
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. It lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
Computational Content Analysis of Negative Tweets for Obesity, Diet, Diabetes, and Exercise
Social-media-based digital epidemiology has the potential to support faster response to and deeper understanding of public-health-related threats. This study proposes a new framework to analyze unstructured health-related textual data via Twitter users’ posts (tweets) to characterize the negative health sentiments and non-health-related concerns in relation to the corpus of negative sentiments; re...
An Empirical Comparison of Topics in Twitter and Traditional Media
Twitter as a new form of social media can potentially contain much useful information, but content analysis on Twitter has not been well studied. In particular, it is not clear whether as an information source Twitter can be simply regarded as a faster news feed that covers mostly the same information as traditional news media. In this paper we empirically compare the content of Twitter with a ...
E-Cigarette Social Media Messages: A Text Mining Analysis of Marketing and Consumer Conversations on Twitter
BACKGROUND As the use of electronic cigarettes (e-cigarettes) rises, social media likely influences public awareness and perception of this emerging tobacco product. OBJECTIVE This study examined the public conversation on Twitter to determine overarching themes and insights for trending topics from commercial and consumer users. METHODS Text mining uncovered key patterns and important topi...
Mining Trending Hash Tags for Arabic Sentiment Analysis
People post millions of messages every day on microblogging social networks, especially Twitter, which makes microblogs a rich source of public opinion, customer comments, and reviews. Companies and public sectors are looking for a way to measure the public response and feedback on a particular service or product. Sentiment analysis is an encouraging technique capable of sensing public opinion in...