Mining Concurrent Topical Activity in Microblog Streams

Authors

  • André Panisson
  • Laetitia Gauvin
  • Marco Quaggiotto
  • Ciro Cattuto
Abstract

Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined by both exogenous and endogenous factors. Teasing apart different topics and resolving their individual, concurrent activity timelines is a key challenge in extracting knowledge from microblog streams. Meeting this challenge requires methods that expose latent signals by exploiting term correlations across posts and over time. Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used for reference. We mine the temporal structure of topical activity by using two methods based on non-negative matrix factorization. We show that for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule. Our results show that, given appropriate techniques to detect latent signals, Twitter can be used as a social sensor to extract topical-temporal information on real-world events at high temporal resolution.
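The abstract does not say which two factorization methods are used, so the following is only a generic illustration of the underlying idea, not the authors' method. It is a minimal sketch that assumes a time-by-term count matrix and applies plain non-negative matrix factorization with the standard multiplicative updates for the Frobenius loss. The toy counts, the function names (`nmf`, `matmul`, `transpose`, `frobenius_error`), and the choice of two topics are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(r) for r in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, n_iter=200, eps=1e-9, seed=0):
    """Factor v (time bins x terms) into w (time bins x k) and h (k x terms).

    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random initialization (an arbitrary, assumed choice).
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_rows)]
    return w, h


# Invented toy data: 6 time bins (rows) x 4 terms (columns).
# Terms 0-1 belong to one hypothetical topic, terms 2-3 to another;
# both topics are active at the same time in the third time bin.
counts = [
    [5, 4, 0, 0],
    [10, 8, 0, 0],
    [5, 4, 3, 2],
    [0, 0, 6, 4],
    [0, 0, 3, 2],
    [0, 0, 6, 4],
]

timelines, term_weights = nmf(counts, k=2)
error_after_one_step = frobenius_error(counts, *nmf(counts, k=2, n_iter=1))
error_final = frobenius_error(counts, timelines, term_weights)
```

In this sketch each column of `timelines` is read as the activity of one latent topic over time, and each row of `term_weights` as that topic's term profile. Overlapping activity, as in the third time bin of the toy data, is what allows concurrent topics to be resolved separately.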


Related resources

Discovering Topical Aspects in Microblogs

We address the problem of discovering topical phrases or “aspects” from microblogging sites like Twitter, that correspond to key talking points or buzz around a particular topic or entity of interest. Inferring such topical aspects enables various applications such as trend detection and opinion mining for business analytics. However, mining high-volume microblog streams for aspects poses uniqu...


Frequent Itemset Mining for Query Expansion in Microblog adhoc Search

The high volume of Tweets arriving every second and the requirement to index them in real time emphasize the importance of the computational complexity of algorithms used to process them. In this paper, we investigate the use of Frequent Itemsets Mining to quickly discover patterns that can later be used for query expansion. Frequent Itemsets Mining (FIM) has been highly adopted to mine data st...


Concurrent Semi-supervised Learning of Data Streams

Conventional stream mining algorithms focus on single and stand-alone mining tasks. Given the single-pass nature of data streams, it makes sense to maximize throughput by performing multiple complementary mining tasks concurrently. We investigate the potential of concurrent semi-supervised learning on data streams and propose an incremental algorithm called CSL-Stream (Concurrent Semi–supervise...


Concurrent Activity Recognition For Clinical Work

We present an approach to learning to recognize concurrent activities based on multiple data streams. One example is recognition of concurrent activities in hospital operating rooms based on multiple wearable and embedded sensors. This problem differs from standard time series classification in that there is no natural single target dimension, as multiple activities are performed at the same ti...


Microblog Entity Linking by Leveraging Extra Posts

Linking name mentions in microblog posts to a knowledge base, namely microblog entity linking, is useful for text mining tasks on microblogs. Entity linking in long text has been well studied in previous work. However, little work has focused on short text such as microblog posts. Microblog posts are short and noisy, and previous methods can extract few features from the post context. In this paper we pr...



Publication date: 2014