Mining Concurrent Topical Activity in Microblog Streams
Abstract
Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined by both exogenous and endogenous factors. Teasing apart different topics and resolving their individual, concurrent activity timelines is a key challenge in extracting knowledge from microblog streams. Meeting this challenge requires methods that expose latent signals by using term correlations across posts and over time. Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used for reference. We mine the temporal structure of topical activity by using two methods based on non-negative matrix factorization. We show that for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule. Our results show that, given appropriate techniques to detect latent signals, Twitter can be used as a social sensor to extract topical-temporal information on real-world events at high temporal resolution.
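As a rough illustration of how non-negative matrix factorization can separate concurrent topics and their timelines, the sketch below factors a small term-by-time count matrix V into W (term weights per topic) and H (activity of each topic over time). It is not the authors' method: the abstract does not specify their two NMF variants, so this uses the generic Lee-Seung multiplicative updates for the Frobenius loss. The terms, counts, and the function name nmf are all hypothetical and chosen only for the example.

```python
import numpy as np

def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (terms x time bins) as W @ H
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n_terms, n_bins = V.shape
    # Random non-negative initialization (assumption: any positive start works here).
    W = rng.random((n_terms, k)) + eps
    H = rng.random((k, n_bins)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not increase the loss.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 terms, 8 time bins, two topics whose
# activity timelines overlap in time (bins 2 to 4).
terms = ["swim", "pool", "medal", "sprint", "track", "relay"]
topic_terms = np.array([[5, 4, 1, 0, 0, 0],
                        [0, 0, 1, 5, 4, 3]], dtype=float).T   # shape (6, 2)
timelines = np.array([[1, 3, 5, 3, 1, 0, 0, 0],
                      [0, 0, 1, 2, 4, 5, 3, 1]], dtype=float)  # shape (2, 8)
V = topic_terms @ timelines                                    # shape (6, 8)

W, H = nmf(V, k=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Rows of H are the recovered activity timelines; columns of W rank the terms.
for topic in range(2):
    top = [terms[i] for i in np.argsort(W[:, topic])[::-1][:3]]
    print(f"topic {topic}: top terms {top}, peak time bin {int(np.argmax(H[topic]))}")
print(f"relative reconstruction error: {rel_error:.4f}")
```

In a setting like the one described, V would be built from term counts per time window of the stream, and each row of H would be compared against an external reference such as an event schedule. The number of topics k is a free parameter here.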
Similar resources
Discovering Topical Aspects in Microblogs
We address the problem of discovering topical phrases or “aspects” from microblogging sites like Twitter, that correspond to key talking points or buzz around a particular topic or entity of interest. Inferring such topical aspects enables various applications such as trend detection and opinion mining for business analytics. However, mining high-volume microblog streams for aspects poses uniqu...
Frequent Itemset Mining for Query Expansion in Microblog adhoc Search
The high volume of Tweets arriving every second and the requirement to index them in real time emphasize the importance of the computational complexity of the algorithms used to process them. In this paper, we investigate the use of Frequent Itemsets Mining to quickly discover patterns that can later be used for query expansion. Frequent Itemsets Mining (FIM) has been widely adopted to mine data st...
Concurrent Semi-supervised Learning of Data Streams
Conventional stream mining algorithms focus on single and stand-alone mining tasks. Given the single-pass nature of data streams, it makes sense to maximize throughput by performing multiple complementary mining tasks concurrently. We investigate the potential of concurrent semi-supervised learning on data streams and propose an incremental algorithm called CSL-Stream (Concurrent Semi–supervise...
Concurrent Activity Recognition For Clinical Work
We present an approach to learning to recognize concurrent activities based on multiple data streams. One example is recognition of concurrent activities in hospital operating rooms based on multiple wearable and embedded sensors. This problem differs from standard time series classification in that there is no natural single target dimension, as multiple activities are performed at the same ti...
Microblog Entity Linking by Leveraging Extra Posts
Linking name mentions in microblog posts to a knowledge base, namely microblog entity linking, is useful for text mining tasks on microblogs. Entity linking in long text has been well studied in previous work. However, little work has focused on short text such as microblog posts. Microblog posts are short and noisy. Previous methods can extract few features from the post context. In this paper we pr...