Tweet Data Mining: the Cultural Microblog Contextualization Data Set
Abstract
This paper presents an overview of the data set used for the Cultural Microblog Contextualization Workshop at CLEF 2016, and more specifically for Task 1: tweet contextualization. We first present a descriptive analysis of the data: we consider the variables or features associated with the tweets and analyse them. We then analyse the textual content of the tweets. The results of this work are a first step toward data quality checking. They can also help in better understanding the data and its usefulness for particular tasks or case studies.
Similar resources
Building a Knowledge Base using Microblogs: the Case of Cultural MicroBlog Contextualization Collection
The Cultural MicroBlog Contextualization (CMC) Workshop provides a collection of tweets on cultural events related to festivals. Given the size of a tweet, the information obtained by a single post is often very partial. We develop the idea that using a set of tweets about an event could enable having a more complete view of that event by combining all information posted. In this paper, we prop...
CLEF 2017 Microblog Cultural Contextualization Content Analysis task Overview
The MC2 CLEF 2017 Content Analysis task deals with classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. Festivals have a large presence on social media. The resulting microblog stream and related URLs are appropriate to experiment on advanced social media search and mining methods. For content analysis, topics were in any langua...
IITH at CLEF 2017: Finding Relevant Tweets for Cultural Events
Retrieving relevant tweets corresponding to cultural events can be used in various applications such as event reporting and event recommendation. This type of retrieval is challenging because of the short length of tweets, noise, out-of-vocabulary words, and abbreviations. In this paper, we focus on the problem of retrieving relevant tweets related to a given cultural event of a festival. We c...
Tweet Contextualization using Continuous Space Vectors: Automatic Summarization of Cultural Documents
In this paper we describe our participation in the INEX 2016 Tweet Contextualization track. The tweet contextualization process aims at generating a short summary from Wikipedia documents related to the tweet. In our approach, we analyzed tweets and created a query to retrieve the most relevant Wikipedia article. We combine Information Retrieval and Automatic Text Summarization methods to gener...
INEX2014: Tweet Contextualization Using Association Rules between Terms
Tweets are short messages that do not exceed 140 characters. Since they must be written respecting this limitation, a particular vocabulary is used. To make them understandable to a reader, it is therefore necessary to know their context. In this paper, we describe our approach submitted for the tweet contextualization track in CLEF 2014 (Conference and Labs of Evaluation Forums). This approach...