Building a treebank of noisy user-generated content: The French Social Media Bank
Abstract
We introduce the French Social Media Bank, the first user-generated content treebank for French. Its first release contains 1,700 sentences from various Web 2.0 and social media sources (FACEBOOK, TWITTER, web forums), including data specifically chosen for their high noisiness.
Similar resources
The French Social Media Bank: a Treebank of Noisy User Generated Content
In recent years, statistical parsers have reached high performance levels on well-edited texts. Domain adaptation techniques have improved parsing results on text genres differing from the journalistic data most parsers are trained on. However, such corpora usually comply with standard linguistic, spelling and typographic conventions. In the meantime, the emergence of Web 2.0 communication medi...
Semantically Enriched Machine Learning Approach to Filter YouTube Comments for Socially Augmented User Models
Social media are media for social interaction that allow creating and exchanging user-generated content. The massive social content can provide rich resources for deriving social profiles that can augment user models and improve adaptation in traditional applications. However, potentially valuable social contributions can be buried within highly noisy content that is irrelevant or spam. This pa...
Augmenting User Models with Real World Experiences to Enhance Personalization and Adaptation
Social media are media for social interaction that allow creating and exchanging user-generated content. The massive social content can provide rich resources for deriving social profiles that can augment user models and improve adaptation in traditional applications. However, potentially valuable social contributions can be buried within highly noisy content that is irrelevant or spam. This pa...
Content Strategy and Fan Engagement in Social Media The Case of PyeongChang Winter Olympic And Paralympic Games
Background. This paper investigates the pillars of content strategy and fan engagement in social networks during the 2018 PyeongChang Winter Olympics and Paralympics. Objectives. The purpose of this paper is to seek reasons behind the differences in fan engagement in social media channels of the PyeongChang Winter Olympics and Paralympics. Methods. Facebook and YouTube channels are used to analyze en...
Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
In this paper, we describe the Lithium Natural Language Processing (NLP) system, a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real world applications of the system currentl...
Publication date: 2013