Building a treebank of noisy user-generated content: The French Social Media Bank

Authors

  • Djamé Seddah
  • Benoît Sagot
  • Marie Candito
  • Virginie Mouilleron
  • Vanessa Combet
Abstract

We introduce the French Social Media Bank, the first user-generated content treebank for French. Its first release contains 1,700 sentences from various Web 2.0 and social media sources (FACEBOOK, TWITTER, web forums), including data specifically chosen for their high noisiness.

Similar resources

The French Social Media Bank: a Treebank of Noisy User Generated Content

In recent years, statistical parsers have reached high performance levels on well-edited texts. Domain adaptation techniques have improved parsing results on text genres differing from the journalistic data most parsers are trained on. However, such corpora usually comply with standard linguistic, spelling and typographic conventions. In the meantime, the emergence of Web 2.0 communication medi...

Semantically Enriched Machine Learning Approach to Filter YouTube Comments for Socially Augmented User Models

Social media are media for social interaction that allow creating and exchanging user-generated content. The massive social content can provide rich resources for deriving social profiles that can augment user models and improve adaptation in traditional applications. However, potentially valuable social contributions can be buried within highly noisy content that is irrelevant or spam. This pa...

Augmenting User Models with Real World Experiences to Enhance Personalization and Adaptation

Social media are media for social interaction that allow creating and exchanging user-generated content. The massive social content can provide rich resources for deriving social profiles that can augment user models and improve adaptation in traditional applications. However, potentially valuable social contributions can be buried within highly noisy content that is irrelevant or spam. This pa...

Content Strategy and Fan Engagement in Social Media: The Case of PyeongChang Winter Olympic and Paralympic Games

Background. This paper investigates the pillars of content strategy and fan engagement in social networks during the 2018 PyeongChang Winter Olympics and Paralympics. Objectives. The purpose of this paper is to seek reasons behind the differences in fan engagement in the social media channels of the PyeongChang Winter Olympics and Paralympics. Methods. Facebook and YouTube channels are used to analyze en...

Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media

In this paper, we describe the Lithium Natural Language Processing (NLP) system, a resource-constrained, high-throughput and language-agnostic system for information extraction from noisy user generated text on social media. Lithium NLP extracts a rich set of information including entities, topics, hashtags and sentiment from text. We discuss several real world applications of the system currentl...

Publication date: 2013