Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Authors

  • David Graus
  • Manos Tsagkias
  • Lars Buitinck
  • Maarten de Rijke
Abstract

The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identifying nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base, in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show that our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and the textual quality of input documents.
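
The sampling idea mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PseudoLabeledDoc` type, the `sample_pseudo_ground_truth` function, the threshold values, and the rule that every mention in a document must pass the confidence threshold are all assumptions made for illustration. The abstract only states that sampling uses entity confidence scores and the textual quality of input documents.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoLabeledDoc:
    """A document with automatically generated entity labels (hypothetical type)."""
    text: str
    # (mention, confidence) pairs; confidence is assumed to lie in [0, 1]
    mentions: List[Tuple[str, float]]
    # textual-quality score of the document, assumed to lie in [0, 1]
    quality: float


def sample_pseudo_ground_truth(
    docs: List[PseudoLabeledDoc],
    min_confidence: float = 0.7,  # illustrative threshold, not from the paper
    min_quality: float = 0.5,     # illustrative threshold, not from the paper
) -> List[PseudoLabeledDoc]:
    """Keep documents that are of sufficient quality and whose labeled
    mentions all meet the confidence threshold."""
    sampled = []
    for doc in docs:
        if doc.quality < min_quality:
            continue  # drop low-quality text
        if not doc.mentions:
            continue  # nothing to train on
        if all(conf >= min_confidence for _, conf in doc.mentions):
            sampled.append(doc)
    return sampled


docs = [
    PseudoLabeledDoc("clean post about Foo", [("Foo", 0.9)], quality=0.9),
    PseudoLabeledDoc("noisy post about Foo", [("Foo", 0.9)], quality=0.2),
    PseudoLabeledDoc("clean post about Bar", [("Bar", 0.3)], quality=0.9),
]
training_docs = sample_pseudo_ground_truth(docs)
```

Under these assumed thresholds only the first document would be kept as training material for the named entity recognizer; a stricter or looser sampling strategy corresponds to different threshold values.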

Similar resources

Predicting the Image Propagation Path in Online Social Networks

Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity using a single metric such as the total number of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based deep recurrent network...

Generating Pseudo Test Collections for Learning to Rank Scientific Articles

Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse, but comes with rich annotations. Our intuition is that documents are annotated to make them better findable for certain information needs. We use these annota...

Who Are Like-Minded: Mining User Interest Similarity in Online Social Networks

In this paper, we mine and learn to predict how similar a pair of users’ interests towards videos are, based on demographic (age, gender and location) and social (friendship, interaction and group membership) information of these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and ju...

This Is My (Post) Truth, Tell Me Yours; Comment on “The Rise of Post-truth Populism in Pluralist Liberal Democracies: Challenges for Health Policy”

This is a commentary on the article ‘The rise of post-truth populism in pluralist liberal democracies: challenges for health policy.’ It critically examines two of its key concepts: populism and ‘post truth.’ This commentary argues that there are different types of populism, with unclear links to impacts, and that in some ways, ‘post-truth’ has resonances with arguments advanced in the period a...

Voting with Social Influence: Using Arguments to Uncover Ground Truth

In certain voting problems, a hidden ground truth is inferred by aggregating the opinions of an electorate. We propose a novel model of these underlying social interactions, and derive maximum likelihood estimators for the ground truth in these models, given the social network and votes. We also evaluate these new estimators, as well as existing ones, on a class of simulated social networks.

Publication date: 2014