Learning to Extract Events from Knowledge Base Revisions

Authors

  • Alexander Konovalov
  • Benjamin Strauss
  • Alan Ritter
  • Brendan T. O'Connor
Abstract

Broad-coverage knowledge bases (KBs) such as Wikipedia, Freebase, Microsoft’s Satori and Google’s Knowledge Graph contain structured data describing real-world entities. These data sources have become increasingly important for a wide range of intelligent systems: from information retrieval and question answering, to Facebook’s Graph Search, IBM’s Watson, and more. Previous work on learning to populate knowledge bases from text has, for the most part, made the simplifying assumption that facts remain constant over time. But this is inaccurate—we live in a rapidly changing world. Knowledge should not be viewed as a static snapshot, but instead a rapidly evolving set of facts that must change as the world changes. In this paper we demonstrate the feasibility of accurately identifying entity-transition-events, from real-time news and social media text streams, that drive changes to a knowledge base. We use Wikipedia’s edit history as distant supervision to learn event extractors, and evaluate the extractors based on their ability to predict online updates. Our weakly supervised event extractors are able to predict 10 KB revisions per month at 0.8 precision. By lowering our confidence threshold, we can suggest 34.3 correct edits per month at 0.4 precision. 64% of predicted edits were detected before they were added to Wikipedia. The average lead time of our forecasted knowledge revisions over Wikipedia’s editors is 40 days, demonstrating the utility of our method for suggesting edits that can be quickly verified and added to the knowledge graph.

Related resources

Integrating Abduction and Induction in Machine Learning (appears in the Working Notes of the IJCAI-97 Workshop on Abduction and Induction in AI)

This paper discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, the paper discusses our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and...

Integrating Abduction and Induction in Machine Learning

This paper discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, the paper discusses our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and ...

Appears in Abduction and Induction

This article discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, it reviews our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and 2) The ...

Detecting and correcting errors in rule-based expert systems: an integration of empirical and explanation-based learning

In this paper, we argue that techniques proposed for combining empirical and explanation-based learning methods can also be used to detect errors in rule-based expert systems, to isolate the blame for these errors to a small number of rules and suggest revisions to the rules to eliminate these errors. We demonstrate that FOCL, an extension to Quinlan’s FOIL program, can learn relational concept...

Automatically Labeled Data Generation for Large Scale Event Extraction

Modern models of event extraction for tasks like ACE are based on supervised learning of events from small hand-labeled data. However, hand-labeled training data is expensive to produce, in low coverage of event types, and limited in size, which makes supervised methods hard to extract large scale of events for knowledge base population. To solve the data labeling problem, we propose to automat...

Publication date: 2017