Large-Scale Knowledge Graph Identification using PSL
Abstract
Building a web-scale knowledge graph, which captures information about entities and the relationships between them, represents a formidable challenge. While many large-scale information extraction systems operate on web corpora, the candidate facts they produce are noisy and incomplete. To remove noise and infer missing information in the knowledge graph, we propose knowledge graph identification: a process of jointly reasoning about the structure of the knowledge graph, utilizing extraction confidences and leveraging ontological information. Scalability is often a challenge when building models in domains with rich structure, but we use probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework that easily scales to millions of facts. In practice, our method performs joint inference on a real-world dataset containing over 1M facts and 80K ontological constraints in 12 hours and produces a high-precision set of facts for inclusion into a knowledge graph.
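The abstract does not spell out the model, so the following is only a minimal sketch of how a PSL-style soft-logic rule can combine an extraction confidence with an ontological constraint. It assumes the Lukasiewicz relaxation commonly associated with PSL, where truth values lie in [0, 1]. The function names, the rule weight, and the example values are hypothetical and are not taken from the paper.

```python
def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the soft implication body -> head is from being satisfied."""
    return max(0.0, body - head)


def rule_penalty(weight: float, body: float, head: float) -> float:
    """Weighted hinge penalty contributed by one ground rule."""
    return weight * distance_to_satisfaction(body, head)


# Hypothetical values, made up for illustration only.
extraction_confidence = 0.75  # extractor's confidence in a candidate relation
domain_constraint = 1.0       # ontology says the relation's domain is some category
label_truth = 0.25            # current soft truth of the entity having that category

# Rule sketch: candidate relation AND domain constraint -> entity label.
body = soft_and(extraction_confidence, domain_constraint)
penalty = rule_penalty(2.0, body, label_truth)
```

Under these assumptions, joint inference would search for soft truth values that minimize the sum of such penalties over all ground rules; raising `label_truth` toward `body` drives this rule's penalty to zero.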
Related resources
Knowledge Graph Identification
Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the t...
Large-Scale Knowledge Graph Identification using PSL Extended Abstract
The web is a vast repository of knowledge, but automatically extracting that knowledge, at scale, has proven to be a formidable challenge. A number of recent evaluation efforts have focused on automatic knowledge base population (Ji, Grishman, and Dang 2011; Artiles and Mayfield 2012), and many well-known broad domain and open information extraction systems exist, including the Never-Ending Lan...
LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring
Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which nodes can obtain location information after their deployment. In this paper,...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
ON THE SZEGED INDEX OF NON-COMMUTATIVE GRAPH OF GENERAL LINEAR GROUP
Let $G$ be a non-abelian group and let $Z(G)$ be the center of $G$. Associated with $G$ is a graph $\Gamma_G$ defined as follows: take $G\setminus Z(G)$ as the vertices of $\Gamma_G$ and join two distinct vertices $x$ and $y$ whenever $xy\neq yx$. $\Gamma_G$ is called the non-commuting graph of $G$. In recent years many interesting works have been done on non-commutative graphs of groups. Computing the clique...