Generating examples of paths summarizing RDF datasets

Authors

  • Jindrich Mynarz
  • Marek Dudás
  • Paolo Tomeo
  • Vojtech Svátek
Abstract

As datasets become too large to be comprehended directly, a need for data summarization arises. A data summary can present typical patterns commonly found in a dataset, from which high-level understanding of the data can be obtained. Nonetheless, such abstract understanding can be improved by providing concrete examples of the summary patterns. If possible, the chosen examples should be diverse and representative of the patterns they instantiate. In this paper, we present three methods for generating examples of patterns discovered in RDF datasets. The patterns we consider are the most frequent path graphs that consist of classes of instances or data types of literals connected by RDF properties. We propose an RDF/S vocabulary for describing these path graphs and their instances. We present three methods for generating path examples, namely random, distinct, and representative selection, that are based on randomization, diversification, and clustering.
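
To make the difference between random and distinct selection concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the algorithms, so the greedy "most unseen nodes first" heuristic used for distinct selection is an assumption, as are the function names, the tuple representation of a path instance, and the `ex:` sample data. Representative selection (clustering) is left out.

```python
import random
from typing import List, Sequence, Set, Tuple

# One path instance: the resources or literals along the path (assumed representation).
Path = Tuple[str, ...]


def random_selection(paths: Sequence[Path], k: int, seed: int = 0) -> List[Path]:
    """Pick up to k path instances uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(paths), min(k, len(paths)))


def distinct_selection(paths: Sequence[Path], k: int) -> List[Path]:
    """Greedy diversification (an assumed heuristic): repeatedly take the
    path that contributes the most nodes not yet seen in chosen examples."""
    remaining = list(dict.fromkeys(paths))  # drop duplicates, keep input order
    chosen: List[Path] = []
    seen: Set[str] = set()
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda p: len(set(p) - seen))
        chosen.append(best)
        seen.update(best)
        remaining.remove(best)
    return chosen


# Hypothetical instances of one path pattern:
# Person -worksFor-> Organization -basedIn-> City
paths = [
    ("ex:alice", "ex:acme", "ex:prague"),
    ("ex:bob", "ex:acme", "ex:prague"),
    ("ex:carol", "ex:initech", "ex:bari"),
]
```

With this sample data, `distinct_selection(paths, 2)` skips the second path, which shares two of its three nodes with the first, and picks the first and third instead. `random_selection(paths, 2)` may return any two of the three.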

Similar resources

Aether - Generating and Viewing Extended VoID Statistical Descriptions of RDF Datasets

This paper presents the Aether web application for generating, viewing and comparing extended VoID statistical descriptions of RDF datasets. The tool is useful for example in getting to know a newly encountered dataset, in comparing datasets between versions and in detecting outliers and errors. Examples are given on how the tool has been used to shed light on multiple important datasets.

Generating RDF for Application Testing

Application testing is a critical component of application development. Testing of Semantic Web applications requires large RDF datasets, conforming to an expected form or schema, and preferably, to an expected data distribution. Finding such datasets often proves impossible, while generating input datasets is often cumbersome. The GRR (Generating Random RDF) system is a convenient, yet powerfu...

On the outer independent 2-rainbow domination number of Cartesian products of paths and cycles

Let G be a graph. A 2-rainbow dominating function (or 2-RDF) of G is a function f from V(G) to the set of all subsets of the set {1,2} such that for every vertex v ∈ V(G) with f(v) = ∅, the condition $\bigcup_{u\in N_{G}(v)}f(u)=\{1,2\}$ is fulfilled, where $N_{G}(v)$ is the open neighborhood of v. The weight of a 2-RDF f of G is the value $\omega(f):=\sum_{v\in V(G)}|f(v)|$. The 2-rainbow d...

Top-K Shortest Paths in Large Typed RDF Datasets Challenge

Perhaps the most widely appreciated linked data principle is the one that instructs linked data providers to provide useful information using the standards (i.e., RDF and SPARQL). Such information corresponds to patterns expressed as SPARQL queries that are matched against the RDF graph. Until recently, it was not possible to create a pattern without specifying the exact path that would match a...

RDF-3X: a RISC-style engine for RDF

RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper p...

Publication date: 2016