Flexible Provenance Tracing
Authors
Shazia Sadiq, The University of Queensland, Australia
Abstract
The description of the origins of a piece of data and the transformations by which it arrived in a database is termed data provenance. The importance of data provenance is widely recognized in the database community. The two major approaches to representing provenance information are annotation and inversion. An annotation is metadata pre-computed to record the derivation history of a data product, whereas the inversion method finds the source data by exploiting the fact that some derivation processes can be inverted. Annotations are flexible enough to represent diverse provenance metadata, but the complete provenance data may be larger than the data itself. The inversion method is concise, using a single inverse query or function, but the provenance must be computed on the fly. This paper proposes a new provenance representation, a hybrid of the annotation and inversion methods, that aims to combine the advantages of both. The representation adapts to the storage constraint and to the response-time requirement of on-the-fly provenance inversion.
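The trade-off between annotation and inversion can be illustrated with a minimal Python sketch. It is not the representation proposed in the paper, whose details are not given here. The class `HybridProvenanceStore`, the `max_annotations` budget, the `ORDERS` table, and the `inverse_total` function are all hypothetical names chosen for illustration. The sketch assumes a derivation whose sources can be recomputed by an inverse function, and it stores explicit annotations only while a storage budget allows.

```python
from typing import Callable, Dict, List


class HybridProvenanceStore:
    """Hypothetical hybrid store: annotations while budget allows, inversion otherwise."""

    def __init__(self, inverse: Callable[[str], List[str]], max_annotations: int):
        self.inverse = inverse                    # inversion: recompute sources on demand
        self.max_annotations = max_annotations    # assumed storage constraint
        self.annotations: Dict[str, List[str]] = {}

    def record(self, output_id: str, source_ids: List[str]) -> bool:
        """Store an annotation if the budget permits; return whether it was stored."""
        if len(self.annotations) < self.max_annotations:
            self.annotations[output_id] = list(source_ids)
            return True
        return False

    def trace(self, output_id: str) -> List[str]:
        """Return the sources of an output: stored lookup first, inversion as fallback."""
        if output_id in self.annotations:
            return self.annotations[output_id]
        return self.inverse(output_id)


# Illustrative source table: order id -> (customer, amount).
ORDERS = {"o1": ("alice", 10), "o2": ("bob", 5), "o3": ("alice", 7)}


def inverse_total(output_id: str) -> List[str]:
    """Inverse of a per-customer total: select the orders of that customer."""
    customer = output_id.split(":", 1)[1]
    return sorted(k for k, (c, _) in ORDERS.items() if c == customer)


store = HybridProvenanceStore(inverse_total, max_annotations=1)
stored_alice = store.record("total:alice", ["o1", "o3"])  # fits in the budget
stored_bob = store.record("total:bob", ["o2"])            # budget exhausted
alice_sources = store.trace("total:alice")                # answered from the annotation
bob_sources = store.trace("total:bob")                    # answered by inversion
```

In this sketch a larger `max_annotations` trades storage for faster tracing, while a smaller one pushes more requests onto the inverse function at query time.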
Similar resources
RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows
RAMP (Reduce And Map Provenance) is an extension to Hadoop that supports provenance capture and tracing for workflows of MapReduce jobs. RAMP uses a wrapper-based approach, requiring little if any user intervention in most cases, while retaining Hadoop’s parallel execution and fault tolerance. We demonstrate RAMP on a real-world MapReduce workflow generated from a Pig script that performs senti...
Logical Provenance in Data-Oriented Workflows (Long Version)
We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for general transformations, introducing the notions of correctness, precision, and minimality. We then determine when properties such as correctness...
Tracing where and who provenance in Linked Data: A calculus
Linked Data provides some sensible guidelines for publishing and consuming data on the Web. Data published on the Web has no inherent truth, yet its quality can often be assessed based on its provenance. This work introduces a new approach to provenance for Linked Data. The simplest notion of provenance – viz., a named graph indicating where the data is now – is extended with a richer provenanc...
Provenance for Generalized Map and Reduce Workflows
We consider a class of workflows, which we call generalized map and reduce workflows (GMRWs), where input data sets are processed by an acyclic graph of map and reduce functions to produce output results. We show how data provenance (also sometimes called lineage) can be captured for map and reduce functions transparently. The captured provenance can then be used to support backward tracing (fi...
Provenance and Case-Based Reasoning
Computational science takes a multidisciplinary approach to scientific investigation, tightly linking scientific research with computational studies and processes such as numerical simulation, data management, and visualization to study complex phenomena such as weather systems. The scientific importance of such processes has led to significant interest in recording the provenance of the data p...
Journal title: IJSSOE
Volume 2
Publication date: 2011