Provenance, Lineage, and Workflows

نویسندگان

  • Sidra Islam
  • Stanley B. Zdonik
چکیده

In Computer Science, Provenance also known as lineage and pedigree describe the source and derivation of data. Data provenance is key to the management of scientific data and has recently been recognized as central to the trust one places in data. This paper focus attention on the importance and difficulty of provenance tracking in practice. We discuss a taxonomy of data provenance characteristics and focus primarily on scientific workflow approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Provenance for Generalized Map and Reduce Workflows

We consider a class of workflows, which we call generalized map and reduce workflows (GMRWs), where input data sets are processed by an acyclic graph of map and reduce functions to produce output results. We show how data provenance (also sometimes called lineage) can be captured for map and reduce functions transparently. The captured provenance can then be used to support backward tracing (fi...

متن کامل

Addressing Underspecified Lineage Queries on Provenance

State-of-the-art provenance systems accumulate data over time, creating deep lineage trees. When queried for the lineage of an object, these systems can return excessive results due to the longevity and depth of their provenance. Such a query is underspecified: it does not constrain its result to a finite span of history. Unfortunately, specifying queries correctly often requires in-depth knowl...

متن کامل

Provenance Collection Support in the Kepler Scientific Workflow System

In many data-driven applications, analysis needs to be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information unleashes a growing need for automated data-driven applications that also can keep track of the provenance of the data and processes with little user interaction and ove...

متن کامل

A Provenance-Integration Framework for Distributed Workflows in Grid Environments

Provenance information about complex and distributed workflows is a key issue for data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments where different workflows consume and transform data require a comprehensive provenance view. In this scenario provenance collection and integration presents significant challenges. In this paper,...

متن کامل

Intelligent Workflow Systems and Provenance-Aware Software

Workflows are increasingly used in science to manage complex computations and data processing at large scale. Intelligent workflow systems provide assistance in setting up parameters and data, validating workflows created by users, and automating the generation of workflows from high-level user guidance. These systems use semantic workflows that extend workflow representations with semantic con...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010