Provenance, Lineage, and Workflows
Abstract
In computer science, provenance, also known as lineage or pedigree, describes the source and derivation of data. Data provenance is key to the management of scientific data and has recently been recognized as central to the trust one places in data. This paper focuses on the importance and difficulty of provenance tracking in practice. We discuss a taxonomy of data provenance characteristics and focus primarily on scientific workflow approaches.
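As a rough illustration of "source and derivation", the sketch below records, for each derived value in a toy workflow, which step produced it and which inputs it came from, and then traces that chain backward. It is not taken from the paper: `Record`, `run_step`, and `lineage` are hypothetical names chosen for this example, and real workflow systems capture far richer metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A data item plus the minimal provenance needed to trace its derivation."""
    value: object
    step: str = "input"                          # workflow step that produced the value
    parents: list = field(default_factory=list)  # Records this value was derived from


def run_step(name, func, *inputs):
    """Apply func to the input values and remember the inputs as parents."""
    return Record(func(*(r.value for r in inputs)), name, list(inputs))


def lineage(record):
    """Return the step names reachable backward from record, depth-first."""
    steps = [record.step]
    for parent in record.parents:
        steps.extend(lineage(parent))
    return steps


# A toy workflow (hypothetical): sum two raw data sets, then add the totals.
raw_a = Record([1, 2, 3])
raw_b = Record([4, 5])
total_a = run_step("sum_a", sum, raw_a)
total_b = run_step("sum_b", sum, raw_b)
result = run_step("add", lambda x, y: x + y, total_a, total_b)

print(result.value)     # 15
print(lineage(result))  # ['add', 'sum_a', 'input', 'sum_b', 'input']
```

Storing parent links on each record keeps backward tracing a simple graph walk; the cost is that every intermediate value stays reachable, which is one reason provenance tracking can be hard in practice.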
Related resources
Provenance for Generalized Map and Reduce Workflows
We consider a class of workflows, which we call generalized map and reduce workflows (GMRWs), where input data sets are processed by an acyclic graph of map and reduce functions to produce output results. We show how data provenance (also sometimes called lineage) can be captured for map and reduce functions transparently. The captured provenance can then be used to support backward tracing (fi...
Addressing Underspecified Lineage Queries on Provenance
State-of-the-art provenance systems accumulate data over time, creating deep lineage trees. When queried for the lineage of an object, these systems can return excessive results due to the longevity and depth of their provenance. Such a query is underspecified: it does not constrain its result to a finite span of history. Unfortunately, specifying queries correctly often requires in-depth knowl...
Provenance Collection Support in the Kepler Scientific Workflow System
In many data-driven applications, analysis needs to be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information unleashes a growing need for automated data-driven applications that also can keep track of the provenance of the data and processes with little user interaction and ove...
A Provenance-Integration Framework for Distributed Workflows in Grid Environments
Provenance information about complex and distributed workflows is a key issue for data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments where different workflows consume and transform data require a comprehensive provenance view. In this scenario provenance collection and integration presents significant challenges. In this paper,...
Intelligent Workflow Systems and Provenance-Aware Software
Workflows are increasingly used in science to manage complex computations and data processing at large scale. Intelligent workflow systems provide assistance in setting up parameters and data, validating workflows created by users, and automating the generation of workflows from high-level user guidance. These systems use semantic workflows that extend workflow representations with semantic con...