Using Provenance for Repeatability
Authors
Abstract
We present Provenance-To-Use (PTU), a tool that minimizes computation time during repeatability testing. Authors can use PTU to build a package that includes their software program and a provenance trace of an initial reference execution. Testers can select a subset of the package’s processes for a partial deterministic replay—based, for example, on their compute, memory and I/O utilization as measured during the reference execution. Using the provenance trace, PTU guarantees that events are processed in the same order using the same data from one execution to the next. We show the efficiency of PTU for conducting repeatability testing of workflow-based scientific programs.
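The selection step mentioned in the abstract, choosing a subset of processes by their measured compute, memory and I/O utilization, could look roughly like the sketch below. This is not PTU's interface or trace format: the `ProcessRecord` schema, the `select_for_replay` function, the threshold parameters and the sample values are all hypothetical, introduced only to illustrate filtering a reference trace by resource cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    """One process from a reference execution (hypothetical schema, not PTU's format)."""
    pid: int
    command: str
    cpu_seconds: float
    peak_memory_mb: float
    io_bytes: int


def select_for_replay(records, max_cpu_seconds, max_memory_mb, max_io_bytes):
    """Return the records whose measured cost is within every given limit."""
    return [
        r for r in records
        if r.cpu_seconds <= max_cpu_seconds
        and r.peak_memory_mb <= max_memory_mb
        and r.io_bytes <= max_io_bytes
    ]


# Made-up measurements standing in for a recorded reference execution.
trace = [
    ProcessRecord(101, "preprocess", 2.5, 120.0, 5_000_000),
    ProcessRecord(102, "simulate", 3600.0, 8000.0, 2_000_000_000),
    ProcessRecord(103, "plot", 1.2, 80.0, 300_000),
]

# Keep only the processes that are cheap enough to re-execute.
selected = select_for_replay(
    trace,
    max_cpu_seconds=60.0,
    max_memory_mb=1024.0,
    max_io_bytes=100_000_000,
)
print([r.command for r in selected])
```

In this sketch a tester would re-run only the inexpensive processes; how the remaining processes are handled during partial replay is left to the tool and is not modeled here.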
Similar resources
The Path to Virtual Machine Images as First Class Provenance
The scientific community’s increased exposure to cloud computing has led to increased familiarity with the machine virtualization technology that underpins the cloud. Efforts to define and implement provenance for the cloud are under way. In the meantime, however, an orthogonal idea, aimed at quickly facilitating repeatability and curation, has taken shape. This is the idea of using virtual mac...
Provenance, XML, and the Scientific Web
Science is now being revolutionized by the capabilities of distributing computation and human effort over the World Wide Web. This revolution offers dramatic benefits but also poses serious risks due to the fluid nature of digital information. The Web today does not provide adequate repeatability, reliability, accountability and trust guarantees for scientific applications. One important part o...
Provenance Management for SPARQL Updates
During the last few years we have witnessed an explosion in the publication of data on the Web, mainly in the form of Linked Data. Scientific, corporate or even governmental data are made available for open access and used by applications, individual users and communities. Given the increasing amount and the heterogeneity of this data, it is of crucial importance to be able to track its provenan...
Using Cloud-Aware Provenance to Reproduce Scientific Workflow Execution on Cloud
Provenance has been thought of as a mechanism to verify a workflow and to provide workflow reproducibility. This provenance of scientific workflows has been effectively carried out in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or propose new approaches to collect p...
Automatically Tracking Metadata and Provenance of Machine Learning Experiments
We present a lightweight system to extract, store and manage metadata and provenance information of common artifacts in machine learning (ML) experiments: datasets, models, predictions, evaluations and training runs. Our system accelerates users in their ML workflow, and provides a basis for comparability and repeatability of ML experiments. We achieve this by tracking the lineage of produced a...