Reproducibility Analysis of Scientific Workflows
نویسندگان
چکیده
Scientific workflows are efficient tools for specifying and automating compute and data intensive in-silico experiments. An important challenge related to their usage is their reproducibility. In order to make it reproducible, many factors have to be investigated which can influence and even prevent this process: the missing descriptions and samples; the missing provenance data about the environmental parameters and the data dependencies; the dependencies of executions which are based on special hardware, changing or volatile third party services or random generated values. Some of these factors (called dependencies) can be eliminated by careful design or by huge resource usage but most of them cannot be bypassed. Our investigation deals with the critical dependencies of execution. In this paper we set up a mathematical model to evaluate the results of the workflow in addition we provide a mechanism to make the workflow reproducible based on provenance data and statistical tools.
منابع مشابه
Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities
With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Faced with the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many if not most scientific discoveries wi...
متن کاملFacilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications
Scientific workflows are designed to solve complex scientific problems and accelerate scientific progress. Ideally, scientific workflows should improve the reproducibility of scientific applications by making it easier to share and reuse workflows between scientists. However, scientists often find it difficult to reuse others’ workflows, which is known as workflow decay. In this paper, we explo...
متن کاملReadable workflows need simple data
Sharing scientific analyses via workflows has the potential to improve the reproducibility of research results as they allow complex tasks to be split into smaller pieces and give a visual access to the flow of data between the components of an analysis. This is particularly useful for trans-disciplinary research fields such as biodiversity and ecosystem functioning (BEF), where complex synthes...
متن کاملHi-WAY: Execution of Scientific Workflows on Hadoop YARN
Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today’s data-driven science. However, existing scientific workflow management systems (SWfMSs) are often limited to a single workflow language and lack adequate support for large-scale data analysis. On the other hand, current distributed dataflow systems are based on a...
متن کاملA Calculus for Propagating Semantic Annotations Through Scientific Workflow Queries
Scientific workflows facilitate automation, reuse, and reproducibility of scientific data management and analysis tasks. Scientific workflows are often modeled as dataflow networks, chaining together processing components (called actors) that query, transform, analyse, and visualize scientific datasets. Semantic annotations relate data and actor schemas with conceptual information from a shared...
متن کاملUtilising Semantic Web Ontologies To Publish Experimental Workflows
Reproducibility in experiments is necessary to verify claims and to reuse prior work in experiments that advance research. However, the traditional model of publication validates research claims through peer-review without taking reproducibility into account. Workflows encap‐ sulate experiment descriptions and components and are suitable for repre‐ senting reproducibility. Additionally, they ca...
متن کامل