revisit: a Workflow Tool for Data Science

Authors

  • Norman Matloff
  • Reed Davis
  • Laurel Beckett
  • Paul Thompson
Abstract

In recent years there has been widespread concern in the scientific community over a reproducibility crisis. One of the major causes that have been identified is statistical: in much scientific research, the statistical analysis (including data preparation) suffers from a lack of transparency and from methodological problems, both major obstructions to reproducibility. The revisit package aims to remedy this problem by generating a “software paper trail” of the statistical operations applied to a dataset. This record can be “replayed” for verification purposes and modified to enable alternative analyses. The software also issues warnings about certain kinds of potential errors in statistical methodology, again related to the reproducibility issue.

Similar resources

Data Flow Correctness in Adaptive Workflow Systems

Enterprises must be able to quickly adapt their business processes to react to changes in their environment. The business agility they need is often hindered by the lack of flexibility in contemporary workflow systems. In response to this inflexibility, adaptive workflow systems have emerged, which enable the dynamic adaptation of running workflows. One of the most important challenges in this context...

A Novel Assisted History Matching Workflow and its Application in a Full Field Reservoir Simulation Model

The significant increase in using reservoir simulation models poses significant challenges in the design and calibration of models. Moreover, conventional model calibration, history matching, is usually performed using a trial and error process of adjusting model parameters until a satisfactory match is obtained. In addition, history matching is an inverse problem, and hence it may have non-uni...

Designing a Provenance-Based Climate Data Analysis Application

Climate scientists have made substantial progress in understanding Earth’s climate system, particularly at global and continental scales. Climate research is now focused on understanding climate changes over wider ranges of time and space scales. These efforts are generating ultra-scale data sets at very high spatial resolution. An insightful analysis in climate science depends on using softwar...

Towards Quantification of Limits in Automated Curation of e-Science Data

Workflow systems are an increasingly popular eScience tool for executing complex sequences of tasks. The large volumes of data created during the course of these computationally intense and data-driven scientific investigations drive research in techniques to automate metadata capture to relieve the burden on the user of manual annotation. In this paper we describe our experience to date in qua...

Towards Next Generation Provenance Systems for E-Science

e-Science helps scientists to automate scientific discovery processes and experiments, and promote collaboration across organizational boundaries and disciplines. These experiments involve data discovery, knowledge discovery, integration, linking, and analysis through different software tools and activities. Scientific workflow is one technique through which such activities and processes can be...

Journal:
  • CoRR

Volume: abs/1708.04789

Publication date: 2017