Optimizing workflow data footprint
Abstract
In this paper we examine the problem of optimizing disk usage when scheduling large-scale, data-intensive scientific workflows onto distributed resources with limited storage capacity. Our approach is two-fold: we minimize the space a workflow requires during execution by removing data files at runtime once they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although dynamic data cleanup techniques can reduce the data footprint of Montage by 48%, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application’s runtime.
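The runtime cleanup idea described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes tasks run one at a time in a given order, that every file is produced at most once, and that a file may be deleted as soon as its last reader finishes. The function name `peak_footprint`, the `keep` set of final products, and the toy three-task pipeline are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# (task name, input files, output files); a hypothetical representation.
Task = Tuple[str, List[str], List[str]]


def peak_footprint(tasks: List[Task], sizes: Dict[str, int],
                   keep: Set[str], cleanup: bool) -> int:
    """Simulate sequential execution and return the peak storage used.

    `tasks` must already be in a valid execution (topological) order.
    Files listed in `keep` (final products) are never deleted.
    """
    # Index of the last task that reads each file.
    last_use: Dict[str, int] = {}
    for i, (_, inputs, _) in enumerate(tasks):
        for f in inputs:
            last_use[f] = i

    produced = {f for _, _, outputs in tasks for f in outputs}
    # Files that no task produces are external inputs, staged in up front.
    on_disk: Set[str] = {f for _, inputs, _ in tasks for f in inputs
                         if f not in produced}
    used = sum(sizes[f] for f in on_disk)
    peak = used

    for i, (_, _, outputs) in enumerate(tasks):
        # Outputs are written while the task's inputs are still on disk.
        for f in outputs:
            on_disk.add(f)
            used += sizes[f]
        peak = max(peak, used)

        if cleanup:
            # Delete every file whose last reader has just finished.
            for f in list(on_disk):
                if f not in keep and last_use.get(f, -1) <= i:
                    on_disk.remove(f)
                    used -= sizes[f]
    return peak


# Toy pipeline: in -> a -> x -> b -> y -> c -> out (sizes in arbitrary units).
tasks = [("a", ["in"], ["x"]), ("b", ["x"], ["y"]), ("c", ["y"], ["out"])]
sizes = {"in": 10, "x": 20, "y": 20, "out": 5}

print(peak_footprint(tasks, sizes, keep={"out"}, cleanup=False))  # 55
print(peak_footprint(tasks, sizes, keep={"out"}, cleanup=True))   # 40
```

Because the simulated peak depends on the order in which tasks run, the same function could also be used to compare alternative orderings of one workflow, which is loosely analogous to the restructuring the abstract mentions.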
Related resources
Uncertainty Footprint: Visualization of Nonuniform Behavior of Iterative Algorithms Applied to 4D Cell Tracking
Research on microscopy data from developing biological samples usually requires tracking individual cells over time. When cells are three-dimensionally and densely packed in a time-dependent scan of volumes, tracking results can become unreliable and uncertain. Not only are cell segmentation results often inaccurate to start with, but there is also no simple method to evaluate the tracking outco...
Optimizing a CORBA IIOP Protocol Engine for Minimal Footprint Multimedia Systems
Communication software for hand-held devices must be flexible and efficient to deliver the necessary Quality of Service (QoS) to multimedia applications such as real-time audio and video, video on-demand, electronic mail and fax, and Internet telephony. CORBA Object Request Brokers (ORBs) are an emerging middleware standard targeted for distributed applications. The stringent memory constraints...
LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance
Provenance traces captured by scientific workflows can be useful for designing, debugging and maintenance. However, our experience suggests that they are of limited use for reporting results, in part because traces do not comprise domain-specific annotations needed for explaining results, and the black-box nature of some workflow activities. We show that by basic mark-up of the data processing ...
Green Cloud: Smart Resource Allocation and Optimization using Simulated Annealing Technique
Cloud computing aims to offer utility-based IT services by interconnecting a large number of computers through a real-time communication network such as the Internet. There has been a significant increase in the power consumption of the data centres that host Cloud applications because of the growing popularity of Cloud Computing among organisations in various fields. Henc...
A Framework for Optimizing Distributed Workflow Executions
A central problem in workflow concerns optimizing the distribution of work in a workflow: how should the execution of tasks and the management of tasks be distributed across multiple processing nodes (i.e., computers). In some cases task management or execution may be at a processing node with limited functionality, and so it is useful to optimize translations of (sub-)workflow schemas into flo...
Journal title:
- Scientific Programming
Volume 15, issue
Pages -
Publication date: 2007