Efficient Scheduling and Execution of Scientific Workflow Tasks
Abstract
Large-scale scientific workflows are often characterized by tasks that produce or consume large amounts of data (frequently both) and generate large volumes of derived data products. Minimizing the end-to-end running time of a set of workflow tasks is important both to deliver data products in a timely manner and to free up processors to accommodate additional workflows. A single workflow task may perform the same computations on multiple files, presenting many opportunities for concurrent execution on multiple nodes of a Grid. In addition, many different tasks may operate on the same large input files. An important challenge to efficient workflow execution on multiple nodes is determining an assignment of tasks to nodes. Processor and network speeds may vary over time, workflow tasks may be modified, and new workflows may be added. In this paper we examine algorithms for scheduling tasks concurrently on nodes of a dedicated Grid to address these challenges. We use real workflow tasks from the CORIE Environmental Observation and Forecasting System. We propose a hybrid scheduling approach that exploits knowledge of task running times and locations of input files to assign some tasks to nodes statically, while others are assigned dynamically to adapt to variations in task execution times. We show the effectiveness of our approach using both simulations and our prototype implementation.
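The hybrid idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is one plausible reading under stated assumptions: the longest tasks are placed statically using estimated running times and input-file locations, and the remaining tasks wait in a shared queue that idle nodes pull from. The names `Task`, `hybrid_schedule`, `simulate_dynamic`, `static_fraction`, and `transfer_penalty` are hypothetical, and the dynamic phase ignores file locality for simplicity.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    est_runtime: float  # estimated running time, in seconds
    input_file: str     # the single input file this task reads (assumption)


def hybrid_schedule(tasks, node_files, static_fraction=0.5, transfer_penalty=5.0):
    """Split tasks into a static plan and a dynamic queue.

    node_files maps each node name to the set of input files stored locally.
    transfer_penalty is an assumed fixed cost for fetching a non-local file.
    """
    # Longest tasks first: they matter most for the end-to-end running time.
    ordered = sorted(tasks, key=lambda t: t.est_runtime, reverse=True)
    n_static = int(len(ordered) * static_fraction)
    static_tasks, dynamic_queue = ordered[:n_static], ordered[n_static:]

    load = {node: 0.0 for node in node_files}
    static_plan = {node: [] for node in node_files}
    for task in static_tasks:
        def cost(node):
            local = task.input_file in node_files[node]
            return load[node] + task.est_runtime + (0.0 if local else transfer_penalty)

        # Greedy choice: the node that would finish this task earliest.
        best = min(sorted(node_files), key=cost)
        load[best] = cost(best)
        static_plan[best].append(task.name)
    return static_plan, dynamic_queue, load


def simulate_dynamic(dynamic_queue, load, actual_runtime):
    """Simulate idle nodes pulling queued tasks; return (assignment, makespan)."""
    free_at = [(t, node) for node, t in load.items()]
    heapq.heapify(free_at)
    assignment = {}
    for task in dynamic_queue:
        t, node = heapq.heappop(free_at)  # node that becomes idle first
        assignment[task.name] = node
        heapq.heappush(free_at, (t + actual_runtime(task), node))
    return assignment, max(t for t, _ in free_at)


# Toy input: two nodes, each holding one of two input files.
tasks = [Task("A", 10, "f1"), Task("B", 8, "f2"), Task("C", 3, "f1"), Task("D", 2, "f2")]
node_files = {"n1": {"f1"}, "n2": {"f2"}}
plan, queue, load = hybrid_schedule(tasks, node_files)
assignment, makespan = simulate_dynamic(queue, load, lambda t: t.est_runtime)
print(plan, assignment, makespan)
```

In this toy run the two longest tasks are pinned to the nodes that already hold their input files, and the two short tasks go to whichever node frees up first. Passing a different `actual_runtime` function shows how the dynamic phase absorbs deviations from the estimates.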
Similar Resources
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
Score Based Budget Constraint Workflow Scheduling Algorithm for Cloud System
Cloud Computing is the technology that provides on demand services and resources like storage space, networks, programming language execution environment on the top of Internet using pay as you go model. The concept of Cloud Computing emerging as a latest model of service provisioning in distributed system encourage researchers to investigate its advantages and drawbacks in executing scientific...
Architectural Plan for Constructing Fault Tolerable Workflow Engines Based on Grid Service
In this paper the design and implementation of fault tolerable architecture for scientific workflow engines is presented. The engines are assumed to be implemented as composite web services. Current architectures for workflow engines do not make any considerations for substituting faulty web services with correct ones at run time. The difficulty is to rollback the execution state of the workflo...
Data-Aware Workflow Scheduling in Heterogeneous Distributed Systems
Data transferring in scientific workflows gradually attracts more attention due to large amounts of data generated by complex scientific workflows will significantly increase the turnaround time of the whole workflow. It is almost impossible to make an optimal or approximate optimal scheduling for the end-to-end workflow without considering the intermediate data movement. In order to reduce the...
Multi-objective and Scalable Heuristic Algorithm for Workflow Task Scheduling in Utility Grids
To use services transparently in a distributed environment, the Utility Grids develop a cyber-infrastructure. The parameters of the Quality of Service such as the allocation-cost and makespan have to be dealt with in order to schedule workflow application tasks in the Utility Grids. Optimization of both target parameters above is a challenge in a distributed environment and may conflict one an...