Adaptive and Virtual Reconfigurations for Effective Dynamic Job Scheduling in Cluster Systems

نویسندگان

  • Songqing Chen
  • Li Xiao
  • Xiaodong Zhang
چکیده

In a cluster system with dynamic load sharing support, a job submission or migration to a workstation is determined by the availability of CPU and memory resources of the workstation at the time [3]. In such a system, a small number of running jobs with unexpectedly large memory allocation requirements may significantly increase the queuing delay times of the rest of jobs with normal memory requirements, slowing down executions of individual jobs and decreasing the system throughput. We call this phenomenon as the job blocking problem because the big jobs block the execution pace of majority jobs in the cluster. Since the memory demand of jobs may not be known in advance and may change dynamically, the possibility of unsuitable job submissions/migrations to cause the blocking problem is high, and the existing load sharing schemes are unable to effectively handle this problem. We propose a software method incorporating with dynamic load sharing, which adaptively reserves a small set of workstations through virtual cluster reconfiguration to provide special services to the jobs demanding large memory allocations. This policy implies the principle of shortest-remainingprocessing-time policy. As soon as the blocking problem is resolved by the reconfiguration, the system will adaptively switch back to the normal load sharing state. We present three contributions in this study. (1) we quantitatively present the conditions to cause the job blocking problem. (2) We present the adaptive software method in a dynamic load sharing system. We show the adaptive process causes little additional overhead. (3) Conducting trace-driven simulations, we show that our method can effectively improve the cluster computing performance by quickly resolving the job blocking problem. The effectiveness and performance insights are also analytically verified.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...

متن کامل

Performance Evaluation of Adaptive Scheduling Algorithm for Shared Heterogeneous Cluster Systems

Cluster computing systems have recently generated enormous interest for providing easily scalable and cost-effective parallel computing solution for processing large-scale applications. Various adaptive space-sharing scheduling algorithms have been proposed to improve the performance of dedicated and homogeneous clusters. But commodity clusters are naturally nondedicated and tend to be heteroge...

متن کامل

Coscheduling on Cluster Systems

Coordinated scheduling of parallel jobs across the nodes of a multiprocessor system is known to produce benefits in both system and individual job efficiency. Without coordinated scheduling, the processes constituting a parallel job would suffer high communication latencies because of processor thrashing. With clusters connected by highperformance networks that achieve latencies in the range of...

متن کامل

Dynamic Resource Management and Job Scheduling for High Performance Computing = Dynamisches Ressourcenmanagement und Job-Scheduling für das Hochleistungsrechnen

Job scheduling and resource management plays an essential role in high-performance computing. Supercomputing resources are usually managed by a batch system, which is responsible for the effective mapping of jobs onto resources (i.e., compute nodes). From the system perspective, a batch system must ensure high system utilization and throughput, while from the user perspective it must ensure fas...

متن کامل

Efficient Algorithms for Just-In-Time Scheduling on a Batch Processing Machine

Just-in-time scheduling problem on a single batch processing machine is investigated in this research. Batch processing machines can process more than one job simultaneously and are widely used in semi-conductor industries. Due to the requirements of just-in-time strategy, minimization of total earliness and tardiness penalties is considered as the criterion. It is an acceptable criterion for b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002