Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems
نویسنده
چکیده
Efficient use and high output of any supercomputer depends on a great number of factors. The problem of controlling granted resource utilization is one of those, and becomes especially noticeable in conditions of concurrent work of many user projects. It is important to provide users with detailed information on peculiarities of their executed jobs. At the same time it is important to provide project managers with detailed information on resource utilization by project members by giving access to the detailed job analysis. Unfortunately, such information is rarely available. This gap should be eliminated with our proposed approach to supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems based on system monitoring data management and study, building integral job characteristics, revealing job categories and single job run peculiarities.
منابع مشابه
The Self-Tuning dynP Job-Scheduler
In modern resource management systems for supercomputers and HPC-clusters the job-scheduler plays a major role in improving the performance and usability of the system. The performance of the used scheduling policies (e.g. FCFS, SJF, LJF) depends on the characteristics of the queued jobs. Hence we developed the dynP scheduler family. The basic idea was to change between different scheduling pol...
متن کاملBSLD Threshold Driven Parallel Job Scheduling for Energy Efficient HPC centers
Recently, power awareness in high performance computing (HPC) community has increased significantly. While CPU power reduction of HPC applications using Dynamic Voltage Frequency Scaling (DVFS) has been explored thoroughly, CPU power management for large scale parallel systems at system level has left unexplored. In this paper we propose a power-aware parallel job scheduler assuming DVFS enable...
متن کاملSIMULATION OF HPC JOB SCHEDULING AND LARGE - SCALE PARALLEL WORKLOADS Mohammad
The paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator was developed based on the Performance Prediction Toolkit (PPT), which is a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduler ...
متن کاملThe Case For Colocation of HPC Workloads
The current state of practice in supercomputer resource allocation places jobs from different users on disjoint nodes both in terms of time and space. While this approach largely guarantees that jobs from different users do not degrade one another’s performance, it does so at high cost to system throughput and energy efficiency. This focused study presents job striping, a technique that signifi...
متن کاملA proactive fault tolerance framework for high performance computing (HPC) systems in the cloud
As high-performance computing (HPC) systems continue to increase in scale, their mean-time to interrupt decreases respectively. The current state of practice for fault tolerance (FT) is checkpoint/restart. However, with increasing error rates, increasing aggregate memory and not proportionally increasing I/O capabilities, it is becoming less efficient. Proactive FT avoids experiencing failures ...
متن کامل