Towards Proactive Network Load Management for Distributed Parallel Programming
Abstract
In order to increase the overall performance of distributed parallel programs running on a network of non-dedicated workstations, we have researched methods for improving load balancing in loosely coupled, heterogeneous distributed systems. Current software designed to handle distributed applications does not address the problem of forecasting each computer's future load: it only dispatches tasks, assigning them either to an idle CPU (in dedicated networks) or to the lowest-loaded one (in non-dedicated networks). Our approach improves the standard dispatching strategies provided by parallel languages and libraries by implementing new dispatching criteria: it chooses the most suitable computer after forecasting the load of the individual machines from current and historical data. Existing applications can take advantage of this new service with no changes beyond a recompilation. A fair comparison between different dispatching algorithms can only be made if they run under the same external network load conditions, so a tool was developed to arbitrarily replicate historical observations of load parameters while running the different strategies. In this environment, the new algorithms are being tested and compared to verify the improvement over the dispatching strategy already available. The overall performance of the system was tested with in-house developed numerical models. The project reported here is connected with other efforts at CeCal devoted to making it easier for scientists and developers to take advantage of parallel computing techniques using low-cost components.

1 This research was supported in part by CSIC.
2 IEEE member.
3 Centro de Cálculo. Julio Herrera y Reissig 565, 5to piso. CP 11.300. Montevideo, Uruguay. http://www.fing.edu.uy/cecal

Introduction

Parallel architectures are nowadays widespread. With the growth of node power and the increase in network bandwidth, present networks can be used as powerful parallel environments. This kind of hardware support for distributed processing can be found in a growing number of companies and universities. However, software support for using this kind of architecture efficiently is still not good enough. There are almost no visual environments for developing distributed systems, and the ones that exist are extremely expensive and, in many senses, under-developed. In addition, they only have rudimentary tools to cope with unexpected overload on a node, which usually leads to a significant delay in the computation time. CeCal [1] is concerned with this topic and has several concluded ([2][3][4][5]) and ongoing ([6][7][8]) research projects.

This project proposes to improve the PVM [9] dispatching algorithms so that they use historical information about the usage of the computers that compose the virtual machine. New algorithms based on neural network techniques and traditional statistical methods (ARMA, ARIMA) were used to predict the future usage of individual workstations from their historical usage. In order to compare two forecasting algorithms under the same workload, two instances of the same program, with only the dispatching criterion changed, should be run; these executions must run in background load environments that are as similar as possible.

This paper describes the work done in this direction, covering both the development of new dispatching algorithms and the infrastructure needed to fairly compare them against the standard ones (the network load gatherer and replicator). The new dispatching algorithms were tested using UCLA's Global Climate Model [10] and a Shallow Water Model [11].
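As an illustration of the simpler statistical predictors involved, the following is a minimal sketch of a one-step-ahead forecast computed with exponential smoothing; the function names, the smoothing factor and the sample history are illustrative only, and the actual predictors (ARMA, ARIMA and neural network based) are more elaborate.

#include <stdio.h>

/* One-step-ahead load forecast using exponential smoothing.
 * history: past load samples (e.g., CPU busy percentage per interval)
 * n:       number of samples
 * alpha:   smoothing factor in (0,1]; larger values weight recent samples more
 * Returns the forecast for the next interval.
 */
static double forecast_next_load(const double *history, int n, double alpha)
{
    double level;
    int i;

    if (n <= 0)
        return 0.0;              /* no history: assume an idle machine */

    level = history[0];
    for (i = 1; i < n; i++)
        level = alpha * history[i] + (1.0 - alpha) * level;

    return level;                /* the smoothed level is the one-step forecast */
}

int main(void)
{
    /* Illustrative CPU load history (percent busy per 60-second interval). */
    double history[] = { 12.0, 15.0, 40.0, 38.0, 35.0, 60.0, 55.0 };
    int n = sizeof(history) / sizeof(history[0]);

    printf("forecast for next interval: %.1f%%\n",
           forecast_next_load(history, n, 0.5));
    return 0;
}

The smoothing factor controls how strongly recent samples dominate the forecast; a full ARMA or ARIMA fit would additionally model the autocorrelation structure of the series.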
Load Balancing and New Dispatching Criteria

Load balancing can be considered one of the most relevant performance enhancements in a multi-user network environment. When many tasks cooperate to solve a problem, a single delayed task delays the whole system, with the other tasks waiting for it at synchronization points. Load balancing techniques allow distributed applications to cope with this problem. Standard load balancing strategies for distributed applications choose the most appropriate workstation of the virtual machine, where "most appropriate" means the lowest-loaded machine at task spawning time. The history of each workstation's load is ignored and only the instantaneous load is considered.

Our project proposed and implemented two original methods for improving the load balancing of distributed applications. In both cases, load balancing is considered before tasks are started, by forecasting the load on the target computer. By considering statistical information, load balancing can be improved. Our project adds services to the PVM kernel for gathering load information about the workstations of the virtual machine and using it at task spawning time to forecast future workstation load and correctly define the "most appropriate workstation" choice. Different time series forecasting methods were compared, predictors were implemented, and a framework was developed in order to test and compare the improved dispatch against the traditional one.
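From the application side, the selection criterion can be sketched as follows; predict_load() is a placeholder stub standing in for the forecast computed from the gathered history, and the real project implements this logic inside the PVM dispatching service rather than in user code.

#include <pvm3.h>

/* Placeholder for the project's predictor: in the real service this would be
 * the one-step-ahead forecast computed from the history gathered for `host`. */
static double predict_load(const char *host)
{
    (void)host;
    return 0.0;     /* stub value so the sketch is self-contained */
}

/* Spawn one instance of `task` on the host with the lowest predicted load.
 * Returns the task id, or a negative PVM error code. */
int spawn_on_best_host(char *task, char **argv)
{
    struct pvmhostinfo *hosts;
    int nhost, narch, i, tid;
    int best = 0;
    double best_load, load;

    if (pvm_config(&nhost, &narch, &hosts) < 0 || nhost <= 0)
        return PvmSysErr;

    best_load = predict_load(hosts[0].hi_name);
    for (i = 1; i < nhost; i++) {
        load = predict_load(hosts[i].hi_name);
        if (load < best_load) {
            best_load = load;
            best = i;
        }
    }

    /* PvmTaskHost forces the spawn onto the chosen workstation. */
    if (pvm_spawn(task, argv, PvmTaskHost, hosts[best].hi_name, 1, &tid) != 1)
        return PvmNoHost;

    return tid;
}

The sketch only replaces the choice of target host; everything else about task creation stays as in standard PVM, which is what allows existing applications to benefit with nothing more than a recompilation.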
Determination of Meaningful Load Parameters

Many different workload parameters can be considered for load balancing improvement: the number of tasks in the run queue, the amount of free memory, the 1-minute load average, the amount of free CPU time, and so on. In order to improve the PVM dispatching routine we have to predict, in some way, the future load of the workstations that compose the virtual machine. This prediction is based on some combination of the selected workload parameters; this value is usually called the load index.

In order to compare our new dispatching strategy, we need to run the same PVM program in the same virtual machine load environment with both the old and the new dispatching routines. Therefore, we must be able to generate workload artificially, so as to arbitrarily replicate historical observations of load parameters while applying the different strategies. The framework developed for testing and comparing different task dispatching algorithms (the Load Replicator) is discussed below. We needed support for replicating, at any time, a given network load situation; only with this artificially generated background workload can different solutions for the dispatching of a distributed program be compared fairly. The input is, for N workstations, a time series of the load observed for some representative parameters, and the output is each workstation loaded with the prescribed load.

UNIX systems provide information about system usage at many levels of detail [12]. The load information we used is the one provided by the rstat service, which gathers information about CPU usage, local (non-NFS) disk usage, paging and swapping, interrupts and context switches, network usage and collisions. Some of them are graphically represented in figure 1. Looking for simplicity and transparent processing, we discarded the parameters that have no effect on global load patterns. For example, kernel operations like context switches were ignored because they are extremely hard to trace and replicate, and they do not affect global workstation performance [13]. Parameters related to misconfiguration problems were also ignored (for example, if excessive swapping and paging is observed, a memory leak problem probably exists). On the other hand, parameters that hardly change or are not meaningful over time (like collisions) were also neglected. The identification of suitable load indexes is not a new problem [14], and it is well known that simple load indexes are particularly effective. Kunz [15] found that the most effective of the indexes mentioned above is the CPU queue length. However, since in network parallel computing the network traffic is a bottleneck and distributed processes usually make heavy use of the local disk, we considered that these two indexes should not be neglected.

Artificial Replication of Network Load Environment

Under the assumptions described, we replicated just the CPU, disk and network usage of each workstation. Replication is not a new research area, and it is needed for testing purposes in many areas of computer science. Workload replicators for measuring the performance of different file system implementations were studied in [16][17]. Replicators for real-time system applications have also been developed that work at the kernel level, obtaining excellent results over very short periods of time [14]. As our purpose is to replicate workloads on a complete network with many parameters per workstation, we do not need great precision at the microsecond scale, but we expect acceptable results over long periods (like hours), so we decided to work at the process level. In this paper we focus only on CPU replication; the replication of the other parameters is exactly the same, just changing the loader subprocess.

The procedure is very simple: if the historical load is higher than the actual load, then we do hard work in non-cacheable tasks; if not, we just wait. Periodically we compare both loads and act accordingly.

Figure 2 – Historical CPU load
Figure 3 – Replicated CPU load

Despite its simplicity, this strategy works well. We can visually compare the historical load (figure 2) and the replicated one (figure 3). The vertical axis shows the percentage of CPU used during the measured interval, with increments in the range 1-100 per second. The horizontal axis indicates the historical measurements, taken every 60 seconds in these experiments. These units are used in all experiments shown in this paper, unless otherwise specified. The errors obtained for CPU replication are shown in figure 4. Working with a 60-second interval, the average deviation was 0.97 while the maximum deviation was 5; the vertical axis units are those provided by rstat.

Figure 4 – Average deviation in CPU load replication, per second
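The control loop of the CPU loader subprocess behind these results can be sketched as follows, assuming a placeholder read_current_cpu_load() in place of the measurement actually taken through rstat; the mapping between historical samples and comparison intervals is simplified.

#include <time.h>
#include <unistd.h>

/* Placeholder: current CPU load of this workstation, in the same units as the
 * historical trace (the real collector obtains this through rstat). */
extern double read_current_cpu_load(void);

/* CPU-bound busy work for roughly `seconds` seconds; the volatile accumulator
 * keeps the compiler from optimising the loop away. */
static void burn_cpu_slice(unsigned seconds)
{
    volatile double sink = 0.0;
    time_t end = time(NULL) + (time_t)seconds;

    while (time(NULL) < end)
        sink += 1.000001 * (sink + 1.0);
}

/* Core loop of the CPU loader subprocess: for each historical sample, compare
 * it with the load currently observed; if the history calls for more load than
 * is present, generate work for one interval, otherwise just wait. */
void replicate_cpu_load(const double *historical, int nsamples, unsigned period)
{
    int i;

    for (i = 0; i < nsamples; i++) {
        if (historical[i] > read_current_cpu_load())
            burn_cpu_slice(period);   /* raise the load towards the recorded value */
        else
            sleep(period);            /* background load is already high enough */
    }
}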
In order to measure the overhead generated by the replication processes, the producer and replicator processes can be run using zero historical data; the load overhead generated by the collector process was also considered. Figure 5 shows the load generated by both auxiliary processes involved in load replication. The overhead observed was 0.1% of CPU per second, so it can be neglected, which ensures that the previous results are good enough for our purposes.

Figure 5 – Generated load for both auxiliary processes involved in load replication

The disk and network replication designs were similar, although the results were not as good. Disk simulation was implemented by writing random data directly to the raw device (in order to avoid system caches). However, some historically high values were impossible to replicate, especially when they had been generated by access to very slow devices (like CD-ROM units). Network usage replication was rather good: the average absolute deviation for each interval was 25 packets/sec. Additional information about this sub-project (and some other related projects) can be found at [5].
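A loader subprocess in the spirit of the disk simulation described above could look like the following sketch; the device path, block size and error handling are illustrative, and the alignment requirements of particular raw drivers are glossed over.

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 512          /* one sector; raw I/O is done in whole sectors */

/* Write `nblocks` blocks of pseudo-random data to a raw (character) device so
 * that the traffic reaches the disk instead of the buffer cache.  The device
 * should be a dedicated scratch partition on the target workstation. */
int generate_disk_load(const char *raw_device, int nblocks)
{
    char block[BLOCK_SIZE];
    int fd, i, j;

    fd = open(raw_device, O_WRONLY);
    if (fd < 0)
        return -1;

    for (i = 0; i < nblocks; i++) {
        for (j = 0; j < BLOCK_SIZE; j++)
            block[j] = (char)(rand() & 0xff);   /* random, non-repeating data */
        if (write(fd, block, BLOCK_SIZE) != BLOCK_SIZE)
            break;
    }

    close(fd);
    return i;                   /* number of blocks actually written */
}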