A Neuro-Dynamic Programming Approach to Call Admission Control in Integrated Service Networks: The Single Link Case
Abstract
We formulate the call admission control problem for a single link in an integrated service environment as a Markov Decision Problem. In principle, an optimal admission control policy can be computed using methods of Dynamic Programming. However, as the number of possible states of the underlying Markov Chain grows exponentially in the number of customer classes, Dynamic Programming algorithms for realistic size problems are computationally infeasible. We try to overcome this so-called "curse of dimensionality" by using methods of Neuro-Dynamic Programming (NDP for short). NDP employs simulation-based algorithms and function approximation techniques to find control policies for large-scale Markov Decision Problems. We apply two methods of NDP to the call admission control problem: the TD algorithm and Approximate Policy Iteration. We assess the performance of these methods by comparing with two heuristic policies: a policy which always accepts a new customer when the required resources are available, and a threshold policy.

This research was supported by a contract with Siemens AG, Munich, Germany.

Introduction

Markov Decision Problems have been a popular paradigm for sequential decision making under uncertainty. Dynamic Programming provides a framework for studying such problems, as well as algorithms for computing optimal decision policies. Unfortunately, these algorithms become computationally infeasible when the underlying state space is large. This so-called "curse of dimensionality" renders the classical methods of Dynamic Programming inapplicable to most realistic problems. As a result, control policies for practical large-scale sequential decision problems often rely on heuristics.

In recent years, a new methodology called Neuro-Dynamic Programming (NDP for short) has emerged. NDP tries to overcome the curse of dimensionality by employing stochastic approximation algorithms and function approximation techniques such as neural networks. The outcome is a methodology for approximating
Dynamic Programming solutions with reduced computational requirements.

Over the past few years, methods of NDP have been successfully applied to challenging problems. Examples include a program that plays Backgammon, an elevator dispatcher, a job scheduler, and a call admission control policy in wireless communication networks. Despite these successes, most algorithms proposed in the field are not well understood at a theoretical level. Nevertheless, the potential of these methods for systematically solving large-scale Markov Decision Problems, and the successful experimental work in the field, have drawn considerable attention.

In this paper we apply methods of NDP to the call admission control problem in an integrated service environment. In particular, we consider a single communication link with a given bandwidth that serves several customer classes of different values. The customer classes are characterized by the following parameters: bandwidth demand, arrival rate, departure rate, and a reward we obtain whenever we accept a customer of that class. The goal is to find a call admission control policy which maximizes the long-term reward. Related work has been done by Nordström et al.

The paper is structured as follows. We first state the call admission control problem and then give a brief review of Dynamic Programming. Next we introduce two methods of NDP: the TD algorithm and Approximate Policy Iteration. We then describe two parametric forms used to approximate the reward (value) function of Dynamic Programming: multilayer perceptron and quadratic parameterizations. After that we formulate the call admission control problem as a Markov Decision Problem and define two heuristic control policies for it: a policy which always accepts a new customer when the required resources are available, and a threshold policy. Finally, we present experimental results of two case studies.

Call Admission Control

We are given a single
communication link with a total bandwidth of $B$ units. We intend to support a finite set $\{1, \ldots, N\}$ of customer classes. Customers of the different classes request connections over the link according to independent Poisson processes. The arrival rate of customers of class $n$ is denoted by $\lambda(n)$. When a new customer requests a connection, we can either decide to reject that customer or, if enough bandwidth is available, to accept it (call admission control). Once accepted, a customer of class $n$ seizes $b(n)$ units of bandwidth for $t$ units of time, where $t$ is exponentially distributed with parameter $\nu(n)$, independently of everything else happening in the system. Furthermore, whenever we accept a customer of class $n$, we receive a reward $c(n)$. The goal is to exercise call admission control in such a way that we maximize long-term reward.

In this problem formulation, the reward $c(n)$ could be the price customers of class $n$ are paying for using $b(n)$ units of bandwidth of the link. This models the situation where a telecommunication network provider wants to sell bandwidth to customers in such a way that long-term revenue is maximized. The reward $c(n)$ could also be used to attach levels of importance (priority) to the different service classes. This reflects the case where one wants to provide the different customer classes with different qualities of service; e.g., customers of a class with a high reward (high importance/priority level) should be less likely to be blocked than customers of a class with a low reward (low importance/priority level).

Furthermore, the bandwidth demand $b(n)$ could either reflect the demand associated with the peak transmission rate requested by customers of class $n$, or a so-called "effective bandwidth" associated with customers of class $n$. The concept of an effective bandwidth has been introduced and extensively studied in the context of ATM. ATM (Asynchronous Transfer Mode) is a technology which implements integrated service telecommunication networks. In ATM, the notion of an effective bandwidth is
used to encapsulate cell-level behavior such as multiplexing. This separation of cell level and call level is important for the tractability of the call admission control problem.

Although one is ultimately interested in applying call admission control (combined with routing) to the case of an integrated service network, we focus here on the single link case. This allows us to test algorithms of NDP on a problem for which good heuristic policies are available, and for which (if the instance is fairly small) an optimal control policy can be computed. Furthermore, results obtained for the single link case can be used as a basis for solving the network case.

Markov Decision Problems and Dynamic Programming

In this section we give a brief review of Markov Decision Problems and Dynamic Programming. Markov Decision Problems have been a popular paradigm for sequential decision making problems under uncertainty. Dynamic Programming provides a framework for studying such problems, as well as algorithms for computing optimal control policies.

We consider infinite-horizon, discrete-time, discounted Markov Decision Problems defined on a finite state space $S$ and involving a finite set $U$ of control actions. Although we formulate the call admission control problem as a continuous-time Markov Decision Problem, it can be translated into a discrete-time Markov Decision Problem using uniformization. Therefore, we can without loss of generality limit our framework to discrete-time Markov Decision Problems.

For every state $s \in S$, there is a set of nonnegative scalars $p(s, u, s')$ such that, for all control actions $u$ in $U$, $\sum_{s' \in S} p(s, u, s')$ is equal to 1. The scalar $p(s, u, s')$ is interpreted as the probability of a transition from state $s$ to state $s'$ under control action $u$. With a state $s$ and a control action $u$, we associate a real-valued one-stage reward $g(s, u)$.

A stationary policy is a function $\mu : S \to U$. Let $M$ be the set of all possible stationary policies. A stationary policy $\mu$ defines a discrete-time Markov Chain $(S_k)$ with the transition
probabilities $P\{S_{k+1} = s' \mid S_k = s\} = p(s, \mu(s), s')$. With a stationary policy $\mu \in M$ and a state $s \in S$, we associate the reward-to-go function ...
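To make the single-link formulation concrete, the following minimal sketch builds the state space for a small instance, applies uniformization, and runs value iteration on the resulting discrete-time discounted problem. It is an illustration under stated assumptions, not the formulation or the algorithms used in the paper: the continuous-time discount rate `beta`, the event-based Bellman update, and all function and parameter names (`feasible_states`, `value_iteration`, `lam`, `nu`) are hypothetical choices made here. The `always_accept` flag evaluates the heuristic policy that accepts every customer that fits.

```python
from itertools import product


def feasible_states(B, b):
    """All occupancy vectors x with sum_n x[n] * b[n] <= B."""
    ranges = [range(B // bn + 1) for bn in b]
    return [x for x in product(*ranges)
            if sum(xn * bn for xn, bn in zip(x, b)) <= B]


def value_iteration(B, b, lam, nu, c, beta=0.1, tol=1e-9, always_accept=False):
    """Discounted reward-to-go of a small single-link instance.

    Assumption (not from the paper): rewards are discounted at a
    continuous-time rate beta, so after uniformization with rate `rate`
    the per-step discount factor is rate / (beta + rate).
    """
    states = feasible_states(B, b)
    feasible = set(states)
    N = len(b)
    # Uniformization rate: an upper bound on the total event rate in any state.
    rate = sum(lam) + sum(nu[n] * (B // b[n]) for n in range(N))
    alpha = rate / (beta + rate)
    J = {x: 0.0 for x in states}
    while True:
        J_new = {}
        for x in states:
            total, p_none = 0.0, 1.0
            for n in range(N):
                # Arrival of a class-n customer.
                p_arr = lam[n] / rate
                up = x[:n] + (x[n] + 1,) + x[n + 1:]
                if up not in feasible:
                    value = J[x]                      # not enough bandwidth: blocked
                elif always_accept:
                    value = c[n] + J[up]              # heuristic: accept whenever it fits
                else:
                    value = max(c[n] + J[up], J[x])   # best of accept and reject
                total += p_arr * value
                # Departure of a class-n customer.
                p_dep = x[n] * nu[n] / rate
                if x[n] > 0:
                    down = x[:n] + (x[n] - 1,) + x[n + 1:]
                    total += p_dep * J[down]
                p_none -= p_arr + p_dep
            # Remaining probability mass is a fictitious self-transition.
            J_new[x] = alpha * (total + p_none * J[x])
        gap = max(abs(J_new[x] - J[x]) for x in states)
        J = J_new
        if gap < tol:
            return J
```

For instance, `value_iteration(4, (1, 2), (1.0, 0.5), (1.0, 1.0), (1.0, 5.0))` treats a link with 4 bandwidth units and two classes; calling it again with `always_accept=True` gives the reward-to-go of the always-accept heuristic for comparison. The number of states grows quickly with the number of classes, which is the difficulty the text attributes to exact Dynamic Programming.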
Similar resources
A Neuro-Dynamic Programming Approach
We formulate the call admission control problem for a single link in an integrated service environment as a Markov Decision Problem. In principle, an optimal admission control policy can be computed using methods of Dynamic Programming. However, as the number of possible states of the underlying Markov Chain grows exponentially in the number of customer classes, Dynamic Programming algorithms ...
Call Admission Control and Routing in Integrated Service Networks Using Reinforcement Learning
In integrated service communication networks, an important problem is to exercise call admission control and routing so as to optimally use the network resources. This problem is naturally formulated as a dynamic programming problem, which, however, is too complex to be solved exactly. We use methods of reinforcement learning (RL), together with a decomposition approach, to find call admission ...
Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks
In integrated service communication networks, an important problem is to exercise call admission control and routing so as to optimally use the network resources. This problem is naturally formulated as a dynamic programming problem, which, however, is too complex to be solved exactly. We use methods of reinforcement learning (RL), together with a decomposition approach, to find call admission ...
A neuro-dynamic programming approach to admission control in ATM networks: the single link case
We are interested in solving large-scale Markov Decision Problems. The classical method of Dynamic Programming provides a mathematical framework for finding optimal solutions for a given Markov Decision Problem. However, Dynamic Programming algorithms become computationally infeasible when the underlying Markov Decision Problem evolves over a large state space. In recent years, a new methodol...
A Multi-Objective Fuzzy Approach to Closed-Loop Supply Chain Network Design with Regard to Dynamic Pricing
During the last decade, reverse logistics networks have received considerable attention due to their economic importance, environmental regulations, and customer awareness. Integration of leading and reverse logistics networks during logistical network design is one of the most important factors in the supply chain. In this research, an Integer Linear Programming model is presented to design a multi-layer...