Two-Timescale Q-Learning with an Application to Routing in Communication Networks
نویسندگان
چکیده
We propose two variants of the Q-learning algorithm that (both) use two timescales. One of these updates Q-values of all feasible state-action pairs at each instant while the other updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented and performance comparisons with the regular Q-learning algorithm are shown. Index Terms Q-learning based algorithms, Markov decision processes, two-timescale stochastic approximation, simultaneous perturbation stochastic approximation (SPSA), normalized Hadamard matrices.
منابع مشابه
An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks
LEACH is the most popular clastering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks, including random selection of cluster heads, and direct communication of cluster heads with the sink. This paper aims to introduce a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...
متن کاملMulticast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach
Wireless Sensor Networks (WSNs) are consist of independent distributed sensors with storing, processing, sensing and communication capabilities to monitor physical or environmental conditions. There are number of challenges in WSNs because of limitation of battery power, communications, computation and storage space. In the recent years, computational intelligence approaches such as evolutionar...
متن کاملNew algorithms of the Q-learning type
We propose two algorithms for Q-learning that use the two timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments usin...
متن کاملImprovement of Routing Operation Based on Learning with Using Smart Local and Global Agents and with the Help of the Ant Colony Algorithm
Routing in computer networks has played a special role in recent years. The cause of this is the role of routing in a performance of the networks. The quality of service and security is one of the most important challenges in routing due to lack of reliable methods. Routers use routing algorithms to find the best route to a particular destination. When talking about the best path, we consider p...
متن کاملImprovement of Routing Operation Based on Learning with Using Smart Local and Global Agents and with the Help of the Ant Colony Algorithm
Routing in computer networks has played a special role in recent years. The cause of this is the role of routing in a performance of the networks. The quality of service and security is one of the most important challenges in routing due to lack of reliable methods. Routers use routing algorithms to find the best route to a particular destination. When talking about the best path, we consider p...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006