Two-Timescale Q-Learning with an Application to Routing in Communication Networks

Authors

  • Mohan Babu
  • Shalabh Bhatnagar
Abstract

We propose two variants of the Q-learning algorithm, both of which use two timescales. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of the convergence of the algorithms is given. Finally, numerical experiments using the proposed algorithms for routing on different network topologies are presented, together with performance comparisons against the regular Q-learning algorithm.

Index Terms: Q-learning based algorithms, Markov decision processes, two-timescale stochastic approximation, simultaneous perturbation stochastic approximation (SPSA), normalized Hadamard matrices.
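For orientation, the "regular Q-learning" baseline that the abstract compares against can be sketched on a toy routing problem. This is a minimal illustration of standard tabular Q-learning only, not the two-timescale variants proposed in the paper. The graph, its link costs, the function names `q_learning_route` and `greedy_path`, and all hyperparameters are assumptions made for this example.

```python
import random


def q_learning_route(graph, dest, episodes=2000, alpha=0.1, gamma=1.0,
                     epsilon=0.2, seed=0):
    """Standard tabular Q-learning for minimum-cost routing (baseline sketch).

    graph: dict mapping node -> {neighbor: link_cost} (hypothetical toy format).
    Returns Q, where Q[node][neighbor] estimates the cost of reaching `dest`
    from `node` when the next hop is `neighbor`.
    """
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    sources = [s for s in graph if s != dest]
    for _ in range(episodes):
        s = rng.choice(sources)
        for _ in range(50):  # cap the episode length
            if s == dest:
                break
            actions = list(Q[s])
            if rng.random() < epsilon:
                a = rng.choice(actions)          # explore a random next hop
            else:
                a = min(actions, key=Q[s].get)   # exploit the cheapest estimate
            cost = graph[s][a]
            # Cost-to-go from the next node; zero once the destination is reached.
            future = 0.0 if a == dest else min(Q[a].values())
            Q[s][a] += alpha * (cost + gamma * future - Q[s][a])
            s = a
    return Q


def greedy_path(Q, src, dest, max_hops=20):
    """Follow the lowest-cost next hop from src until dest (or max_hops)."""
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:
        node = path[-1]
        path.append(min(Q[node], key=Q[node].get))
    return path


# Toy four-node network with assumed link costs; "D" is the destination.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
Q = q_learning_route(graph, dest="D")
path = greedy_path(Q, "A", "D")
print(path, round(Q["A"]["B"], 2))
```

Here a single step size `alpha` drives every update. A two-timescale scheme would instead run two coupled recursions with different step-size schedules; the abstract does not give those details, so they are not reproduced here.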


Related articles

An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks

LEACH is the most popular clustering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks: random selection of cluster heads and direct communication of cluster heads with the sink. This paper introduces a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...


Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities that monitor physical or environmental conditions. WSNs pose a number of challenges because of limited battery power, communication, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...


New algorithms of the Q-learning type

We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first updates the Q-values of all feasible state-action pairs at each instant, while the second updates the Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments usin...


Improvement of Routing Operation Based on Learning with Using Smart Local and Global Agents and with the Help of the Ant Colony Algorithm

Routing in computer networks has played a special role in recent years because of its effect on network performance. Quality of service and security are among the most important challenges in routing, owing to the lack of reliable methods. Routers use routing algorithms to find the best route to a particular destination. When talking about the best path, we consider p...



Publication date: 2006