Q-Decomposition for Reinforcement Learning Agents
نویسندگان
چکیده
The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own reinforcement learning process. It supplies to a central arbitrator the Q-values (according to its own reward function) for each possible action. The arbitrator selects an action maximizing the sum of Q-values from all the subagents. This approach has advantages over designs in which subagents recommend actions. It also has the property that if each subagent runs the Sarsa reinforcement learning algorithm to learn its local Q-function, then a globally optimal policy is achieved. (On the other hand, local Q-learning leads to globally suboptimal behavior.) In some cases, this form of agent decomposition allows the local Q-functions to be expressed by muchreduced state and action spaces. These results are illustrated in two domains that require effective coordination of behaviors.
منابع مشابه
Hierarchical Functional Concepts for Knowledge Transfer among Reinforcement Learning Agents
This article introduces the notions of functional space and concept as a way of knowledge representation and abstraction for Reinforcement Learning agents. These definitions are used as a tool of knowledge transfer among agents. The agents are assumed to be heterogeneous; they have different state spaces but share a same dynamic, reward and action space. In other words, the agents are assumed t...
متن کاملThe MAXQ Method for Hierarchical Reinforcement Learning
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics—as a subroutine hierarchy—and a declarative semantics—as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kae...
متن کاملUser-based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi Agents Systems and the ANT-Q Algorithm
Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of Intelligent Transportation Systems (ITS). Nowadays, many complex systems can be controlled by using multi agent systems. Adaptation with the current condition is an important feature of the agents. In this research, formulation of dynamic guidance for vehicles has been investigated based...
متن کاملReinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
متن کاملAdaptive agents on evolving networks
In this work we study the learning dynamics for agents playing games on networks. We propose a model of network formation in repeated games where players strategically adopt actions and connections simultaneously using a reinforcement learning scheme which is called Boltzmann-Q-learning. This adaptation scheme in the continuous time limit has a proven relation to the evolutionary game theory th...
متن کامل