Value Functions for RL-Based Behavior Transfer: A Comparative Study
Authors
Abstract
Temporal difference (TD) learning methods (Sutton & Barto 1998) have become popular reinforcement learning techniques in recent years. TD methods, which rely on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators (a CMAC, a neural network, and an RBF) and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.
Similar resources
INTERSECTION OF ESSENTIAL IDEALS IN THE RING OF REAL-VALUED CONTINUOUS FUNCTIONS ON A FRAME
A frame $L$ is called {\it coz-dense} if $\Sigma_{coz(\alpha)}=\emptyset$ implies $\alpha=\mathbf{0}$. Let $\mathcal{R}L$ be the ring of real-valued continuous functions on a coz-dense and completely regular frame $L$. We present a description of the socle of the ring $\mathcal{R}L$ based on minimal ideals of $\mathcal{R}L$ and zero sets in pointfree topology. We show that socle of $\mathcal{R}L$ is an essent...
Emergent collective behaviors in a multi-agent reinforcement learning based pedestrian simulation
In this work, a multi-agent reinforcement learning framework is used to produce plausible simulations of pedestrian groups. In our framework, each virtual agent learns individually and independently to control its velocity inside a virtual environment. The case study consists of simulating two groups of embodied virtual agents crossing inside a narrow corridor. This scenario permi...
Comparative study of the impact of heat transfer coefficient on the comfort conditions in the interior spaces of Iranian traditional houses (warm and dry climate of Yazd and cold and mountainous of Tabriz)
The paper undertakes software-based numerical simulations of two traditional houses, one in the cold, dry climate of Tabriz and one in the hot, arid climate of Yazd. The Eco Test software was chosen for the BIM simulations. Case studies of real houses, rather than hypothetical models, can give more tangible results, and choosing traditional houses provides a good opportunity to compare both physical and thermal condition...
Shaping Proto-Value Functions via Rewards
Learning the value function is an important sub-problem in solving a given reinforcement learning task. The choice of representation for the value function directly affects learning. The most widely used representation for the value function is the linear architecture, wherein the value function is written as a linear combination of a ‘pre-selected’ set of basis functions. In such a scenario, choo...
An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources
This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using sine cosine optimization (SCO...