Q-Batch: initial results with a novel update rule for Batch Reinforcement Learning
Authors
Abstract
Batch Reinforcement Learning has established itself as a valuable alternative for developing learning, adaptive agents. Batch Reinforcement Learning algorithms obtain a policy from a set of previously collected data. Common methods apply adapted versions of RL update rules, such as Q-Learning, to the transitions in the batch, building a pattern set. The target values of the patterns represent a value function, which is later "fitted" with a function approximator using batch supervised learning methods. This paper presents the first results with a novel update rule, Q-Batch. The proposed method is benchmarked against the batch versions of Q-Learning and Watkins' Q(λ) within the Neural Fitted Q Iteration framework, using the simulated Predator-Prey environment. Empirical results show that the proposed method achieves comparable or better asymptotic performance while requiring fewer interactions with the environment.
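The pattern-set construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Q-Batch rule (the abstract does not specify it) and not the authors' NFQ implementation. It applies the standard batch Q-Learning target r + γ·max_b Q(s', b) to every stored transition, then substitutes a simple tabular averaging fitter for the neural network that Neural Fitted Q Iteration would train. The function names, the transition tuple layout, and the convention that unseen state-action pairs have value 0 are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, terminal).
Transition = Tuple[Hashable, Hashable, float, Hashable, bool]
QTable = Dict[Tuple[Hashable, Hashable], float]
Pattern = Tuple[Tuple[Hashable, Hashable], float]


def build_pattern_set(batch: Sequence[Transition], q: QTable,
                      actions: Sequence[Hashable], gamma: float) -> List[Pattern]:
    """Apply the batch Q-Learning target to every transition in the batch.

    Each pattern pairs an input (state, action) with a target value.
    Assumption: state-action pairs not yet in q are valued at 0.0.
    """
    patterns: List[Pattern] = []
    for s, a, r, s_next, terminal in batch:
        if terminal:
            target = r  # no bootstrapping past a terminal state
        else:
            target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
        patterns.append(((s, a), target))
    return patterns


def fit_tabular(patterns: List[Pattern]) -> QTable:
    """Stand-in for batch supervised learning: average the targets per input.

    NFQ would train a neural network on the patterns instead.
    """
    sums: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    for x, y in patterns:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def fitted_q_iteration(batch: Sequence[Transition], actions: Sequence[Hashable],
                       gamma: float = 0.9, iterations: int = 50) -> QTable:
    """Alternate pattern-set construction and fitting on a fixed batch."""
    q: QTable = {}
    for _ in range(iterations):
        q = fit_tabular(build_pattern_set(batch, q, actions, gamma))
    return q


if __name__ == "__main__":
    # Tiny hypothetical batch: moving right from s1 reaches a rewarding terminal state.
    batch = [
        ("s0", "right", 0.0, "s1", False),
        ("s1", "right", 1.0, "goal", True),
        ("s0", "left", 0.0, "s0", False),
    ]
    print(fitted_q_iteration(batch, ["left", "right"], gamma=0.9))
```

With this batch and γ = 0.9, the iterations propagate the terminal reward backwards: Q(s1, right) settles at 1.0, Q(s0, right) at 0.9, and Q(s0, left) at about 0.81. No new interactions with the environment are needed; every sweep reuses the same stored transitions, which is the property that makes batch methods sample-efficient.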
Similar resources
Shallow Updates for Deep Reinforcement Learning
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the...
Reinforcement Learning with Raw Image Pixels as Input State
We report in this paper some positive simulation results obtained when image pixels are directly used as input state of a reinforcement learning algorithm. The reinforcement learning algorithm chosen to carry out the simulation is a batch-mode algorithm known as fitted Q iteration.
Tree-Based Batch Mode Reinforcement Learning
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}) where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained and x_{t+1} the succe...
Residential Demand Response Applications Using Batch Reinforcement Learning
Driven by recent advances in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL to demand response. In contrast to conventional model-based approaches, batch RL techniques do not require a system identification step, which makes them more suitable for a large-scale implementation. This paper extends fitted Q-iteration, a standard batch RL technique, to the...
Optimal Sample Selection for Batch-mode Reinforcement Learning
We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time Optimal Control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to the identification of a small set of one-step system transitions, leading to high-quality policies when used as input of a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular i...