Reinforcement Learning and Distributed Local Model Synthesis

Author

  • Tomas Landelius
Abstract

Reinforcement learning is a general and powerful way to formulate complex learning problems and acquire good system behaviour. The goal of a reinforcement learning system is to maximize a long-term sum of instantaneous rewards provided by a teacher. In its extreme form, reinforcement learning only requires that the teacher can provide a measure of success. This formulation does not require a training set with correct responses, and allows the system to become better than its teacher. In reinforcement learning much of the burden is moved from the teacher to the training algorithm. The exact and general algorithms that exist for these problems are based on dynamic programming (DP), and have a computational complexity that grows exponentially with the dimensionality of the state space. These algorithms can only be applied to real-world problems if an efficient encoding of the state space can be found. To cope with these problems, heuristic algorithms and function approximation need to be incorporated. In this thesis it is argued that local models have the potential to help solve problems in high-dimensional spaces, whereas global models do not. This is motivated by the bias-variance dilemma, which is resolved with the assumption that the system is constrained to live on a low-dimensional manifold in the space of inputs and outputs. This observation leads to the introduction of bias in terms of continuity and locality. A linear approximation of the system dynamics and a quadratic function describing the long-term reward are suggested to constitute a suitable local model. For problems involving one such model, i.e. linear quadratic regulation problems, novel convergence proofs for heuristic DP algorithms are presented. This is one of the few available convergence proofs for reinforcement learning in continuous state spaces. Reinforcement learning is closely related to optimal control, where local models are commonly used.
Relations to existing methods are investigated, e.g. adaptive control, gain scheduling, fuzzy control, and jump linear systems. Ideas from these areas are combined to produce a new algorithm for heuristic dynamic programming in which function parameters and locality, expressed as model applicability, are learned on-line. Both top-down and bottom-up versions are presented. The emerging local models and their applicability need to be memorized by the learning system. The binary tree is put forward as a suitable data structure for on-line storage and retrieval of these functions.
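
To make the local model concrete, the following is a minimal sketch of a single linear quadratic regulation problem in one dimension. It is not the heuristic DP algorithm of the thesis: it is plain model-based value iteration on the scalar Riccati recursion, and the function name `solve_scalar_lqr` and the parameters `a`, `b`, `q`, `r` are assumptions chosen for illustration. The dynamics are taken to be linear, x_next = a*x + b*u, and the long-term reward is expressed as a negative quadratic cost, so the value function has the form V(x) = p*x**2.

```python
def solve_scalar_lqr(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Value iteration on the scalar discrete-time Riccati recursion.

    Assumed dynamics:      x_next = a*x + b*u
    Assumed per-step cost: q*x**2 + r*u**2  (reward = negative cost)
    Returns (p, k), where V(x) = p*x**2 is the quadratic cost-to-go
    and u = -k*x is the corresponding linear feedback policy.
    """
    p = 0.0  # start from the zero value function
    for _ in range(max_iter):
        # One dynamic programming backup of the quadratic coefficient.
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    # Greedy feedback gain with respect to the converged value function.
    k = a * b * p / (r + b * b * p)
    return p, k


# Hypothetical example system: an integrator with unit state and action costs.
p, k = solve_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
print(f"value coefficient p = {p:.6f}, feedback gain k = {k:.6f}")
```

Here the model parameters are given in advance; a learning system of the kind described in the abstract would have to estimate the corresponding quantities on-line from observed rewards.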

Similar resources

Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities to monitor physical or environmental conditions. There are a number of challenges in WSNs because of limited battery power, communication, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

Behavior-Based Reinforcement Learning

This paper introduces an integration of reinforcement learning and behavior-based control designed to produce real-time learning in situated agents. The model layers a distributed and asynchronous reinforcement learning algorithm over a learned topological map and standard behavioral substrate to create a reinforcement learning complex. The topological map creates a small and task-relevant stat...

Designing Decentralized Controllers for Distributed-Air-Jet MEMS-Based Micromanipulators by Reinforcement Learning

Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and the usual approaches for contact arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches in order to position and convey an object...

An Architecture for Behavior-Based Reinforcement Learning

This paper introduces an integration of reinforcement learning and behavior-based control designed to produce real-time learning in situated agents. The model layers a distributed and asynchronous reinforcement learning algorithm over a learned topological map and standard behavioral substrate to create a reinforcement learning complex. The topological map creates a small and task-relevant stat...

Publication date: 1997