Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains

Authors

  • Sachiyo Arai
  • Katia Sycara
Abstract

In this paper, we introduce First-Visit Profit-Sharing (FVPS) as a credit assignment procedure; credit assignment is an important issue in classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules so that an agent acquires stochastic policies that let it behave robustly in uncertain domains, without predefined knowledge or subgoals. We use an internal episodic memory not only to identify perceptually aliased states but also to discard looping behavior and to acquire effective stochastic policies for escaping perceptually deceptive states. We demonstrate the effectiveness of our method on some typical classes of Partially Observable Markov Decision Processes, comparing it with Sarsa(λ) using a replacing eligibility trace. We claim that this approach results in an effective stochastic or deterministic policy, whichever is appropriate for the environment.
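The abstract does not give the update rule, so the following is only a minimal sketch of what a first-visit, profit-sharing style credit assignment could look like. It rests on assumptions that are not taken from the paper: credit decays geometrically with distance from the reward (the `decay` parameter), a rule is an (observation, action) pair, each rule is credited only at its first occurrence in the episode, and actions are drawn with probability proportional to rule weight. All function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def first_visit_credit(episode, reward, decay=0.5):
    """Return {(observation, action): credit} for one rewarded episode.

    episode: list of (observation, action) rules in the order they fired.
    Assumption: credit shrinks geometrically with distance from the reward,
    and a rule is credited only at its first occurrence in the episode.
    """
    last = len(episode) - 1
    credit = {}
    for step, rule in enumerate(episode):
        if rule in credit:
            continue  # later visits of the same rule earn no extra credit
        credit[rule] = reward * decay ** (last - step)
    return credit


def reinforce(weights, episode, reward, decay=0.5):
    """Add the episode's first-visit credit to the stored rule weights."""
    for rule, value in first_visit_credit(episode, reward, decay).items():
        weights[rule] += value


def choose_action(weights, observation, actions, rng=random):
    """Stochastic policy: sample an action in proportion to its rule weight."""
    w = [weights[(observation, a)] for a in actions]
    return rng.choices(actions, weights=w, k=1)[0]


# Rule weights start at 1.0 so every action keeps a nonzero probability.
weights = defaultdict(lambda: 1.0)
episode = [("s0", "right"), ("s1", "left"), ("s0", "right"), ("s2", "right")]
reinforce(weights, episode, reward=8.0)
```

In this sketch the episodic memory is simply the `episode` list: it is what makes it possible to notice that the rule `("s0", "right")` fired twice and to credit it only once. Whether FVPS uses this exact credit function or selection rule cannot be determined from the abstract.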

Similar resources

An Analysis of Direct Reinforcement Learning in Non-Markovian Domains

It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation when the Markov property cannot be assumed. We show that for a general class of non-Markov decision processes, if actual re...

Uncertain Resource Availabilities: Proactive and Reactive Procedures for Preemptive Resource-Constrained Project Scheduling Problem

Project scheduling is the part of project management that deals with determining when in time to start (and finish) which activities and with the allocation of scarce resources to the project activities. In practice, virtually all project managers are confronted with resource scarceness. In such cases, the Resource-Constrained Project Scheduling Problem (RCPSP) arises. This optimization problem has...

Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games

The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a mean...

Effective Learning Approach for Planning and Scheduling in Multi-Agent Domain

The point we want to make in this paper is that Profit-sharing, a reinforcement learning approach, is very appropriate for realizing adaptive behaviors in a multi-agent environment. We discuss the effectiveness of Profit-sharing theoretically and empirically within a Pursuit Game where there exist multiple preys and multiple hunters. In our context of this problem, hunters need to coordinate adapt...

Filtered Reinforcement Learning

Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or ap...

Publication date: 2001