q learning

Introduction of Majority Vote of Neighborhood Conditions for Sneak Form Reinforcement Learning

2013

Yuki Tezuka Akira Notsu Katsuhiro Honda

Chain Form Reinforcement Learning (CFRL) was proposed for reinforcement learning agents with little memory. In this paper, we introduce Sneak Form Reinforcement Learning (SFRL), which improves on CFRL in terms of contextual learning. If a sequence of state-action pairs has a shortest path, an SFRL agent cuts and saves the path. To improve the performance of SFRL, we introduce Ma...

Machine Learning for Autonomous Robotic Agents

1996

Andrea Bonarini Giuseppe Borghi Pierguido V. Caironi Marco Colombetti Marco Dorigo Fabio Marchese

We present some results of our research in the field of Machine Learning applied to robotics problems. In particular, we have investigated: (i) the application of Learning Classifier Systems to the synthesis of robot controllers; (ii) learning of fuzzy controllers; (iii) learning of purposeful representations of the environment; and (iv) the application of versions of Q-learning to robot trai...

Dynamic Joint Action Perception for Q-Learning Agents

2003

Nancy Fulda Dan Ventura

Q-learning is a reinforcement learning algorithm that learns expected utilities for state-action transitions through successive interactions with the environment. The algorithm's simplicity as well as its convergence properties have made it a popular algorithm for study. However, its non-parametric representation of utilities limits its effectiveness in environments with large amounts of percept...

Cooperation-eliciting prisoner's dilemma payoffs for reinforcement learning agents

2014

Koichi Moriyama Satoshi Kurihara Masayuki Numao

This work considers a stateless Q-learning agent in iterated Prisoner’s Dilemma (PD). We have already given a condition of PD payoffs and Q-learning parameters that helps stateless Q-learning agents cooperate with each other [2]. That condition, however, has a restrictive premise. This work relaxes the premise and shows a new payoff condition for mutual cooperation. After that, we derive the pa...

Reinforcement Learning by Comparing Immediate Reward

Journal: CoRR 2010

Punit Pandey Deepshikha Pandey Shishir Kumar

This paper introduces an approach to reinforcement learning that compares immediate rewards, using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the past move and works accordingly. Relative reward based Q-learning is an approach towards interactive learning. Q-Learning is a model-free re...

Baselines for Joint-Action Reinforcement Learning of Coordination in Cooperative Multi-agent Systems

2002

Martin Carpenter Daniel Kudenko

We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multiagent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents di...

Experiments with Reinforcement Learning in Environments with Progressive Difficulty

2003

Michael G. Madden Tom Howley

This paper introduces Progressive Reinforcement Learning, which augments standard Q-Learning with a mechanism for transferring experience gained in one problem to new but related problems. In this approach, an agent acquires experience of operating in a simple domain through experimentation. It then engages in a period of introspection, during which it rationalises the experience gained and for...

Reinforcement Learning in Finite MDPs: PAC Analysis

Journal: Journal of Machine Learning Research 2009

Alexander L. Strehl Lihong Li Michael L. Littman

We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. A more refined an...

Feudal Reinforcement Learning

1992

Peter Dayan Geoffrey E. Hinton

One way to speed up reinforcement learning is to enable learning to happen simultaneously at multiple resolutions in space and time. This paper shows how to create a Q-learning managerial hierarchy in which high-level managers learn how to set tasks for their sub-managers who, in turn, learn how to satisfy them. Sub-managers need not initially understand their managers’ commands. They simply lea...

Concurrent Individual And Social Learning In Robot Teams

Journal: Computational Intelligence 2016

Larry Ng Mohammad Reza Emami

Despite the advancement of research and development on multi-robot teams, a key challenge still remains as to how to develop effective mechanisms that enable the robots to autonomously generate, adapt, and enhance team behaviours while improving their individual performance simultaneously. After a literature review of various multi-agent learning approaches, the two most promising learning para...

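For reference, most of the entries above build on the standard tabular Q-learning update, in which the estimate for a state-action pair is moved toward the observed reward plus the discounted best estimate for the next state. The following is a minimal sketch of that update only. The five-state corridor environment, the function names (`step`, `q_learning`, `greedy_policy`), and the parameter values are illustrative assumptions and are not taken from any of the listed papers.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical corridor: action 0 moves left, action 1 moves right.

    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target; no bootstrap from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of a highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning())` on this toy corridor yields "move right" for every non-terminal state. The table-per-state representation is what several of the abstracts refer to when they discuss memory use or scaling limits of plain Q-learning.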