Search results for: action value function
Number of results: 2,342,819
Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V (s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V (s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient techni...
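The UVFA idea above is easy to sketch: condition the value estimate on a goal as well as a state. Below is a minimal illustrative sketch in Python/NumPy; the two-layer network, its sizes, and all names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

# Minimal UVFA sketch: V(s, g; theta) as a tiny two-layer network that
# takes a state vector s and a goal vector g and returns a scalar value.
# Architecture and sizes are illustrative, not the paper's.

rng = np.random.default_rng(0)

STATE_DIM, GOAL_DIM, HIDDEN = 4, 4, 32
W1 = rng.normal(scale=0.1, size=(STATE_DIM + GOAL_DIM, HIDDEN))
w2 = rng.normal(scale=0.1, size=HIDDEN)

def uvfa_value(s, g):
    """Estimate long-term reward of state s with respect to goal g."""
    x = np.concatenate([s, g])   # generalise over states AND goals
    h = np.tanh(x @ W1)          # shared hidden representation
    return h @ w2                # scalar value estimate

s = rng.normal(size=STATE_DIM)
g = rng.normal(size=GOAL_DIM)
print(uvfa_value(s, g))
```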
Coarse functions are functions whose graphs appear to be continuous at a distance, but in fact may not be continuous. In this paper we explore examples and properties of coarse functions. We then generalize several basic theorems about continuous functions, showing that they also apply to coarse functions.
Our goal is to develop broadly competent agents that can dynamically construct an appropriate value function for tasks with large state spaces so that they can effectively and efficiently learn using reinforcement learning. We study the case where an agent’s state is determined by a small number of continuous dimensions, so that the problem of determining the relevant features corresponds rough...
We propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (RPCA). Unde...
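A hedged sketch of the decomposition idea: split a state-by-action value matrix into low-rank plus sparse parts by alternating singular-value and elementwise soft-thresholding. The fixed thresholds and the toy matrix are assumptions, and published work typically uses a proper principal-component-pursuit solver rather than this crude loop.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (sparse step)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular-value soft-thresholding (low-rank step)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca(M, lam=None, iters=100):
    """Split M ~ L + S with L low-rank, S sparse (alternating shrinkage).

    Thresholds here are illustrative; convergence of this simple loop
    is not guaranteed, unlike the ALM solvers used in the literature.
    """
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        L = svd_shrink(M - S, 1.0)
        S = shrink(M - L, lam)
    return L, S

# Toy "Q matrix": rows = states, columns = actions.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
sparse = (rng.random((50, 8)) < 0.05) * 5.0
Q = low_rank + sparse
L, S = rpca(Q)
print(np.linalg.matrix_rank(np.round(L, 6)), np.count_nonzero(np.round(S, 6)))
```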
This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcem...
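To illustrate just the value-model idea, not the full eGP-SARSA(λ) algorithm, the sketch below fits a Gaussian process to made-up (state, action) return data with scikit-learn and uses the predictive standard deviation for directed, optimism-driven action selection; the data, kernel, and parameters are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Sketch: a GP models Q(s, a); its predictive std drives directed
# (optimistic) exploration. Data and parameters are made up.

rng = np.random.default_rng(0)
ACTIONS = np.array([-1.0, 0.0, 1.0])   # e.g. hypothetical bank-angle commands

# Pretend we have observed (state, action) pairs and sampled returns.
X = rng.uniform(-1, 1, size=(30, 2))   # columns: state, action
y = np.sin(3 * X[:, 0]) - 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.05, 30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
gp.fit(X, y)

def pick_action(state, beta=2.0):
    """UCB-style action choice: mean + beta * std rewards uncertainty."""
    query = np.column_stack([np.full(len(ACTIONS), state), ACTIONS])
    mean, std = gp.predict(query, return_std=True)
    return ACTIONS[np.argmax(mean + beta * std)]

print(pick_action(0.2))
```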
One of the main problems in cooperative multiagent learning is that the joint action space is exponential in the number of agents. In this paper, we investigate a sparse representation of the joint action space in which value rules specify the coordination dependencies between the different agents for a particular state. Each value rule has an associated payoff which is part of the global Q-fun...
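The value-rule representation can be sketched directly: each rule carries a state condition, a small scope of agents, and a payoff, and the global Q is the sum of the payoffs of the rules that fire. The rules, names, and brute-force maximization below are illustrative assumptions; the paper's coordination machinery (variable elimination over the dependency structure) avoids enumerating the joint action space.

```python
from itertools import product

# Sketch: value rules with small scopes that sum to the global Q.
# Each rule fires in some states and depends only on a few agents'
# actions, so the joint Q never has to be stored explicitly.

ACTIONS = ["left", "right"]

# (condition on state, scope = agent indices, local payoff function)
rules = [
    (lambda s: s == "near_door", (0, 1),
     lambda a0, a1: 5.0 if a0 != a1 else -1.0),  # avoid colliding in the door
    (lambda s: True, (2,),
     lambda a2: 2.0 if a2 == "right" else 0.0),  # agent 2 acts alone
]

def global_q(state, joint):
    """Global Q(state, joint action) = sum of payoffs of active rules."""
    total = 0.0
    for cond, scope, payoff in rules:
        if cond(state):
            total += payoff(*(joint[i] for i in scope))
    return total

# Brute force over joint actions (3 agents) purely for illustration.
best = max(product(ACTIONS, repeat=3), key=lambda j: global_q("near_door", j))
print(best, global_q("near_door", best))
```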
Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance-gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster ...
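The variance-reduction idea can be shown on a toy problem: substitute a learned value estimate for raw sampled returns inside the policy-gradient update. The two-armed bandit, the running-mean critic, and the step sizes below are assumptions for illustration, not the PIFA algorithm itself.

```python
import numpy as np

# Sketch: replace sampled returns in the policy-gradient estimator with
# a learned approximation of Q, which lowers gradient variance.
# Toy 2-armed bandit with a softmax policy; all names are illustrative.

rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy logits, one per action
w = np.zeros(2)       # Q estimates (one-hot features -> one weight per arm)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

true_q = np.array([1.0, 2.0])

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = true_q[a] + rng.normal(0, 1.0)  # noisy reward sample
    w[a] += 0.05 * (r - w[a])           # critic: running mean per action
    grad_logpi = -pi                    # gradient of log pi(a) for softmax
    grad_logpi[a] += 1.0
    theta += 0.01 * grad_logpi * w[a]   # PG step uses Q estimate, not raw r

print(softmax(theta))                   # should favour the better arm
```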