Search results for: action value function

Number of results: 2342819

2015
Tom Schaul, Daniel Horgan, Karol Gregor, David Silver

Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V (s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V (s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient techni...
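The idea in this abstract can be sketched in a few lines. The following is a hypothetical minimal form of a UVFA, using a single linear parameterisation over concatenated state and goal features with a TD(0) update; the paper itself develops richer (factored and embedded) architectures, so treat this only as an illustration of "one θ generalising over both s and g".

```python
import numpy as np

class LinearUVFA:
    """Sketch of a universal value function approximator V(s, g; theta).

    Assumption: a plain linear map over concatenated state and goal
    features, which is NOT the paper's architecture, just the simplest
    parameterisation that shares theta across states and goals."""

    def __init__(self, state_dim, goal_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(scale=0.1, size=state_dim + goal_dim)

    def value(self, s, g):
        # One parameter vector produces a value for ANY (state, goal) pair.
        return float(np.concatenate([s, g]) @ self.theta)

    def td_update(self, s, g, reward, s_next, gamma=0.99, lr=0.1):
        # TD(0) step toward the bootstrapped target r + gamma * V(s', g).
        x = np.concatenate([s, g])
        delta = reward + gamma * self.value(s_next, g) - self.value(s, g)
        self.theta += lr * delta * x

uvfa = LinearUVFA(state_dim=4, goal_dim=4)
s, g = np.ones(4), np.zeros(4)
before = uvfa.value(s, g)
uvfa.td_update(s, g, reward=1.0, s_next=s)
after = uvfa.value(s, g)
```

A positive reward pulls the estimate for this (s, g) pair upward, and because θ is shared, nearby goals move with it.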

2007
Chul-Woo Lee

Coarse functions are functions whose graphs appear to be continuous at a distance, but in fact may not be continuous. In this paper we explore examples and properties of coarse functions. We then generalize several basic theorems of continuous functions which apply to coarse functions.

2013
Mitchell Keith Bloch, John Edwin Laird

Our goal is to develop broadly competent agents that can dynamically construct an appropriate value function for tasks with large state spaces so that they can effectively and efficiently learn using reinforcement learning. We study the case where an agent’s state is determined by a small number of continuous dimensions, so that the problem of determining the relevant features corresponds rough...

Journal: CoRR 2015
Hao Yi Ong

We propose a novel value function approximation technique for Markov decision processes. We consider the problem of compactly representing the state-action value function using a low-rank and sparse matrix model. The problem is to decompose a matrix that encodes the true value function into low-rank and sparse components, and we achieve this using Robust Principal Component Analysis (PCA). Unde...
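The decomposition this abstract describes can be demonstrated with a basic principal component pursuit solver. The sketch below uses standard ADMM updates (singular value thresholding for the low-rank part, soft thresholding for the sparse part) with the common default λ = 1/√max(m, n); it is an illustrative Robust PCA routine, not the paper's exact solver.

```python
import numpy as np

def _svt(M, tau):
    # Singular value thresholding: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def _soft(M, tau):
    # Elementwise soft threshold: promotes entrywise sparsity.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def robust_pca(Q, iters=300):
    """Split a value matrix Q into low-rank L plus sparse S by
    principal component pursuit (basic ADMM, common default params)."""
    m, n = Q.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(Q).sum() + 1e-12)
    L = np.zeros_like(Q)
    S = np.zeros_like(Q)
    Y = np.zeros_like(Q)  # scaled dual variable for the constraint Q = L + S
    for _ in range(iters):
        L = _svt(Q - S + Y / mu, 1.0 / mu)
        S = _soft(Q - L + Y / mu, lam / mu)
        Y += mu * (Q - L - S)
    return L, S

# Synthetic check: rank-1 "value" matrix corrupted by a few large spikes.
rng = np.random.default_rng(1)
low = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 20))
sparse = np.zeros((20, 20))
sparse.flat[rng.choice(400, size=10, replace=False)] = 5.0
Q = low + sparse
L, S = robust_pca(Q)
```

On this toy instance the solver separates the rank-1 structure from the spikes to good accuracy, which is the compact-representation property the abstract exploits.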

Journal: Philosophical Transactions of the Royal Society B: Biological Sciences 2014

Journal: I. J. Robotics Res. 2015
Jen Jen Chung, Nicholas R. J. Lawrance, Salah Sukkarieh

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcem...
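The core ingredient here, a Gaussian process model of the value function, can be sketched directly. The snippet below is a generic GP regression posterior with an RBF kernel over visited states; the predictive variance is exactly the quantity that makes directed exploration possible (high variance marks states the agent knows little about). It is an assumption-laden stand-in, not the paper's eGP-SARSA machinery.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_value(X, v, Xq, noise=1e-4, ell=1.0):
    """GP posterior mean and variance of the value at query states Xq,
    given visited states X with value targets v."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ell)
    mean = Ks @ np.linalg.solve(K, v)
    # diag(Kqq - Ks K^-1 Ks^T); Kqq diagonal is 1 for the RBF kernel.
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

# Three visited states with observed value targets, two queries:
X = np.array([[0.0], [1.0], [2.0]])
v = np.array([0.0, 1.0, 0.5])
mean, var = gp_value(X, v, np.array([[1.0], [10.0]]))
```

The first query sits on a visited state, so the posterior mean reproduces its target with near-zero variance; the second is far from all data, so the variance approaches the prior and an exploration bonus would steer the glider there.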

2007
Jelle R. Kok, Nikos Vlassis

One of the main problems in cooperative multiagent learning is that the joint action space is exponential in the number of agents. In this paper, we investigate a sparse representation of the joint action space in which value rules specify the coordination dependencies between the different agents for a particular state. Each value rule has an associated payoff which is part of the global Q-fun...
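The sparse representation this abstract describes can be made concrete with a toy rule set. In the sketch below (hypothetical rules, agent names, and payoffs, not taken from the paper), each value rule pairs a state condition and a partial joint action with a payoff, and the global Q-value of a full joint action is the sum of the payoffs of all rules that fire, so rules only need to mention the agents that actually coordinate.

```python
# Hypothetical value rules for a two-agent pursuit task.
# Each rule: (state condition, partial joint action, payoff).
RULES = [
    ("prey_adjacent", {"agent1": "capture"}, 5.0),
    ("prey_adjacent", {"agent1": "capture", "agent2": "block"}, 10.0),
    ("searching", {"agent1": "explore"}, 1.0),
]

def global_q(state, joint_action):
    # Sum the payoffs of every rule whose state condition holds and
    # whose partial joint action is consistent with the full one.
    return sum(
        payoff
        for cond, partial, payoff in RULES
        if cond == state
        and all(joint_action.get(agent) == act for agent, act in partial.items())
    )

q_coord = global_q("prey_adjacent", {"agent1": "capture", "agent2": "block"})
q_solo = global_q("prey_adjacent", {"agent1": "capture", "agent2": "wait"})
```

Note that no rule enumerates the full joint action space: the exponential table is replaced by a handful of rules over small agent subsets, which is the point of the sparse representation.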

2000
Gregory Z. Grudic, Lyle H. Ungar

Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster ...
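The variance-reduction principle this abstract builds on can be shown numerically on the smallest possible example: a two-armed bandit with a softmax policy, comparing REINFORCE gradient samples with and without a value-based baseline. This illustrates the general mechanism (subtracting a learned value estimate from the return), not the PIFA algorithm itself; the bandit, rewards, and constant baseline are all constructed for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
REWARDS = np.array([1.0, 2.0])   # deterministic reward of each arm
P1 = 0.5                         # softmax policy's probability of arm 1

def grad_samples(use_baseline, n=5000):
    # Draw arms from the policy and form REINFORCE gradient samples
    # score * (reward - baseline), where score = d/dtheta log pi(a).
    arms = rng.choice(2, size=n, p=[1 - P1, P1])
    r = REWARDS[arms]
    # Value-based baseline: the expected reward under the policy.
    b = (1 - P1) * REWARDS[0] + P1 * REWARDS[1] if use_baseline else 0.0
    score = arms - P1
    return score * (r - b)

var_plain = grad_samples(False).var()
var_base = grad_samples(True).var()
```

Both estimators have the same mean gradient, but subtracting the value baseline collapses the sample variance (to zero in this degenerate deterministic-reward case), which is exactly why FA representations of Q were proposed for PG methods.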

[Chart: number of search results per year]