Search results for: action value function

Number of results: 2342819

2016
Shixiang Gu Timothy Lillicrap Ilya Sutskever Sergey Levine

The iLQG algorithm optimizes trajectories by iteratively constructing locally optimal linear feedback controllers under a local linearization of the dynamics p(x_{t+1} | x_t, u_t) = N(f_{x_t} x_t + f_{u_t} u_t, F_t) and a quadratic expansion of the rewards r(x_t, u_t) (Tassa et al., 2012). Under linear dynamics and quadratic rewards, the action-value function Q(x_t, u_t) and value function V(x_t) are locally quadratic an...
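
A minimal sketch of the quadratic backward recursion this snippet refers to, written in cost form (cost = negative reward) for a single time step. The dynamics, cost matrices, and sizes below are illustrative placeholders, not taken from the paper:

```python
import numpy as np

# One backward step of an LQR/iLQG-style recursion.
# Assumed local model: x_{t+1} = A @ x + B @ u,
# quadratic stage cost c(x, u) = x^T Qx x + u^T Ru u,
# and quadratic cost-to-go at the next step V_{t+1}(x) = x^T P x.
def backward_step(A, B, Qx, Ru, P):
    # Quadratic blocks of the action-value (cost-to-go) function
    #   Q_t(x, u) = c(x, u) + V_{t+1}(A x + B u)
    Qxx = Qx + A.T @ P @ A          # x-x block
    Quu = Ru + B.T @ P @ B          # u-u block
    Qux = B.T @ P @ A               # u-x cross block
    # Locally optimal linear feedback u* = -K x from dQ/du = 0
    K = np.linalg.solve(Quu, Qux)
    # Plugging u* back in gives the quadratic value V_t(x) = x^T P_new x
    P_new = Qxx - Qux.T @ K
    return K, P_new

# Tiny usage example with placeholder matrices
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
K, P_new = backward_step(A, B, np.eye(2), np.eye(1), np.eye(2))
print("feedback gain K:", K)
```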

2012
A. BUDHIRAJA K. ROSS

A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C^1 and its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally...

2005
Scott Sanner Craig Boutilier

We describe a probabilistic planning approach that translates a PPDDL planning problem description to a first-order MDP (FOMDP) and uses approximate solution techniques for FOMDPs to derive a value function and corresponding policy. Our FOMDP solution techniques represent the value function linearly w.r.t. a set of first-order basis functions and compute suitable weights using lifted, first-ord...
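
A hedged, propositional-level sketch of the linear representation described above: the value function is a weighted sum of basis functions, V(s) ≈ Σ_i w_i b_i(s). The basis functions and the least-squares fit below are illustrative assumptions; the paper itself computes the weights with lifted, first-order linear programs over first-order basis functions.

```python
import numpy as np

# Value function represented linearly over a set of basis functions:
#   V(s) ≈ sum_i w_i * b_i(s)

def features(state):
    # Hypothetical basis functions over a 1-D state (illustrative only)
    return np.array([1.0, state, state ** 2])

def fit_weights(states, bellman_targets):
    # Least-squares fit of weights w so that Phi @ w ≈ targets
    Phi = np.stack([features(s) for s in states])
    w, *_ = np.linalg.lstsq(Phi, bellman_targets, rcond=None)
    return w

states = np.linspace(0.0, 1.0, 11)
targets = 2.0 + 0.5 * states          # placeholder backup values
w = fit_weights(states, targets)
print("value estimate at s=0.3:", features(0.3) @ w)
```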

Journal: Management Science 2007
Shie Mannor Duncan Simester Peng Sun John N. Tsitsiklis

We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate...
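
A small sketch of the setting only: estimate the transition model from samples, solve for the value function of a fixed policy, and gauge the uncertainty. The sketch uses a parametric bootstrap for illustration, not the paper's closed-form bias/variance approximations, and all numbers are toy placeholders.

```python
import numpy as np

# Toy 2-state MDP under a fixed policy: estimate P from samples,
# compute V = (I - gamma * P)^{-1} r, and bootstrap a confidence interval.
rng = np.random.default_rng(1)
gamma = 0.9
P_true = np.array([[0.8, 0.2], [0.3, 0.7]])
r = np.array([1.0, 0.0])

def value(P):
    return np.linalg.solve(np.eye(2) - gamma * P, r)

# Empirical model from n observed transitions per state
n = 50
counts = np.stack([rng.multinomial(n, row) for row in P_true])
P_hat = counts / n

# Parametric bootstrap around the empirical model
boot = np.array([value(np.stack([rng.multinomial(n, row) / n for row in P_hat]))
                 for _ in range(1000)])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print("V_hat:", value(P_hat), "95% CI per state:", list(zip(lo, hi)))
```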

Journal: CoRR 2017
Yemi Okesanjo Victor Kofia

Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural ...
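
A minimal sketch of policy improvement along the action-value gradient in the deterministic-policy-gradient style alluded to above: the actor is updated along ∇_θ μ_θ(s) ∇_a Q(s, a)|_{a=μ_θ(s)}. The network sizes and the single-sample update are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

# Deterministic actor mu(s) and critic Q(s, a); the actor is improved by
# ascending the critic's action-value gradient, i.e. maximizing Q(s, mu(s)).
state_dim, action_dim = 4, 2
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim))
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

state = torch.randn(1, state_dim)
action = actor(state)
# Policy improvement along the steepest-ascent direction of Q in action space
actor_loss = -critic(torch.cat([state, action], dim=1)).mean()
opt.zero_grad()
actor_loss.backward()   # backprop yields grad_a Q composed with grad_theta mu
opt.step()
```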

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence 2022

We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, i.e., generalization among policie...
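
A hedged sketch of the core idea: a value network that conditions on both the state and an explicit policy representation, so one approximator can hold values for many policies. Using the flattened policy parameters as the policy representation is an illustrative choice, not the paper's representation-learning scheme.

```python
import torch
import torch.nn as nn

# A value approximator that takes (state, policy representation) as input,
# in the spirit of a policy-extended VFA.
class PolicyConditionedValue(nn.Module):
    def __init__(self, state_dim, policy_repr_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_repr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, policy_repr):
        return self.net(torch.cat([state, policy_repr], dim=-1))

state_dim = 4
policy = nn.Linear(state_dim, 2)                       # toy policy
policy_repr = torch.cat([p.detach().flatten() for p in policy.parameters()])
vf = PolicyConditionedValue(state_dim, policy_repr.numel())
state = torch.randn(3, state_dim)
values = vf(state, policy_repr.expand(3, -1))          # one value per state
print(values.shape)                                    # torch.Size([3, 1])
```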

Journal: CoRR 2018
Motoya Ohnishi Li Wang Gennaro Notomista Magnus Egerstedt

This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback control...
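
A small sketch of the safety-filter idea the abstract describes: given a barrier function h(x) and control-affine dynamics, a nominal control is minimally modified so that dh/dt + alpha * h(x) >= 0. The 1-D system, barrier, and clipping rule below are toy assumptions, not the authors' adaptive model-learning pipeline.

```python
# Illustrative control-barrier safety filter for a 1-D integrator x' = u
# with barrier h(x) = x (keep x >= 0). The constraint dh/dt + alpha*h >= 0
# reduces to u + alpha*x >= 0, so the nominal control is minimally
# modified by clipping it against that bound.
def safe_control(x, u_nominal, alpha=1.0):
    u_min = -alpha * x          # barrier constraint: u >= -alpha * h(x)
    return max(u_nominal, u_min)

x = 0.2
u_nom = -1.0                    # nominal controller pushes toward x < 0
print("filtered control:", safe_control(x, u_nom))   # -0.2, respects the barrier
```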

Journal: Economie et Statistique / Economics and Statistics 2019
