Search results for: action value function
Number of results: 2,342,819
The iLQG algorithm optimizes trajectories by iteratively constructing locally optimal linear feedback controllers under a local linearization of the dynamics p(x_{t+1} | x_t, u_t) = N(f_{x_t} x_t + f_{u_t} u_t, F_t) and a quadratic expansion of the rewards r(x_t, u_t) (Tassa et al., 2012). Under linear dynamics and quadratic rewards, the action-value function Q(x_t, u_t) and value function V(x_t) are locally quadratic an...
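Since the snippet only names the method, here is a minimal sketch of the standard iLQG backward pass under such a linear-quadratic model (assumptions: deterministic dynamics, no reward cross term r_ux, no regularization or line search; the per-timestep Jacobian and reward-expansion arrays are hypothetical inputs):

```python
import numpy as np

def ilqg_backward_pass(f_x, f_u, r_x, r_u, r_xx, r_uu, T):
    """Backward LQR recursion: under locally linear dynamics and locally
    quadratic rewards, Q(x_t, u_t) and V(x_t) are quadratic, and maximizing
    Q over u_t yields a time-varying affine controller u_t = k[t] + K[t] @ dx_t."""
    n = f_x[0].shape[0]
    V_x, V_xx = np.zeros(n), np.zeros((n, n))   # value expansion at time t+1
    k, K = [None] * T, [None] * T
    for t in reversed(range(T)):
        # Quadratic expansion of the action-value function Q(x_t, u_t)
        Q_x = r_x[t] + f_x[t].T @ V_x
        Q_u = r_u[t] + f_u[t].T @ V_x
        Q_xx = r_xx[t] + f_x[t].T @ V_xx @ f_x[t]
        Q_uu = r_uu[t] + f_u[t].T @ V_xx @ f_u[t]
        Q_ux = f_u[t].T @ V_xx @ f_x[t]
        # Maximize over u (Q_uu is negative definite for a reward objective)
        Q_uu_inv = np.linalg.inv(Q_uu)
        k[t] = -Q_uu_inv @ Q_u
        K[t] = -Q_uu_inv @ Q_ux
        # Propagate the quadratic value function backward in time
        V_x = Q_x + K[t].T @ Q_uu @ k[t] + K[t].T @ Q_u + Q_ux.T @ k[t]
        V_xx = Q_xx + K[t].T @ Q_uu @ K[t] + K[t].T @ Q_ux + Q_ux.T @ K[t]
    return k, K
```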
A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C^1 and its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally...
We describe a probabilistic planning approach that translates a PPDDL planning problem description to a first-order MDP (FOMDP) and uses approximate solution techniques for FOMDPs to derive a value function and corresponding policy. Our FOMDP solution techniques represent the value function linearly w.r.t. a set of first-order basis functions and compute suitable weights using lifted, first-ord...
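The truncated snippet points to a lifted, first-order LP over first-order basis functions; as a hedged, purely propositional stand-in, the ground approximate linear program below computes weights for a linearly represented value function (phi, P, R, and the helper name are illustrative, not the paper's relational formulation):

```python
import numpy as np
from scipy.optimize import linprog

def alp_weights(phi, P, R, gamma):
    """Approximate linear programming for V(s) ~ sum_i w_i * phi_i(s):
    minimize the summed value subject to the Bellman inequalities
    V(s) >= R(s, a) + gamma * sum_s' P(s'|s, a) V(s') for every (s, a),
    written here in ground form. phi: (S, k) basis matrix; P: (A, S, S);
    R: (A, S)."""
    S, k = phi.shape
    c = phi.sum(axis=0)                     # objective: minimize sum_s V(s)
    A_ub, b_ub = [], []
    for a in range(P.shape[0]):
        # R + gamma * P @ phi @ w <= phi @ w  <=>  (gamma * P @ phi - phi) w <= -R
        A_ub.append(gamma * P[a] @ phi - phi)
        b_ub.append(-R[a])
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=[(None, None)] * k)
    return res.x                            # basis-function weights
```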
We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate...
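The closed-form approximations are the paper's contribution; a simulation-based sketch of the same bias and variance quantities, assuming a fixed policy, multinomial transition counts, and state-only rewards (all names hypothetical), is:

```python
import numpy as np

def bootstrap_value_stats(counts, r, gamma, n_boot=1000, seed=0):
    """Parametric bootstrap of V^pi for a fixed policy in a finite MDP.
    counts[s, s']: observed transition counts under the policy; r[s]: reward.
    Returns per-state mean and variance of the value estimate across
    resampled models, from which confidence intervals can be read off."""
    rng = np.random.default_rng(seed)
    S = counts.shape[0]
    p_hat = counts / counts.sum(axis=1, keepdims=True)  # empirical model
    vals = np.empty((n_boot, S))
    for i in range(n_boot):
        # Resample a transition matrix from the fitted multinomial model
        P = np.stack([rng.multinomial(int(counts[s].sum()), p_hat[s])
                      for s in range(S)]).astype(float)
        P /= P.sum(axis=1, keepdims=True)
        # Solve the Bellman equation V = r + gamma * P @ V exactly
        vals[i] = np.linalg.solve(np.eye(S) - gamma * P, r)
    return vals.mean(axis=0), vals.var(axis=0)
```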
Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural ...
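A minimal sketch of policy improvement along the action-value gradient, assuming a linear deterministic policy mu_theta(s) = theta @ s and a hypothetical critic-gradient callable q_grad_u (neither is specified by the snippet):

```python
import numpy as np

def dpg_step(theta, states, q_grad_u, lr=1e-3):
    """One deterministic-policy-gradient step: follow the action-value
    gradient grad_u Q(s, u) evaluated at u = mu_theta(s), i.e. steepest
    ascent on Q, chained through the policy Jacobian.
    theta: (m, n) policy weights; q_grad_u(s, u) -> (m,) critic gradient."""
    grad = np.zeros_like(theta)
    for s in states:
        u = theta @ s                 # deterministic action mu_theta(s)
        gu = q_grad_u(s, u)           # action-value gradient at that action
        grad += np.outer(gu, s)       # chain rule: d(mu)/d(theta) gives outer product
    return theta + lr * grad / len(states)
```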
We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, i.e., generalization among policie...
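A hedged sketch of what conditioning a value approximator on a policy representation can look like, using a hypothetical two-layer network; the actual PeVFA architecture and the choice of policy embedding are not specified in this snippet:

```python
import numpy as np

def pevfa_forward(W1, W2, state, policy_repr):
    """Minimal policy-extended VFA forward pass: the value network takes the
    state together with an explicit policy representation, so one set of
    weights can hold values of many policies and generalize between them.
    W1: (h, n + p) and W2: (h,) are hypothetical MLP weights; policy_repr
    could be, e.g., a flattened or learned embedding of the policy."""
    x = np.concatenate([state, policy_repr])   # joint (state, policy) input
    hidden = np.maximum(0.0, W1 @ x)           # ReLU hidden layer
    return W2 @ hidden                         # scalar value V(s; pi)
```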
This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback control...
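One common way to combine a learned model with a control barrier certificate is a minimally invasive safety filter; for a single affine constraint the underlying QP has the closed-form projection below. The callables f, g, h, grad_h are hypothetical stand-ins for the learned control-affine dynamics and the certificate, not the paper's exact construction:

```python
import numpy as np

def cbf_safe_action(u_ref, x, f, g, h, grad_h, alpha=1.0):
    """Project a reference action onto the safe set {u : hdot(x, u) >= -alpha * h(x)}
    for control-affine dynamics xdot = f(x) + g(x) u, so feedback controls are
    constrained by the barrier certificate while staying close to u_ref."""
    a = grad_h(x) @ g(x)                    # constraint normal: grad_h(x) g(x)
    b = -alpha * h(x) - grad_h(x) @ f(x)    # constraint offset
    if a @ u_ref >= b:                      # reference action already safe
        return u_ref
    # Closest safe action: Euclidean projection onto the half-space a @ u >= b
    return u_ref + (b - a @ u_ref) / (a @ a) * a
```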