Search results for: action value function

Number of results: 2,342,819

2013
Jen Jen Chung, Nicholas R. J. Lawrance, Salah Sukkarieh

This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
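The abstract describes using Gaussian Process regression to estimate a value function, with the GP's predictive uncertainty available to direct exploration. A minimal illustrative sketch of that idea (not the paper's code; the states, targets, and kernel length-scale are all hypothetical):

```python
import numpy as np

# Hypothetical sketch: GP regression over visited states, with the
# predictive standard deviation used to pick where to explore next.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))   # visited states (toy data)
y = X.sum(axis=1)                         # stand-in value targets

def rbf(A, B, ls=0.3):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

K = rbf(X, X) + 1e-6 * np.eye(len(X))     # jitter for numerical stability
Kinv_y = np.linalg.solve(K, y)

def predict(Xs):
    """GP posterior mean and std at query points Xs."""
    Ks = rbf(Xs, X)
    mean = Ks @ Kinv_y
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

grid = rng.uniform(0.0, 1.0, size=(100, 2))
mean, std = predict(grid)
# Directed exploration: visit the state with the largest predictive std.
target = grid[np.argmax(std)]
```

The key property the abstract relies on is that the GP supplies calibrated uncertainty for free, so exploration can be directed toward poorly modeled regions rather than chosen at random.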

Journal: Desert
M. Karimpour Reihan, Associate Professor, International Desert Research Center, University of Tehran, Tehran, Iran; S. Feiznia, Professor, Faculty of Natural Resources, University of Tehran, Karaj, Iran; A. Salehpour Jam, Ph.D. Student, Faculty of Natural Resources, University of Tehran, Karaj, Iran; M.K. Kianian, Academic Staff, University of Semnan, Semnan, Iran

Investigation of the desertification trend requires understanding the phenomena that create changes, singly or through action and reaction together, in such a way that these changes end in land degradation. To investigate the pedological criterion of land degradation in Quaternary rock units, a part of the Rude-Shoor watershed area was first selected. After delineating the target area, maps of slope class...

2009
Amarjit Budhiraja

A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C^1 and that its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally contr...

Journal: :CoRR 2018
Hamid Reza Maei

We present the first class of policy-gradient algorithms that work with both state-value and policy function approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse of dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-...
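The abstract's motivation (avoiding per-action value estimates when the action set is large) is the same one that motivates actor-critic methods with a state-value critic. A minimal sketch of that general idea, not the paper's algorithm; the toy chain environment, rewards, and learning rates are all hypothetical:

```python
import numpy as np

# Hypothetical sketch: one-step actor-critic where the critic estimates
# only state values V(s), so no per-action value table is required.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
theta = np.zeros((n_states, n_actions))   # softmax policy parameters
v = np.zeros(n_states)                    # state-value estimates
alpha_pi, alpha_v, gamma = 0.1, 0.2, 0.95

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

s = 0
for _ in range(2000):
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)
    # Toy chain dynamics: action 0 moves right; the last state pays +1.
    s2 = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    delta = r + gamma * v[s2] - v[s]      # TD error as advantage signal
    v[s] += alpha_v * delta               # critic update (state values only)
    grad = -pi
    grad[a] += 1.0                        # gradient of log softmax policy
    theta[s] += alpha_pi * delta * grad   # actor update
    s = 0 if s2 == n_states - 1 else s2   # reset episode at the goal
```

Because the critic stores V(s) rather than Q(s, a), its size is independent of the number of actions, which is the property that matters for the large-action-set setting the abstract targets.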

Journal: :The Nordic Journal of Aesthetics 2014

2012
John E. Laird Nate Derbinsky

We describe how an agent can dynamically and incrementally determine the structure of a value function from background knowledge as a side effect of problem solving. The agent determines the value function as it performs the task, using background knowledge in novel situations to compute an expected value for decision making. That expected value becomes the initial estimate of the value functio...

Journal: :Journal of Artificial Intelligence Research 2021

2003
PÉTER STEFÁN

[a, b] and [c, d] real intervals; ∇_w partial derivatives of a quantity with respect to the weight vector elements; A_i, B_i, C_i machining times on machines A, B, and C respectively; a_i^j the jth action in the ith state of the agent; α learning rate for "mean value type" action-state values; b vector of boundary conditions in flow-shop scheduling; β learning rate for "standard deviation type" a...
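The nomenclature pairs a learning rate α for "mean value type" action-state values with a learning rate β for "standard deviation type" values. One plausible reading, sketched below with entirely hypothetical numbers (this is not reconstructed from the paper), is a Q-learning-style update that tracks both a mean estimate and a running spread of the TD error:

```python
import numpy as np

# Hypothetical sketch: maintain a mean-type and a std-type estimate per
# state-action pair, each with its own learning rate (alpha vs. beta).
n_states, n_actions = 4, 2
Q_mean = np.zeros((n_states, n_actions))
Q_std = np.ones((n_states, n_actions))
alpha, beta, gamma = 0.1, 0.05, 0.9

def update(s, a, r, s2):
    target = r + gamma * Q_mean[s2].max()
    err = target - Q_mean[s, a]
    Q_mean[s, a] += alpha * err                      # mean-type update
    Q_std[s, a] += beta * (abs(err) - Q_std[s, a])   # std-type update

update(0, 1, 1.0, 2)
update(0, 1, 2.0, 2)
```

Tracking the spread of the TD error alongside its mean gives the agent a per-action dispersion signal that a single learning rate could not provide.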

2015
Haibo He Zhen Ni Xiangnan Zhong

A Boundedness Theoretical Analysis for GrADP Design: A Case Study on Maze Navigation. This paper investigates a new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2]. Unlike the proofs of convergence for adaptive dynamic programming (ADP) in the literature, here we provide a new insight into the error bound between ...

Chart of the number of search results per year

Click on the chart to filter the results by publication year