Search results for: action value function
Number of results: 2342819
This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...
Investigating desertification trends requires understanding the phenomena that create changes, whether singly or through action and reaction together, such that these changes end in land degradation. To investigate pedological criteria of land degradation in Quaternary rock units, a part of the Rude-Shoor watershed area was first selected. After distinguishing the target area, maps of slope class...
A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C and its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally contr...
We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse-of-dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-...
We describe how an agent can dynamically and incrementally determine the structure of a value function from background knowledge as a side effect of problem solving. The agent determines the value function as it performs the task, using background knowledge in novel situations to compute an expected value for decision making. That expected value becomes the initial estimate of the value functio...
$[a, b]$, $[c, d]$: real intervals; $\nabla_w$: partial derivatives of a quantity with respect to the weight vector elements; $A_i$, $B_i$, $C_i$: machining times on machines A, B, and C respectively; $a_i^j$: $j$th action in the $i$th state of the agent; $\alpha$: learning rate for "mean value type" action-state values; $b$: vector of boundary conditions in flow-shop scheduling; $\beta$: learning rate for "standard deviation type" a...
A Boundedness Theoretical Analysis for GrADP Design: A Case Study on Maze Navigation. A new theoretical analysis towards the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2] is investigated in this paper. Unlike the proofs of convergence for adaptive dynamic programming (ADP) in literature, here we provide a new insight for the error bound between ...