action value function

Search results for: action value function

Number of results: 2342819

Resource Constrained Exploration in Reinforcement Learning

2013

Jen Jen Chung Nicholas R. J. Lawrance Salah Sukkarieh

This paper examines temporal difference reinforcement learning (RL) with adaptive and directed exploration for resource-limited missions. The scenario considered is for an energy-limited agent which must explore an unknown region to find new energy sources. The presented algorithm uses a Gaussian Process (GP) regression model to estimate the value function in an RL framework. However, to avoid ...


Investigation of Pedological Criterion on Rangeland Desertification (Case Study: South of Rude-Shoor Watershed)

Journal: Desert

M. Karimpour Reihan, Associate Professor, International Desert Research Center, University of Tehran, Tehran, Iran; S. Feiznia, Professor, Faculty of Natural Resources, University of Tehran, Karaj, Iran; A. Salehpour Jam, Ph.D. Student, Faculty of Natural Resources, University of Tehran, Karaj, Iran; M.K. Kianian, Academic Staff, University of Semnan, Semnan, Iran

Investigating desertification trends requires understanding the phenomena that, acting singly or interacting with one another, produce changes that end in land degradation. To investigate the pedological criterion of land degradation in Quaternary rock units, a part of the Rude-Shoor watershed area was first selected. After distinguishing the target area, maps of slope class...


Optimal Stopping and Free Boundary Characterizations for Some Brownian

2009

Amarjit Budhiraja

A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C^1 and its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally contr...


Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

Journal: CoRR, 2018

Hamid Reza Maei

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse-of-dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-...


Art, Value, and Function

Journal: The Nordic Journal of Aesthetics, 2014


Reliability of internal prediction/estimation and its application. I. Adaptive action selection reflecting reliability of value function

Journal: Neural Networks, 2004


Online Determination of Value-Function Structure and Action-value Estimates for Reinforcement Learning in a Cognitive Architecture

2012

John E. Laird Nate Derbinsky

We describe how an agent can dynamically and incrementally determine the structure of a value function from background knowledge as a side effect of problem solving. The agent determines the value function as it performs the task, using background knowledge in novel situations to compute an expected value for decision making. That expected value becomes the initial estimate of the value functio...


General Value Function Networks

Journal: Journal of Artificial Intelligence Research, 2021


Combined Use of Reinforcement Learning and Simulated Annealing: Algorithms and Applications

2003

PÉTER STEFÁN

[a, b], [c, d]: real intervals; ∇_w: partial derivatives of a quantity according to the weight vector elements; A_i, B_i, C_i: machining times on machines A, B, and C, respectively; a_i^j: jth action in the ith state of the agent; α: learning rate for "mean value type" action-state values; b: vector of boundary conditions in flow-shop scheduling; β: learning rate for "standard deviation type" a...


A Boundedness Theoretical Analysis for GrADP Design: A Case Study on Maze Navigation

2015

Haibo He Zhen Ni Xiangnan Zhong

A new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2] is investigated in this paper. Unlike the proofs of convergence for adaptive dynamic programming (ADP) in the literature, here we provide a new insight for the error bound between ...

