Search results for: action value function

Number of results: 2342819

2011
Jason Pazis Ronald Parr

The majority of value-function-approximation-based reinforcement learning algorithms available today focus on approximating the state (V) or state-action (Q) value function, and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major ob...

2012
Asma Al-tamimi Murad Abu-Khalaf Frank Lewis

This chapter is concerned with the application of approximate dynamic programming techniques (ADP) to solve for the value function, and hence the optimal control policy, in discrete-time nonlinear optimal control problems having continuous state and action spaces. ADP is a reinforcement learning approach (Sutton & Barto, 1998) based on adaptive critics (Barto et al., 1983), (Widrow et al., 1973...

Journal: :Proceedings of the ... AAAI Conference on Artificial Intelligence 2023

We present Q-functionals, an alternative architecture for continuous-control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical off-policy control, where policy i...
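The parallel-evaluation idea in this abstract can be illustrated with a minimal sketch (the quadratic action basis, the fixed linear "network", and all names here are illustrative assumptions, not the paper's architecture): the state is mapped to coefficients of a basis over actions, so scoring many sampled actions reduces to one matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)

def action_features(actions):
    """Basis features of each action: [1, a, a^2] per dimension (illustrative choice)."""
    return np.hstack([np.ones((len(actions), 1)), actions, actions ** 2])

def coefficients(state, W):
    """Stand-in for the network: maps a state to basis coefficients."""
    return W @ state

# toy dimensions: 3-dim state, 2-dim continuous action
state_dim, action_dim = 3, 2
k = 1 + 2 * action_dim               # features per action under the basis above
W = rng.normal(size=(k, state_dim))  # "network" weights, random for the sketch

state = rng.normal(size=state_dim)
candidates = rng.uniform(-1, 1, size=(1024, action_dim))  # sampled actions

# one matrix product scores all 1024 candidates at once
q_values = action_features(candidates) @ coefficients(state, W)  # shape (1024,)
best = candidates[np.argmax(q_values)]
```

The point of the functional form is that the expensive state-dependent computation runs once per state, while evaluating additional candidate actions costs only a cheap vectorized product.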

2009
Alex Grubb Stephane Ross Felix Duvallet

Our approach to this problem was to use reinforcement learning with a function approximator to approximate the state value function [RSS98]. In our case, a +1 reward was given for every completed line, so that the value function would encode the long-term number of lines that will be completed by the algorithm. In order to achieve this, we extract features from the game state, and use gr...
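A minimal version of this setup (the specific features and the TD(0) update rule are assumptions for illustration, not necessarily the authors' method) is a linear state-value approximator, where the +1-per-line reward makes V(s) estimate the long-term number of completed lines:

```python
import numpy as np

def features(state):
    """Hand-crafted game-state features (hypothetical: total height, holes, bumpiness)."""
    height, holes, bumpiness = state
    return np.array([1.0, height, holes, bumpiness])

def td_update(theta, s, reward, s_next, alpha=0.01, gamma=0.99):
    """One TD(0) step on the linear approximator V(s) = theta . phi(s)."""
    phi, phi_next = features(s), features(s_next)
    td_error = reward + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * td_error * phi

theta = np.zeros(4)
# a transition in which one line was completed (reward +1)
theta = td_update(theta, (5.0, 2.0, 3.0), 1.0, (4.0, 1.0, 2.0))
```

Repeating this update over played transitions moves theta toward weights whose dot product with the features predicts discounted future lines.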

2012
Mohammad Gheshlaghi Azar Rémi Munos Hilbert J. Kappen

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model of the MDP, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1) only O(N log(N/δ)/((1 − γ)ε)) sa...

Journal: :Mathematical Social Sciences 2006
Lucie Ménager

Parikh and Krasucki [1990] showed that pairwise communication of the value of a function f leads to a consensus about the communicated value if the function f is convex. They showed that union consistency of f may not be sufficient to guarantee consensus in any communication protocol. Krasucki [1996] proved that consensus occurs for any union consistent function if the protocol contains no cycl...

2011
Hunor S. Jakab

Applying reinforcement learning (RL) algorithms in robotic control proves to be challenging even in simple settings with a small number of states and actions. Value-function-based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity of dealing with continuous state-action spaces led to the use of dif...

Journal: :CoRR 2015
Naoto Yoshida

In this paper, reinforcement learning with binary vector actions is investigated. We suggest an effective neural network architecture for approximating an action-value function with binary vector actions. The proposed architecture approximates the action-value function by a function that is linear with respect to the action vector but still non-linear with respect to the state input. We sho...
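The practical consequence of linearity in the action vector, as described in this abstract, is that the greedy binary action can be found coordinate-wise rather than by enumerating all 2^d action vectors. A minimal sketch (the one-hidden-layer state network and all dimensions are stand-ins, not the paper's exact architecture):

```python
import numpy as np
from itertools import product

def action_weights(state, W1, W2):
    """Non-linear in the state (one tanh hidden layer), producing per-bit weights w(s)."""
    return W2 @ np.tanh(W1 @ state)

def greedy_binary_action(w):
    """Q(s, a) = w(s) . a is maximized bit by bit: set a_i = 1 iff w_i > 0."""
    return (w > 0).astype(int)

rng = np.random.default_rng(1)
state_dim, hidden, action_bits = 4, 8, 5
W1 = rng.normal(size=(hidden, state_dim))
W2 = rng.normal(size=(action_bits, hidden))

state = rng.normal(size=state_dim)
w = action_weights(state, W1, W2)
a_star = greedy_binary_action(w)

# sanity check: coordinate-wise choice matches brute force over all 2^5 actions
best = max(product([0, 1], repeat=action_bits), key=lambda a: w @ np.array(a))
assert (a_star == np.array(best)).all()
```

The brute-force check costs O(2^d) and is only feasible here because d = 5; the coordinate-wise rule is O(d), which is the whole appeal of the linear-in-action form.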

Chart: number of search results per year
