Search results for: action value function

Number of results: 2342819

2011
Jason Pazis Ronald Parr

The majority of value-function-approximation-based reinforcement learning algorithms available today focus on approximating the state (V) or state-action (Q) value function, and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major ob...

2012
Asma Al-tamimi Murad Abu-Khalaf Frank Lewis

This chapter is concerned with the application of approximate dynamic programming techniques (ADP) to solve for the value function, and hence the optimal control policy, in discrete-time nonlinear optimal control problems having continuous state and action spaces. ADP is a reinforcement learning approach (Sutton & Barto, 1998) based on adaptive critics (Barto et al., 1983), (Widrow et al., 1973...

Journal: :Proceedings of the ... AAAI Conference on Artificial Intelligence 2023

We present Q-functionals, an alternative architecture for continuous-control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical off-policy control, where policy i...
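The parallel-evaluation idea in this abstract can be illustrated with a minimal sketch (the quadratic action basis, the fixed linear "network", and all names here are illustrative assumptions, not the paper's architecture): the state is mapped to coefficients of a basis over actions, so scoring many sampled actions reduces to one matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)

def action_features(actions):
    """Basis features of each action: [1, a, a^2] per dimension (illustrative choice)."""
    return np.hstack([np.ones((len(actions), 1)), actions, actions ** 2])

def coefficients(state, W):
    """Stand-in for the network: maps a state to basis coefficients."""
    return W @ state

# toy dimensions: 3-dim state, 2-dim continuous action
state_dim, action_dim = 3, 2
k = 1 + 2 * action_dim               # features per action under the basis above
W = rng.normal(size=(k, state_dim))  # "network" weights, random for the sketch

state = rng.normal(size=state_dim)
candidates = rng.uniform(-1, 1, size=(1024, action_dim))  # sampled actions

# one matrix product scores all 1024 candidates at once
q_values = action_features(candidates) @ coefficients(state, W)  # shape (1024,)
best = candidates[np.argmax(q_values)]
```

The point of the functional form is that the expensive state-dependent computation runs once per state, while evaluating additional candidate actions costs only a cheap vectorized product.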

2009
Alex Grubb Stephane Ross Felix Duvallet

Our approach to this problem was to use reinforcement learning with a function approximator to approximate the state value function [RSS98]. In our case, a +1 reward was given for every completed line, so that the value function would encode the long-term number of lines that will be completed by the algorithm. In order to achieve this, we extract features from the game state, and use gr...
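A minimal version of this setup (the specific features and the TD(0) update rule are assumptions for illustration, not necessarily the authors' method) is a linear state-value approximator, where the +1-per-line reward makes V(s) estimate the long-term number of completed lines:

```python
import numpy as np

def features(state):
    """Hand-crafted game-state features (hypothetical: total height, holes, bumpiness)."""
    height, holes, bumpiness = state
    return np.array([1.0, height, holes, bumpiness])

def td_update(theta, s, reward, s_next, alpha=0.01, gamma=0.99):
    """One TD(0) step on the linear approximator V(s) = theta . phi(s)."""
    phi, phi_next = features(s), features(s_next)
    td_error = reward + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * td_error * phi

theta = np.zeros(4)
# a transition in which one line was completed (reward +1)
theta = td_update(theta, (5.0, 2.0, 3.0), 1.0, (4.0, 1.0, 2.0))
```

Repeating this update over played transitions moves theta toward weights whose dot product with the features predicts discounted future lines.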

2012
Mohammad Gheshlaghi Azar Rémi Munos Hilbert J. Kappen

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model of the MDP, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1) only O(N log(N/δ)/((1 − γ)ε)) sa...

Journal: :Mathematical Social Sciences 2006
Lucie Ménager

Parikh and Krasucki [1990] showed that pairwise communication of the value of a function f leads to a consensus about the communicated value if the function f is convex. They showed that union consistency of f may not be sufficient to guarantee consensus in any communication protocol. Krasucki [1996] proved that consensus occurs for any union consistent function if the protocol contains no cycl...

2011
Hunor S. Jakab

Applying reinforcement learning (RL) algorithms in robotic control proves to be challenging even in simple settings with a small number of states and actions. Value-function-based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity of dealing with continuous state-action spaces led to the use of dif...

Journal: :CoRR 2015
Naoto Yoshida

In this paper, reinforcement learning with binary vector actions is investigated. We suggest an effective neural network architecture for approximating an action-value function with binary vector actions. The proposed architecture approximates the action-value function by a function that is linear with respect to the action vector but still non-linear with respect to the state input. We sho...
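The practical consequence of linearity in the action vector, as described in this abstract, is that the greedy binary action can be found coordinate-wise rather than by enumerating all 2^d action vectors. A minimal sketch (the one-hidden-layer state network and all dimensions are stand-ins, not the paper's exact architecture):

```python
import numpy as np
from itertools import product

def action_weights(state, W1, W2):
    """Non-linear in the state (one tanh hidden layer), producing per-bit weights w(s)."""
    return W2 @ np.tanh(W1 @ state)

def greedy_binary_action(w):
    """Q(s, a) = w(s) . a is maximized bit by bit: set a_i = 1 iff w_i > 0."""
    return (w > 0).astype(int)

rng = np.random.default_rng(1)
state_dim, hidden, action_bits = 4, 8, 5
W1 = rng.normal(size=(hidden, state_dim))
W2 = rng.normal(size=(action_bits, hidden))

state = rng.normal(size=state_dim)
w = action_weights(state, W1, W2)
a_star = greedy_binary_action(w)

# sanity check: coordinate-wise choice matches brute force over all 2^5 actions
best = max(product([0, 1], repeat=action_bits), key=lambda a: w @ np.array(a))
assert (a_star == np.array(best)).all()
```

The brute-force check costs O(2^d) and is only feasible here because d = 5; the coordinate-wise rule is O(d), which is the whole appeal of the linear-in-action form.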

Chart: number of search results per year
