Search results for: action value function
Number of results: 2342819
The majority of value-function-approximation-based reinforcement learning algorithms available today focus on approximating the state (V) or state-action (Q) value function, and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major ob...
This chapter is concerned with the application of approximate dynamic programming (ADP) techniques to solve for the value function, and hence the optimal control policy, in discrete-time nonlinear optimal control problems having continuous state and action spaces. ADP is a reinforcement learning approach (Sutton & Barto, 1998) based on adaptive critics (Barto et al., 1983), (Widrow et al., 1973...
We present Q-functionals, an alternative architecture for continuous-control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms the state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with typical off-policy control, where the policy i...
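The parallel evaluation idea in this abstract can be sketched as follows. This is a minimal stand-in, not the paper's implementation: a random linear map plays the role of the learned network, the polynomial action basis and all dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_BASIS = 4, 2, 6

# Stand-in for a learned network: maps a state to coefficients of a
# fixed basis over actions, so Q(s, .) becomes a cheap function of a.
W = rng.normal(size=(STATE_DIM, N_BASIS))

def action_basis(actions):
    """Fixed polynomial basis over 2-D actions: [1, a1, a2, a1*a2, a1^2, a2^2]."""
    a1, a2 = actions[:, 0], actions[:, 1]
    return np.stack([np.ones(len(actions)), a1, a2, a1 * a2, a1**2, a2**2], axis=1)

def select_action(state, n_samples=256):
    coeffs = state @ W                                  # state -> function over actions
    candidates = rng.uniform(-1, 1, size=(n_samples, ACTION_DIM))
    q_values = action_basis(candidates) @ coeffs        # evaluate all samples at once
    return candidates[np.argmax(q_values)]              # greedy over the sample set

action = select_action(rng.normal(size=STATE_DIM))
```

The key design point is that one forward pass produces the coefficients, after which scoring any number of candidate actions is a single matrix product.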
Our approach to this problem was to use reinforcement learning with a function approximator to approximate the state value function [RSS98]. In our case, a +1 reward was given for every completed line, so that the value function would encode the long-term number of lines that the algorithm will complete. In order to achieve this, we extract features from the game state, and use gr...
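A linear value function over extracted features is typically trained with a temporal-difference update. The sketch below shows one TD(0) step under that assumption; the feature vectors are hypothetical placeholders, not the features used in the abstract above.

```python
import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step for a linear value function V(s) = w . phi(s).

    w          -- current weight vector
    phi_s      -- feature vector of the current state
    phi_s_next -- feature vector of the successor state
    reward     -- reward received on the transition (+1 per completed line here)
    """
    td_error = reward + gamma * (phi_s_next @ w) - (phi_s @ w)
    return w + alpha * td_error * phi_s

# Hypothetical 2-feature example: one transition with reward +1.
w = td0_update(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]), reward=1.0)
```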
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model of the MDP, which indicates that for an MDP with N state-action pairs and discount factor γ ∈ [0, 1), only O(N log(N/δ) / ((1−γ)ε)) sa...
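In the model-based setting referenced here, one estimates a transition model from samples drawn via the generative model and then runs value iteration on the estimate. The following sketch shows the value-iteration half on a tiny assumed MDP; the convergence tolerance and example model are illustrative choices, not from the paper.

```python
import numpy as np

def q_value_iteration(P_hat, R, gamma=0.9, tol=1e-10):
    """Q-value iteration on an (estimated) tabular model.

    P_hat -- (S, A, S) transition probabilities
    R     -- (S, A) expected rewards
    """
    S, A, _ = P_hat.shape
    Q = np.zeros((S, A))
    while True:
        V = Q.max(axis=1)                 # greedy state values
        Q_new = R + gamma * (P_hat @ V)   # Bellman optimality backup
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

# Tiny deterministic MDP: state 0 -> state 1 with reward 1; state 1 absorbing, reward 0.
P = np.zeros((2, 1, 2)); P[0, 0, 1] = 1.0; P[1, 0, 1] = 1.0
R = np.array([[1.0], [0.0]])
Q = q_value_iteration(P, R)
```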
Parikh and Krasucki [1990] showed that pairwise communication of the value of a function f leads to a consensus about the communicated value if the function f is convex. They showed that union consistency of f may not be sufficient to guarantee consensus in any communication protocol. Krasucki [1996] proved that consensus occurs for any union consistent function if the protocol contains no cycl...
Applying reinforcement learning (RL) algorithms in robotic control proves challenging even in simple settings with a small number of states and actions. Value-function-based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity of dealing with continuous state-action spaces led to the use of dif...
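The discretization step this abstract objects to can be made concrete with a simple equal-width binning scheme (a common baseline, assumed here for illustration):

```python
import numpy as np

def discretize(x, low, high, bins):
    """Map a continuous scalar into one of `bins` equal-width cells.

    Values outside [low, high] are clipped; the top edge falls into
    the last bin. Applying this per dimension turns a continuous
    state-action space into a finite table index.
    """
    x = np.clip(x, low, high)
    idx = int((x - low) / (high - low) * bins)
    return min(idx, bins - 1)
```

With, say, 10 bins per dimension, a 6-dimensional state-action space already yields a million cells, which illustrates why such tabular discretization scales poorly for robotics.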
In this paper, reinforcement learning with binary vector actions is investigated. We propose an effective neural-network architecture for approximating an action-value function with binary vector actions. The proposed architecture approximates the action-value function by a function that is linear with respect to the action vector but still non-linear with respect to the state input. We sho...
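The payoff of linearity in the action vector is that the greedy action over {0,1}^d can be found exactly by inspecting the sign of each coefficient, with no enumeration of the 2^d actions. A minimal sketch of that selection rule, assuming Q(s, a) = c(s) · a where c(s) comes from some state-dependent (possibly non-linear) network:

```python
import numpy as np

def greedy_binary_action(coeffs):
    """Exact argmax of Q(s, a) = coeffs . a over a in {0, 1}^d:
    switch on exactly the coordinates with positive coefficients."""
    return (coeffs > 0).astype(int)

def q_value(coeffs, action):
    """Action value under the linear-in-action model."""
    return coeffs @ action

c = np.array([1.5, -0.3, 0.2])   # hypothetical state-dependent coefficients
a = greedy_binary_action(c)       # -> [1, 0, 1]
```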