نتایج جستجو برای: action value function
تعداد نتایج: 2342819 فیلتر نتایج به سال:
Optimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification consisting in neglecting difference between true environment and its model estimated using limited number of observations causes exploration vs exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertain...
This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal stateaction value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and ac...
A method is proposed which accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by a vision-based reinforcement learning. First, individual behaviors which achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be represented by an action-value function in ...
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...
چکیده ندارد.
Ichiroh Kanaya, Mayuko Kanazawa, Masataka Imura! ! This article presents the mathematical background of general interactive systems. The first principle of designing a large system is to “divide and conquer”, which implies that we could possibly reduce human error if we divided a large system in smaller subsystems. Interactive systems are, however, often composed of many subsystems that are “or...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید