Search results for: action value function

Number of results: 2,342,819

Journal: CoRR 2013
Sergey Rodionov, Alexey Potapov, Yurii Vinogradov

The optimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification, which neglects the difference between the true environment and a model estimated from a limited number of observations, causes the exploration-vs-exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertain...

2012
Vivek Ramavajjala, Charles Elkan

This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and ac...
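The linear state-action value approximation at the core of LSPI can be illustrated with a plain LSTD-Q policy-evaluation step, which solves a linear system for the weights of Q(s, a) = φ(s, a)·w. This is only a sketch of LSTD-Q itself, not of the paper's model-based variant; the function names, the regularization term, and the toy transition data below are illustrative assumptions.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma=0.9):
    """One LSTD-Q policy-evaluation step: solve A w = b for linear Q(s,a) = phi(s,a)·w.

    samples: list of (s, a, r, s2) transitions.
    phi(s, a): feature vector for a state-action pair (zeros for terminal states).
    policy(s): action the policy being evaluated takes in state s.
    """
    k = len(phi(*samples[0][:2]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        f2 = phi(s2, policy(s2))
        # Accumulate the LSTD-Q statistics: A ~ sum phi (phi - gamma*phi')^T, b ~ sum r*phi.
        A += np.outer(f, f - gamma * f2)
        b += r * f
    # Small ridge term keeps the system solvable when features are rank-deficient.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)
```

With one-hot (tabular) features this recovers the exact action values of the sampled MDP; with richer features it gives the least-squares projection that LSPI's policy-improvement loop then acts greedily on.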

1994
Minoru Asada, Eiji Uchibe, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda

A method is proposed that accomplishes a whole task consisting of multiple subtasks by coordinating multiple behaviors acquired through vision-based reinforcement learning. First, the individual behaviors that achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be represented by an action-value function in ...
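The Q-learning step mentioned in the abstract above, which produces one action-value function per behavior, can be sketched in tabular form. The toy chain environment, hyperparameters, and function name here are illustrative assumptions, not taken from the paper.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: actions 0 (left) / 1 (right), reward 1 at the rightmost state."""
    rng = random.Random(seed)
    # Q[s][a]: action-value estimate for state s and action a.
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection from the current action-value table.
            a = rng.randrange(2) if rng.random() < epsilon else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy value of the next state.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy with respect to the learned table moves right in every state, which is how a learned action-value function encodes a behavior.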

2001
Gregory Z. Grudic Lyle H. Ungar

We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...

Journal: CoRR 2014
Ichiroh Kanaya, Mayuko Kanazawa, Masataka Imura

This article presents the mathematical background of general interactive systems. The first principle of designing a large system is to “divide and conquer”, which implies that we could possibly reduce human error if we divided a large system into smaller subsystems. Interactive systems are, however, often composed of many subsystems that are “or...

Journal: The American Journal of Comparative Law 2002

Journal: The Journal of Thoracic and Cardiovascular Surgery 2018

Journal: Journal of the Philosophy of Sport 2019
