action value function

Search results for: action value function

Number of results: 2342819

Direct Uncertainty Estimation in Reinforcement Learning

Journal: CoRR 2013

Sergey Rodionov Alexey Potapov Yurii Vinogradov

An optimal probabilistic approach to reinforcement learning is computationally infeasible. Simplifying it by neglecting the difference between the true environment and its model, estimated from a limited number of observations, causes the exploration vs. exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertain...

Full text

Policy Iteration Based on a Learned Transition Model

2012

Vivek Ramavajjala Charles Elkan

This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and ac...

Full text

Coordination of multiple behaviors acquired by a vision-based reinforcement learning

1994

Minoru Asada Eiji Uchibe Shoichi Noda Sukoya Tawaratsumida Koh Hosoda

A method is proposed which accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by a vision-based reinforcement learning. First, individual behaviors which achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be represented by an action-value function in ...

Full text

Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

2001

Gregory Z. Grudic Lyle H. Ungar

We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...

Full text

Investigation of initial-boundary value problems involving differential equations on time intervals in the self-adjoint and non-self-adjoint cases

Thesis: Tarbiat Moallem University - Tabriz - Faculty of Science, 1387 (Solar Hijri)

Asghar Ahmadkhanlou, Mohammad Jahanshahi, Jafar Pourmahmoud

No abstract.

First 15 pages

Function + Action = Interaction

Journal: CoRR 2014

Ichiroh Kanaya Mayuko Kanazawa Masataka Imura

This article presents the mathematical background of general interactive systems. The first principle of designing a large system is to “divide and conquer”, which implies that we could possibly reduce human error if we divided a large system into smaller subsystems. Interactive systems are, however, often composed of many subsystems that are “or...

Full text

Fact, Value and Action in Nonconceptual Jurisprudence

Seeking the greatest value of our action

Action theory and the value of sport

The Action and Relative Value of Disinfectants