Search results for: temporal difference learning

Number of results: 1,222,164

Journal: CoRR, 2015
Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linea...
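To make the emphasis mechanism concrete, the sketch below implements emphatic TD(0), the λ = 0 case of ETD(λ): a followon trace F accumulates discounted, importance-weighted interest, and each update is scaled by the resulting emphasis. The data format, parameter names, and constant interest are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np

def emphatic_td0(transitions, n_features, alpha=0.01, gamma=0.9, interest=1.0):
    """Emphatic TD(0) with linear function approximation (a sketch).

    `transitions` yields (x, rho, r, x_next): the feature vector of the
    current state, the importance-sampling ratio pi(a|s)/b(a|s) for the
    action taken, the reward, and the next state's feature vector.
    """
    w = np.zeros(n_features)  # value-function weights
    F = 0.0                   # followon trace
    rho_prev = 1.0            # importance ratio from the previous step

    for x, rho, r, x_next in transitions:
        # Followon trace: discounted, importance-weighted accumulation
        # of interest; this is what re-weights the state distribution.
        F = gamma * rho_prev * F + interest
        M = F                                   # emphasis (lambda = 0 case)
        delta = r + gamma * w @ x_next - w @ x  # TD error
        w += alpha * rho * M * delta * x        # emphasized off-policy update
        rho_prev = rho
    return w
```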

2014
William Dabney, Philip S. Thomas

In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. ...
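As a rough illustration of the invariance idea, the sketch below preconditions an ordinary semi-gradient TD(0) update by the inverse of an estimated metric G = E[x xᵀ]. This is a generic natural-gradient construction under assumed names and data format, not necessarily the specific algorithm developed in the paper.

```python
import numpy as np

def natural_td0(transitions, n_features, alpha=0.1, gamma=0.9, reg=1e-3):
    """TD(0) preconditioned by an estimated metric (a sketch).

    The ordinary semi-gradient direction delta * x is multiplied by the
    inverse of a running estimate of G = E[x x^T]. Preconditioning by such
    a metric makes the step direction invariant to invertible linear
    re-encodings of the features, which is the invariance property the
    abstract refers to.
    """
    w = np.zeros(n_features)
    G = np.zeros((n_features, n_features))  # running estimate of E[x x^T]
    n = 0

    for x, r, x_next in transitions:
        n += 1
        G += (np.outer(x, x) - G) / n           # incremental average
        delta = r + gamma * w @ x_next - w @ x  # TD (Bellman) error
        grad = delta * x                        # ordinary semi-gradient
        # Regularize so the solve is well-posed before G has full rank.
        w += alpha * np.linalg.solve(G + reg * np.eye(n_features), grad)
    return w
```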

Thesis: Islamic Azad University, Shahroud Branch, Faculty of Humanities, 1393

The present study was conducted to investigate the role of summarizing in the process of teaching grammar to pre-university students. To this end, 48 male pre-university students in Semnan participated: a math class served as the experimental group and a science class as the control group. To make sure there was no significant difference in the proficiency level of the two groups, a pre-test wa...

Thesis: Ministry of Science, Research and Technology, Semnan University, 1392

Although much research has been done in the area of vocabulary teaching and learning, only some of it has led to effective vocabulary-teaching techniques, and many language learners still have problems learning vocabulary. The urge behind this study was to investigate three methods of teaching words. The first was teaching words in context based on a traditional method of teaching, that is,...

1999
Martin C. Smith


2011
Dylan A. Simon, Nathaniel D. Daw

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative perfor...

2014
Kimberly L. Stachenfeld, Matthew M. Botvinick, Samuel J. Gershman

The SR-based critic learns an estimate of the value function, using the SR as its feature representation. Unlike standard actor-critic methods, the critic does not use reward-based temporal difference errors to update its value estimate; instead, it relies on the fact that the value function is given by V(s) = Σ_{s′} M(s, s′) R(s′), where M is the successor representation and R is the expected re...
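To make the relation V(s) = Σ_{s′} M(s, s′) R(s′) concrete, here is a minimal tabular sketch in which the SR itself is learned by temporal-difference updates on state occupancies and the expected reward is a running average per state; the data format, names, and the reward-averaging rule are assumptions of this sketch.

```python
import numpy as np

def sr_values(transitions, n_states, alpha=0.1, gamma=0.9):
    """Tabular successor representation and the value function V = M R.

    M[s, s'] estimates the expected discounted future occupancy of s'
    when starting from s, learned with a TD update on occupancies rather
    than on rewards. R_hat is a per-state running average of observed
    rewards.
    """
    M = np.eye(n_states)        # SR; a state always occupies itself once
    R_hat = np.zeros(n_states)  # expected reward per state
    counts = np.zeros(n_states)

    for s, r, s_next in transitions:
        # TD update for the SR row of s: immediate occupancy indicator
        # plus the discounted SR of the successor state.
        target = np.eye(n_states)[s] + gamma * M[s_next]
        M[s] += alpha * (target - M[s])
        # Running average of the reward received on entering s_next.
        counts[s_next] += 1
        R_hat[s_next] += (r - R_hat[s_next]) / counts[s_next]

    return M @ R_hat  # V(s) = sum over s' of M(s, s') * R(s')
```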

Chart of the number of search results per year
