Search results for: temporal difference learning

Number of results: 1,222,164

Journal: CoRR, 2015
Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015), and Yu (2015) show that by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linea...
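To make the emphasis mechanism concrete, the sketch below implements emphatic TD(0), the λ = 0 case of ETD(λ): a followon trace F accumulates discounted, importance-weighted interest, and each update is scaled by the resulting emphasis. The data format, parameter names, and constant interest are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np

def emphatic_td0(transitions, n_features, alpha=0.01, gamma=0.9, interest=1.0):
    """Emphatic TD(0) with linear function approximation (a sketch).

    `transitions` yields (x, rho, r, x_next): the feature vector of the
    current state, the importance-sampling ratio pi(a|s)/b(a|s) for the
    action taken, the reward, and the next state's feature vector.
    """
    w = np.zeros(n_features)  # value-function weights
    F = 0.0                   # followon trace
    rho_prev = 1.0            # importance ratio from the previous step

    for x, rho, r, x_next in transitions:
        # Followon trace: discounted, importance-weighted accumulation
        # of interest; this is what re-weights the state distribution.
        F = gamma * rho_prev * F + interest
        M = F                                   # emphasis (lambda = 0 case)
        delta = r + gamma * w @ x_next - w @ x  # TD error
        w += alpha * rho * M * delta * x        # emphasized off-policy update
        rho_prev = rho
    return w
```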

2014
William Dabney, Philip S. Thomas

In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. ...
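As a rough illustration of the invariance idea, the sketch below preconditions an ordinary semi-gradient TD(0) update by the inverse of an estimated metric G = E[x xᵀ]. This is a generic natural-gradient construction under assumed names and data format, not necessarily the specific algorithm developed in the paper.

```python
import numpy as np

def natural_td0(transitions, n_features, alpha=0.1, gamma=0.9, reg=1e-3):
    """TD(0) preconditioned by an estimated metric (a sketch).

    The ordinary semi-gradient direction delta * x is multiplied by the
    inverse of a running estimate of G = E[x x^T]. Preconditioning by such
    a metric makes the step direction invariant to invertible linear
    re-encodings of the features, which is the invariance property the
    abstract refers to.
    """
    w = np.zeros(n_features)
    G = np.zeros((n_features, n_features))  # running estimate of E[x x^T]
    n = 0

    for x, r, x_next in transitions:
        n += 1
        G += (np.outer(x, x) - G) / n           # incremental average
        delta = r + gamma * w @ x_next - w @ x  # TD (Bellman) error
        grad = delta * x                        # ordinary semi-gradient
        # Regularize so the solve is well-posed before G has full rank.
        w += alpha * np.linalg.solve(G + reg * np.eye(n_features), grad)
    return w
```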

Thesis: Islamic Azad University, Shahroud Branch, Faculty of Humanities, 1393

The present study was conducted to investigate the role of summarizing in the process of teaching grammar to pre-university students. To this end, 48 male pre-university students in Semnan participated: a math class served as the experimental group and a science class as the control group. To make sure there was no significant difference in the proficiency level of the two groups, a pre-test wa...

Thesis: Ministry of Science, Research and Technology, Semnan University, 1392

Although much research has been done in the area of vocabulary teaching and learning, only some of it has led to effective vocabulary-teaching techniques, and many language learners still have problems learning vocabulary. The urge behind this study was to investigate three methods of teaching words. The first was teaching words in context based on a traditional method of teaching, that is,...

1999
Martin C. Smith


2011
Dylan A. Simon, Nathaniel D. Daw

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative perfor...

2014
Kimberly L. Stachenfeld, Matthew M. Botvinick, Samuel J. Gershman

The SR-based critic learns an estimate of the value function, using the SR as its feature representation. Unlike standard actor-critic methods, the critic does not use reward-based temporal difference errors to update its value estimate; instead, it relies on the fact that the value function is given by V(s) = Σ_{s′} M(s, s′) R(s′), where M is the successor representation and R is the expected re...
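To make the relation V(s) = Σ_{s′} M(s, s′) R(s′) concrete, here is a minimal tabular sketch in which the SR itself is learned by temporal-difference updates on state occupancies and the expected reward is a running average per state; the data format, names, and the reward-averaging rule are assumptions of this sketch.

```python
import numpy as np

def sr_values(transitions, n_states, alpha=0.1, gamma=0.9):
    """Tabular successor representation and the value function V = M R.

    M[s, s'] estimates the expected discounted future occupancy of s'
    when starting from s, learned with a TD update on occupancies rather
    than on rewards. R_hat is a per-state running average of observed
    rewards.
    """
    M = np.eye(n_states)        # SR; a state always occupies itself once
    R_hat = np.zeros(n_states)  # expected reward per state
    counts = np.zeros(n_states)

    for s, r, s_next in transitions:
        # TD update for the SR row of s: immediate occupancy indicator
        # plus the discounted SR of the successor state.
        target = np.eye(n_states)[s] + gamma * M[s_next]
        M[s] += alpha * (target - M[s])
        # Running average of the reward received on entering s_next.
        counts[s_next] += 1
        R_hat[s_next] += (r - R_hat[s_next]) / counts[s_next]

    return M @ R_hat  # V(s) = sum over s' of M(s, s') * R(s')
```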

Chart of the number of search results per year
