Keywords: reinforcement learning

Reinforcement Learning C3.3 Delayed reinforcement learning

1996

S Sathiya Keerthi B Ravindran

See the abstract for Chapter C3. Delayed reinforcement learning (RL) concerns the solution of stochastic optimal control problems. In this section we formulate and discuss the basics of such problems. Solution methods for delayed RL will be presented in Sections C3.4 and C3.5. In these three sections we will mainly consider problems in which the state and control spaces are finite se...


Reinforcement learning and neural reinforcement learning

1994

Samira Sehad Claude F. Touzet

In this paper, we address an under-represented class of learning algorithms in the study of connectionism: reinforcement learning. We first introduce these classic methods in a new formalism which highlights the particularities of implementations such as Q-Learning, Q-Learning with Hamming distance, Q-Learning with statistical clustering and Dyna-Q. We then present in this formalism a neural imp...


Reinforcement learning for multi-step problems

1992

David J. Finton Yu Hen Hu

In reinforcement learning for multi-step problems, the sparse nature of the feedback aggravates the difficulty of learning to perform. This paper explores the use of a reinforcement learning architecture, leading to a discussion of reinforcement learning in terms of feature abstraction, credit-assignment, and temporal-difference learning. Issues discussed include: the conditioning of the reinfo...


Learning Through Interaction

2010

Violeta Mirchevska Boštjan Kaluža

Reinforcement learning is an approach for learning an optimal action policy through experience, i.e., using observed rewards in environment states. Reinforcement learning algorithms include adaptive dynamic programming, temporal difference learning and Q-learning [1]. Examples of successful applications of reinforcement learning include a controller for sustained inverted flight on an autonomous helicopter [...


The Effect of Consciousness-Raising Activities on Learning Grammatical Structures by Iranian Guidance School EFL Learners

Thesis: Ministry of Science, Research and Technology - University of Sistan and Baluchestan - Faculty of Literature and Humanities, 1392 SH

نورالله منصورزاده, فرخ لفا حیدری, اسماعیل نورمحمدی

Abstract: The first purpose of this study was to investigate the effect of consciousness-raising (C-R) activities on learning grammatical structures (the simple present tense in this case) by Iranian guidance school EFL learners. The second was to investigate the effect of gender on learning the simple present tense through C-R activities and tasks. Finally, this study aimed to investigate the ...

TD(λ) Converges with Probability 1

1994

Richard Sutton

The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions co...
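As a rough illustration of the temporal-difference prediction idea this abstract refers to, the following is a minimal sketch of tabular TD(λ) with accumulating eligibility traces. The 5-state random walk, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen for illustration; they are not taken from the listed paper.

```python
import random

def td_lambda_random_walk(episodes=5000, alpha=0.01, lam=0.5, gamma=1.0, seed=0):
    """Tabular TD(lambda) on a hypothetical 5-state random walk.

    Episodes start in the middle state and step left or right with equal
    probability. Leaving on the right pays reward 1, on the left reward 0.
    With gamma = 1 the true value of state i (0-based) is (i + 1) / 6.
    """
    rng = random.Random(seed)
    n = 5
    v = [0.5] * n                      # initial value estimates
    for _ in range(episodes):
        e = [0.0] * n                  # eligibility traces, reset per episode
        s = n // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:             # terminated on the left
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= n:          # terminated on the right
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, v[s_next], False
            delta = r + gamma * v_next - v[s]   # one-step TD error
            e[s] += 1.0                         # accumulate trace for current state
            for i in range(n):
                v[i] += alpha * delta * e[i]    # credit recently visited states
                e[i] *= gamma * lam             # decay all traces
            if done:
                break
            s = s_next
    return v

if __name__ == "__main__":
    print([round(x, 2) for x in td_lambda_random_walk()])
```

With a small constant step size the estimates only hover near the true values; the convergence results discussed in the abstract concern conditions such as suitably decreasing step sizes, which this sketch does not implement.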


Sustainable ℓ2-regularized actor-critic based on recursive least-squares temporal difference learning

2017

Luntong Li Dazi Li Tianheng Song

Least-squares temporal difference learning (LSTD) has been used mainly for improving the data efficiency of the critic in actor-critic (AC). However, convergence analysis of the resulting algorithms is difficult when the policy is changing. In this paper, a new AC method is proposed based on LSTD under the discount criterion. The method comprises two components as the contribution: (1) LSTD works in an ...


Distributed Coverage Control by Robot Networks in Unknown Environments Using a Modified EM Algorithm

2017

Mohammadhosein Hasanbeig Lacra Pavel

In this paper, we study a distributed control algorithm for the problem of unknown area coverage by a network of robots. The coverage objective is to locate a set of targets in the area and to minimize the robots’ energy consumption. The robots have no prior knowledge about the location and also about the number of the targets in the area. One efficient approach that can be used to relax the ro...


The Effect of Electronical Media on the Reinforcement of Social Behavior of Youth from the Computer Course Professors and Students Viewpoints of Sari Islamic Azad University

Journal: Journal of Sociological Studies of Youth, 2011

Babak Hosseinzadeh, Hamid Fallah Jamal Sadeghi, Zobeydeh JAanbazi

The goal of this research was to examine the effect of electronic learning media on the reinforcement of youth social behavior from the point of view of computer course professors and students of Islamic Azad University of Sari. The statistical population included all computer students and professors of I.A.U. of Sari. The statistical sample was identified using the sample content identification ...


Delayed Reinforcement, Fuzzy Q-Learning and Fuzzy Logic Controllers

1996

Andrea Bonarini

In this paper, we discuss situations arising with reinforcement learning algorithms, when the reinforcement is delayed. The decision to consider delayed reinforcement is typical in many applications, and we discuss some motivations for it. Then, we summarize Q-Learning, a popular algorithm to deal with delayed reinforcement, and its recent extensions to use it to learn fuzzy logic structures (F...
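To make the delayed-reinforcement setting concrete, here is a minimal sketch of plain tabular Q-learning (not the fuzzy extension the abstract mentions) on a hypothetical chain of states where the only nonzero reward arrives on reaching the last state. The environment, the function name `q_learning_chain`, and the parameter values are assumptions for illustration only.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain with a delayed reward.

    Actions: 0 = left, 1 = right. Every step pays 0 except the step that
    enters the last state, which pays 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action choice, with random tie-breaking
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # bootstrap from the best next action unless the episode ended
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

if __name__ == "__main__":
    for state, values in enumerate(q_learning_chain()):
        print(state, [round(x, 3) for x in values])
```

Although only the final transition is rewarded, the bootstrapped target propagates that reward backwards, so earlier states come to prefer moving right, with values discounted by `gamma` per remaining step.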
