Plans as a Means for Guiding a Reinforcement Learner

Author

  • Jens Pfau
Abstract

The complexity of reinforcement learning problems grows exponentially with the size of the state space, which renders realistic cases unsolvable and underlines the need for guidance. This thesis studies a hybrid agent architecture in which the top-level module reuses temporal knowledge in the form of plans that it extracts from a concurrently executing low-level reinforcement learner. The first contribution of this work is a set of significant improvements to the original model and implementation of the agent architecture, resulting in more effective knowledge extraction and reuse. The second contribution is an extensive exploration of the synergy effects that take place between the two layers of the architecture. It is shown that the combination of state abstraction and the reuse of plans as temporal abstraction can significantly shorten the learning time of a reinforcement learning agent. Likewise, the number of decisions to be made by the agent is reduced, because a plan is a definite commitment to a course of actions that does not require intermediary reasoning. In addition, we demonstrate that the architecture enables the integration of plans as prior knowledge through a clear and convenient interface. Thus, partial and approximate solutions to the problem can be easily specified to decrease learning time even further.

To my beloved wife Agnieszka Beata, for patience, support and encouragement throughout the course of this thesis.

Similar resources

Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer

We describe our current efforts towards creating a reinforcement learner that learns both from reinforcements provided by its environment and from human-generated advice. Our research involves two complementary components: (a) mapping advice expressed in English to a formal advice language and (b) using advice expressed in a formal notation in a reinforcement learner. We use a subtask of the ch...

Ontology based learner-centered smart e-learning system

Rapid changes in learning contents and variability of the learners’ background can make e-learning systems inefficient due to their inflexibility in coping with such changes and variability. In this paper, we propose an ontological approach to dealing with such issues. Our approach, the Ontology based Learner-centered Smart E-Learning System (OLSES), allows instructors and learners ...

Moral Analysis of “Teacher's Authority” Based on the “Right-centeredness” Index in Imam Sajjad's Legal Treatise

In response to what the teacher’s authority is in teaching from a moral point of view, Imam Sajjad's stance is helpful. Descriptive-analytical study of Imam Sajjad's view in the Legal Treatise shows that the type of relationship between the teacher and the learner due to the characteristics of scientific productivity, guiding and spiritual value in relation to the learner has a special place of...

Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression

We present a novel formulation for providing advice to a reinforcement learner that employs support-vector regression as its function approximator. Our new method extends a recent advice-giving technique, called Knowledge-Based Kernel Regression (KBKR), that accepts advice concerning a single action of a reinforcement learner. In KBKR, users can say that in some set of states, an action’s value ...

Potential-Based Shaping and Q-Value Initialization are Equivalent

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for seve...

Publication date: 2008