Opposition-Based Q(λ) with Non-Markovian Update
Authors
Abstract
The OQ(λ) algorithm benefits from an extension of eligibility traces introduced as the opposition trace. This technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, comparing the results of OQ(λ) with conventional Watkins' Q(λ) showed a remarkable increase in performance for the OQ(λ) algorithm. However, the Markovian update of the opposition traces is an issue, and it is investigated in this paper. It has been assumed that the opposite state can be presented to the agent, which may limit the usability of the technique to deterministic environments. In order to relax this assumption, the Non-Markovian Opposition-Based Q(λ) (NOQ(λ)) algorithm is introduced in this work. The new method is a hybrid of a Markovian update for eligibility traces and a non-Markovian update for opposition traces. The experimental results show an improvement in learning speed for the proposed technique compared to Q(λ) and OQ(λ). The new technique is faster than the OQ(λ) algorithm with the same success rate, and it can be employed in a broader range of applications since it does not require the state transitions to be known.
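The abstract does not spell out the update rules, so the Python sketch below is only a rough illustration of the opposition-trace idea on top of Watkins' Q(λ), not the authors' exact NOQ(λ) algorithm. The environment interface (n_states, n_actions, reset(), step()) and the opposite_action map are hypothetical assumptions; the point it illustrates is that the opposition trace is updated without ever asking the environment for the opposite transition.

import numpy as np

# Rough illustrative sketch (not the authors' exact NOQ(lambda) update
# rules). Assumptions: a discrete environment `env` exposing n_states,
# n_actions, reset() and step(a) -> (next_state, reward, done), plus a
# user-supplied `opposite_action` map (e.g. up<->down, left<->right in a
# gridworld). Eligibility traces follow Watkins' Q(lambda); the opposition
# trace is kept separately and updated without querying the environment
# for the transition of the opposite action (the "non-Markovian" part).
def noq_lambda_sketch(env, opposite_action, episodes=100,
                      alpha=0.1, gamma=0.95, lam=0.9, eps=0.1):
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        e = np.zeros_like(Q)       # eligibility traces (Markovian update)
        e_opp = np.zeros_like(Q)   # opposition traces (non-Markovian update)
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < eps:
                a = np.random.randint(env.n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)

            # TD error of the taken action (standard Watkins' Q(lambda))
            delta = r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a]
            e[s, a] = 1.0                        # replacing trace
            e_opp[s, opposite_action[a]] = 1.0   # mark the opposite action

            Q += alpha * delta * e               # Markovian trace update
            # Opposition update: discourage the opposite action with the
            # negated TD error; no opposite state/transition is needed.
            Q += alpha * (-delta) * e_opp

            greedy = a == int(np.argmax(Q[s]))
            e *= gamma * lam if greedy else 0.0  # Watkins' trace cut-off
            e_opp *= gamma * lam
            s = s_next
    return Q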
Similar Resources
Non-Markovian Agent Evolution with EVOLP
Logic Programming Update Languages were proposed as an extension of logic programming which allows for modelling the dynamics of knowledge bases, where both extensional knowledge (facts) and intensional knowledge (rules) may change over time due to updates, with important applications in Multi-Agent Systems (MAS). Despite their generality, these languages do not provide means to directly acce...
HQ-learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning
To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovia...
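As a compact structural illustration of the scheme this abstract describes, the sketch below shows an ordered sequence of subagents, each choosing a subgoal observation and running plain Q-learning inside its subtask. The environment interface and the subgoal bookkeeping are simplifying assumptions, not the paper's exact HQ-learning update rules.

import numpy as np

# Structural sketch of the HQ-learning idea: a fixed, ordered sequence of
# subagents; each picks a subgoal (an observation to reach) from its HQ
# values and learns an ordinary Q-table for its Markovian subtask.
# Assumed env interface: n_states, n_actions, reset(),
# step(a) -> (next_obs, reward, done).
class SubAgent:
    def __init__(self, n_states, n_actions):
        self.Q = np.zeros((n_states, n_actions))  # subtask policy
        self.HQ = np.zeros(n_states)              # estimated value of each subgoal

    def pick_subgoal(self, eps=0.1):
        if np.random.rand() < eps:
            return np.random.randint(len(self.HQ))
        return int(np.argmax(self.HQ))

def run_episode(env, agents, alpha=0.1, gamma=0.95, eps=0.1):
    # Control passes along the agent sequence: each subagent acts until
    # its chosen subgoal observation is reached, then hands over.
    obs = env.reset()
    done = False
    for agent in agents:
        subgoal = agent.pick_subgoal(eps)
        while not done and obs != subgoal:
            a = (np.random.randint(env.n_actions) if np.random.rand() < eps
                 else int(np.argmax(agent.Q[obs])))
            nxt, r, done = env.step(a)
            # plain Q-learning inside the (Markovian) subtask
            agent.Q[obs, a] += alpha * (r + gamma * np.max(agent.Q[nxt]) * (not done)
                                        - agent.Q[obs, a])
            obs = nxt
        if done:
            break
    return obs, done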
Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods
This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...
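To make the QV-learning scheme sketched in this abstract concrete, here is a minimal tabular version in Python: a state value function V is learned with TD(λ), and the Q-values bootstrap from r + γV(s') rather than from a max over Q. The environment interface is an assumption, and the paper's exact trace handling may differ.

import numpy as np

# Minimal sketch of the QV(lambda)-learning idea. Assumed env interface:
# n_states, n_actions, reset(), step(a) -> (next_state, reward, done).
def qv_lambda_sketch(env, episodes=100, alpha=0.1, beta=0.1,
                     gamma=0.95, lam=0.9, eps=0.1):
    V = np.zeros(env.n_states)
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        e = np.zeros(env.n_states)      # eligibility traces for V
        s = env.reset()
        done = False
        while not done:
            a = (np.random.randint(env.n_actions) if np.random.rand() < eps
                 else int(np.argmax(Q[s])))
            s_next, r, done = env.step(a)

            # TD(lambda) update of the state value function
            delta_v = r + gamma * V[s_next] * (not done) - V[s]
            e[s] = 1.0                  # replacing trace
            V += alpha * delta_v * e
            e *= gamma * lam

            # Q update that bootstraps from V instead of max Q
            Q[s, a] += beta * (r + gamma * V[s_next] * (not done) - Q[s, a])
            s = s_next
    return Q, V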
Maximum Throughput
Appendix: This appendix contains some hints on the calculation of the performance values mentioned in section 3.2. The probability that submitted queries (updates) can be executed at the local database is denoted as l_q (l_u respectively) and results in l_q = loc + (1 − loc) · r · q_r and l_u = loc, because loc expresses the preference of accessing original local data and the second term reflects...
Trajectory-Based Modified Policy Iteration
This paper presents a new problem solving approach that is able to generate optimal policy solution for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov Decision Process (MDP) model along with cost-to-go value approximates by generating finite length trajectories through the state-...
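The loop this abstract describes (collect trajectories, refine an approximate MDP model, re-solve it) can be illustrated with the generic sketch below. It is not the paper's specific algorithm; the environment interface, the cost convention, and the fixed backup count are assumptions made for the example.

import numpy as np

# Generic sketch of trajectory-based model building and policy improvement:
# sample finite-length trajectories, update an empirical MDP (transition
# counts and average costs), then back up values on the learned model.
# Assumed env interface: n_states, n_actions, reset(),
# step(a) -> (next_state, cost, done).
def trajectory_based_mpi_sketch(env, iters=20, trajs_per_iter=10,
                                horizon=50, gamma=0.95, eps=0.2):
    nS, nA = env.n_states, env.n_actions
    counts = np.zeros((nS, nA, nS))
    cost_sum = np.zeros((nS, nA))
    V = np.zeros(nS)
    policy = np.zeros(nS, dtype=int)
    for _ in range(iters):
        # 1) collect trajectories with the current (eps-greedy) policy
        for _ in range(trajs_per_iter):
            s = env.reset()
            for _ in range(horizon):
                a = (np.random.randint(nA) if np.random.rand() < eps
                     else policy[s])
                s2, c, done = env.step(a)
                counts[s, a, s2] += 1
                cost_sum[s, a] += c
                s = s2
                if done:
                    break
        # 2) build the empirical model from the counts collected so far
        n_sa = counts.sum(axis=2, keepdims=True)
        P = np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / nS)
        C = cost_sum / np.maximum(n_sa[:, :, 0], 1)
        # 3) a few value backups plus greedy improvement on the model
        for _ in range(50):
            Qm = C + gamma * P @ V          # shape (nS, nA)
            V = Qm.min(axis=1)              # costs are minimised
            policy = Qm.argmin(axis=1)
    return policy, V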