Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines

Authors

  • Haitham Bou Ammar
  • Decebal Constantin Mocanu
  • Matthew E. Taylor
  • Kurt Driessens
  • Gerhard Weiss
  • Karl Tuyls
Abstract

Reinforcement learning (RL) has become a popular framework for autonomous behaviour generation from limited feedback [2, 3], but RL methods typically learn tabula rasa. Transfer learning (TL) aims to improve learning by providing informative knowledge from one or more previous (source) tasks to a learning agent in a novel (target) task. If the agent is to be fully autonomous, it must: (1) automatically select a source task, (2) learn how the source and target tasks are related, and (3) effectively use the transferred knowledge in the target task. While fully autonomous transfer is not yet possible, this paper advances the state of the art by focusing on step (2). In particular, this work proposes methods to automatically learn the relationship between pairs of tasks and then to use this learned relationship to transfer effective knowledge.

In TL for RL, the source and target tasks may differ in their formulations. In particular, when the source and target tasks have different state and/or action spaces, an inter-task mapping [4] that describes the relationship between the two tasks is needed. While there have been attempts to discover this mapping automatically, finding an optimal way to construct it is still an open question. Existing techniques either rely on restrictive assumptions about the relationship between the source and target tasks or adopt heuristics that work only in specific cases.

This paper introduces an autonomous framework for learning inter-task mappings based on restricted Boltzmann machines (RBMs) [1]. RBMs provide a powerful yet general framework that can describe an abstract common space for different tasks. This common space is in turn used to represent the inter-task mapping between the two tasks and to transfer knowledge about transition dynamics between them.

The contributions of this paper are summarised as follows. First, a novel RBM that uses a three-way weight tensor (i.e., TrRBM) is proposed. Since this machine has a computational complexity of O(N³), a factored version (i.e., FTrRBM) is then derived that reduces the complexity to O(N²). Learning in this factored version cannot be done with vanilla Contrastive Divergence (CD): if CD were used as-is, FTrRBM would learn to correlate random samples from the source task with random samples from the target. To tackle this problem, and to ensure computational efficiency, a modified version of CD is proposed. In Parallel Contrastive Divergence (PCD), the data sets are first split into batches of samples, and parallel Markov chains are run for a certain number of steps on each batch. At each step of the chains, the values of the derivatives are calculated and averaged to perform a learning step. This runs for a certain number of epochs; from the second epoch onwards the same procedure is followed, but with the samples randomised across the batches.
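The abstract describes TrRBM/FTrRBM only at a high level. The snippet below is a minimal NumPy sketch of the kind of factorisation involved, in the spirit of standard factored three-way RBMs: the O(N³) weight tensor W[i, j, k] is approximated by three factor matrices. The function name, the matrices Ws/Wt/Wh, and the exact bias terms are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def factored_threeway_energy(x_s, x_t, h, Ws, Wt, Wh, b_s, b_t, b_h):
    """Energy of one (source, target, hidden) configuration.

    The three-way tensor is approximated as
        W[i, j, k] ~= sum_f Ws[i, f] * Wt[j, f] * Wh[k, f],
    so only O(N^2) parameters are stored instead of O(N^3).
    """
    f_s = x_s @ Ws   # (F,) source-side factor activations
    f_t = x_t @ Wt   # (F,) target-side factor activations
    f_h = h @ Wh     # (F,) hidden-side factor activations
    # Factored three-way interaction plus the usual linear bias terms.
    return -(np.sum(f_s * f_t * f_h) + b_s @ x_s + b_t @ x_t + b_h @ h)

# Tiny usage example with random values (shapes only; not the paper's data).
rng = np.random.default_rng(0)
Ns, Nt, Nh, F = 4, 3, 5, 2
E = factored_threeway_energy(
    rng.random(Ns), rng.random(Nt), (rng.random(Nh) > 0.5).astype(float),
    rng.standard_normal((Ns, F)), rng.standard_normal((Nt, F)),
    rng.standard_normal((Nh, F)),
    np.zeros(Ns), np.zeros(Nt), np.zeros(Nh))
print(E)
```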
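The PCD description above can also be read as the schematic training loop sketched below, assuming batched source/target samples stored as NumPy arrays and a parameter dictionary. The `gibbs_step` callback is a hypothetical placeholder for one step of a batch's Markov chain under the FTrRBM conditionals, which the abstract does not specify; everything else follows the batching, averaging, and per-epoch reshuffling described in the text.

```python
import numpy as np

def parallel_cd_sketch(source_data, target_data, params, gibbs_step,
                       n_batches=10, chain_steps=5, epochs=20, lr=1e-3):
    """Schematic outer loop of Parallel Contrastive Divergence (PCD).

    `gibbs_step(params, chain_state, s_batch, t_batch)` is assumed to
    advance one Markov chain by a single Gibbs step (a `None` state means
    the chain starts fresh) and to return (new_chain_state, gradient_dict)
    with the same keys as `params`.
    """
    n = min(len(source_data), len(target_data))
    for epoch in range(epochs):
        # Randomise which samples end up in which batch every epoch.
        s_batches = np.array_split(source_data[np.random.permutation(n)], n_batches)
        t_batches = np.array_split(target_data[np.random.permutation(n)], n_batches)
        chains = [None] * n_batches          # one persistent chain per batch
        for _ in range(chain_steps):
            grads = []
            for b, (s, t) in enumerate(zip(s_batches, t_batches)):
                chains[b], g = gibbs_step(params, chains[b], s, t)
                grads.append(g)
            # Average the per-chain derivative estimates, then take one step.
            for k in params:
                params[k] += lr * np.mean([g[k] for g in grads], axis=0)
    return params
```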


Similar articles

Automatically Mapped Transfer between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines

Reinforcement learning applications are hampered by the tabula rasa approach taken by existing techniques. Transfer for reinforcement learning tackles this problem by enabling the reuse of previously learned results, but requires an inter-task mapping to encode how the previously learned task and the new task are related. This paper presents an autonomous framework for learning inter-task mappi...


An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning

Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks. Despite their success, transfer learning methods rely on having relevant source tasks; transfer from inappropriate tasks can inhibit performance on the new task. For fully autonomous transfer, it is critical to have a method for automatically choosing...


Heterogeneous Transfer Learning with RBMs

A common approach in machine learning is to use a large amount of labeled data to train a model. Usually this model can then only be used to classify data in the same feature space. However, labeled data is often expensive to obtain. A number of strategies have been developed by the machine learning community in recent years to address this problem, including: semi-supervised learning, domain a...


Reinforcement learning using quantum Boltzmann machines

We investigate whether quantum annealers with select chip layouts can outperform classical computers in reinforcement learning tasks. We associate a transverse field Ising spin Hamiltonian with a layout of qubits similar to that of a deep Boltzmann machine (DBM) and use simulated quantum annealing (SQA) to numerically simulate quantum sampling from this system. We design a reinforcement learnin...


Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine

In this paper, we argue the way of modeling speech signals based on three-way restricted Boltzmann machine (3WRBM) for separating phonetic-related information and speaker-related information from an observed signal automatically. The proposed model is an energy-based probabilistic model that includes three-way potentials of three variables: acoustic features, latent phonetic features, and speak...




Publication date: 2013