Bayesian Policy Reuse

Authors

Benjamin Rosman

Majd Hawasly

Subramanian Ramamoorthy

Abstract

A long-lived autonomous agent should be able to respond online to novel instances of tasks from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence, especially when the task instance has a short duration, as in applications involving interactions with humans. These requirements can be problematic for many established methods for learning to act. In domains where the agent knows that the task instance is drawn from a family of related tasks, albeit without access to the label of any given instance, it can choose to act through a process of policy reuse from a library, in contrast to policy learning. In policy reuse, the agent has prior experience from the class of tasks in the form of a library of policies that were learnt from sample task instances during an offline training phase. We formalise the problem of policy reuse and present an algorithm for efficiently responding to a novel task instance by reusing a policy from this library of existing policies, where the choice is based on observed 'signals' which correlate with policy performance. We achieve this by posing the problem as a Bayesian choice problem with a corresponding notion of an optimal response, but the computation of that response is in many cases intractable. Therefore, to reduce the computation cost of the posterior, we follow a Bayesian optimisation approach and define a set of policy selection functions, which balance exploration in the policy library against exploitation of previously tried policies, together with a model of expected performance of the policy library on their corresponding task instances. We validate our method in several simulated domains of interactive, short-duration episodic tasks, showing rapid convergence in unknown task variations.

The first two authors contributed equally to this paper.
Benjamin Rosman: Mobile Intelligent Autonomous Systems (MIAS), Council for Scientific and Industrial Research (CSIR), South Africa, and the School of Computer Science and Applied Mathematics, University of the Witwatersrand, South Africa.

Majd Hawasly: School of Informatics, University of Edinburgh, UK.

Subramanian Ramamoorthy: School of Informatics, University of Edinburgh, UK.

arXiv:1505.00284v2 [cs.AI] 14 Dec 2015
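
The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a small discrete set of task types, a Gaussian performance model whose means (`perf_mean`) and shared standard deviation (`perf_std`) are hypothetical, and it uses the observed utility itself as the signal. A purely greedy rule stands in for the paper's policy selection functions, so the exploration side of the trade-off is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as the assumed signal model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def select_policy(belief, perf_mean):
    """Greedy choice: the policy with the highest expected performance
    under the current belief over task types (an illustrative stand-in)."""
    n_policies = len(perf_mean[0])
    expected = [
        sum(b * perf_mean[t][p] for t, b in enumerate(belief))
        for p in range(n_policies)
    ]
    return max(range(n_policies), key=lambda p: expected[p])


def update_belief(belief, policy, signal, perf_mean, perf_std):
    """Bayes' rule: reweight each task type by how well it explains
    the signal observed after running `policy`."""
    unnormalised = [
        gaussian_pdf(signal, perf_mean[t][policy], perf_std) * b
        for t, b in enumerate(belief)
    ]
    total = sum(unnormalised)
    if total == 0.0:
        # Signal is implausible under every type; keep the old belief.
        return list(belief)
    return [u / total for u in unnormalised]


# Hypothetical library: rows are task types, columns are policies.
perf_mean = [[1.0, 0.2],
             [0.1, 0.8]]
perf_std = 0.2

belief = [0.5, 0.5]                          # uniform prior over task types
first = select_policy(belief, perf_mean)     # policy tried first
belief = update_belief(belief, first, 0.1, perf_mean, perf_std)
second = select_policy(belief, perf_mean)    # choice after one poor episode
```

In this toy setup a single low-utility episode shifts almost all belief mass onto the second task type, so the agent switches policies after one trial, which is the kind of rapid response the abstract targets for short-duration tasks.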

Similar resources

Bayesian Policy

A Bayesian Approach for Learning and Tracking Switching, Non-Stationary Opponents

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. In this paper we extend Bayesian Policy Reuse to adversarial settings where opp...

A Bayesian Approach for Learning and Tracking Switching, Non-Stationary Opponents: (Extended Abstract)

Identifying and Tracking Switching, Non-Stationary Opponents: A Bayesian Approach

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at corre...

Enabling Client-Side Crash-Resistance to Overcome Diversification and Information Hiding

It is a well-known issue that attack primitives which exploit memory corruption vulnerabilities can abuse the ability of processes to automatically restart upon termination. For example, network services like FTP and HTTP servers are typically restarted in case a crash happens and this can be used to defeat Address Space Layout Randomization (ASLR). Furthermore, recently several techniques evol...

Publication date: 2017
