Optimistic Gittins Indices

Authors

  • Eli Gutin
  • Vivek F. Farias
Abstract

Starting with the Thompson sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases, and while several have been shown to be regret optimal, their design has not emerged from a principled approach. In contrast, if one cared about Bayesian regret discounted over an infinite horizon at a fixed, pre-specified rate, the celebrated Gittins index theorem offers an optimal algorithm. Unfortunately, the Gittins analysis does not appear to carry over to minimizing Bayesian regret over all sufficiently large horizons, and computing a Gittins index is onerous relative to essentially any incumbent index scheme for the Bayesian MAB problem. The present paper proposes a sequence of 'optimistic' approximations to the Gittins index. We show that using these approximations in concert with an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian MAB problem in recent years, delivering substantially improved performance with little to no additional computational overhead. In addition, we prove that the simplest of these approximations yields frequentist regret that matches the Lai-Robbins lower bound, including matching constants.
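
To make the 'optimistic' approximation concrete, here is a minimal sketch for a Bernoulli arm with a Beta posterior. The fixed-point equation, the bisection solver, and the discount schedule gamma_t = 1 - 1/t used below are this summary's assumptions about how such an index could be computed, not a verbatim transcription of the paper's algorithm: one step ahead, the unknown mean is optimistically treated as revealed, which collapses the Gittins stopping problem to a one-dimensional fixed point.

# A hedged sketch (not the paper's reference implementation): a one-step
# "optimistic" approximation to the Gittins index for a Bernoulli arm with a
# Beta(a, b) posterior, solved by bisection.
import numpy as np
from scipy.stats import beta

def optimistic_index(a, b, gamma, tol=1e-6):
    """Solve v = (1 - gamma) * m + gamma * E[max(v, mu)], mu ~ Beta(a, b),
    where m = a / (a + b) is the posterior mean."""
    m = a / (a + b)

    def rhs(v):
        # E[max(v, mu)] = v * P(mu <= v) + E[mu; mu > v]
        #               = v * F_{a,b}(v) + m * (1 - F_{a+1,b}(v))
        e_max = v * beta.cdf(v, a, b) + m * (1.0 - beta.cdf(v, a + 1, b))
        return (1.0 - gamma) * m + gamma * e_max

    lo, hi = 0.0, 1.0            # rhs(v) - v is strictly decreasing on [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical usage: an index policy with an increasing discount factor.
rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]                   # unknown to the algorithm
posteriors = [[1.0, 1.0] for _ in true_means]  # Beta(1, 1) priors
for t in range(1, 201):
    gamma_t = 1.0 - 1.0 / t                    # assumed schedule, tends to 1
    scores = [optimistic_index(a, b, gamma_t) for a, b in posteriors]
    k = int(np.argmax(scores))
    reward = float(rng.random() < true_means[k])
    posteriors[k][0] += reward                 # standard Beta-Bernoulli update
    posteriors[k][1] += 1.0 - reward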

Similar articles

Explicit Gittins Indices for a Class of Superdiffusive Processes

We explicitly calculate the dynamic allocation indices (i.e. the Gittins indices) for multi-armed bandit processes driven by superdiffusive noise sources. This class of models generalizes earlier results derived by Karatzas for diffusive processes. In particular, in this soluble class of superdiffusive models, the Gittins indices explicitly depend on the noise state.

Q-Learning for Bandit Problems

Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDPs) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins [Gittins & Jones, 1974]. Gittins' approach reduces the problem of finding optimal policies for the original MDP to a sequence of low-dimensional stopping...
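
For context, the reduction referred to here is usually stated through the standard retirement (calibration) formulation of the Gittins index; the display below is textbook material rather than a quotation from the abstract above. For a single arm in state x, reward process (R_t) and per-step discount gamma, the index is

\nu(x) = \sup_{\tau \ge 1} \frac{\mathbb{E}\left[ \sum_{t=0}^{\tau-1} \gamma^{t} R_{t} \mid X_{0}=x \right]}{\mathbb{E}\left[ \sum_{t=0}^{\tau-1} \gamma^{t} \mid X_{0}=x \right]},

so each arm contributes only a low-dimensional optimal stopping problem over stopping times tau, and the joint MDP never has to be solved directly.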

On the optimality of the Gittins index rule for multi-armed bandits with multiple plays

We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show...
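
As a small illustration of the rule just described (the helper name and the use of numpy are this summary's choices, not the paper's), playing the m projects with the highest indices at each instant amounts to:

import numpy as np

def arms_to_play(indices, m):
    """Return the positions of the m largest Gittins indices (ties broken arbitrarily)."""
    return np.argsort(np.asarray(indices))[-m:]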

Optimal Stopping and Gittins' Indices for Piecewise Deterministic Evolution Processes

An optimal stopping problem involving a piecewise deterministic evolution process is explicitly solved using the method of quasi-variational inequalities. The explicit solution derived offers the possibility to explicitly discuss the associated dynamic allocation problems by means of the Gittins indices.

Efficient Dynamic Allocation with Uncertain Valuations∗

In this paper we consider the problem of efficiently allocating a given resource or object repeatedly over time. The agents, who may temporarily receive access to the resource, learn more about its value through its use. When the agents’ beliefs about their valuations at any given time are public information, this problem reduces to the classic multi-armed bandit problem, the solution to which ...


Publication date: 2016