Finite-time lower bounds for the two-armed bandit problem

Authors

  • Sanjeev R. Kulkarni
  • Gábor Lugosi
Abstract

We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. The finite-time lower bound allows us to derive conditions for the amount of time necessary to make any significant gain over a random guessing strategy. These bounds depend on the class of possible distributions of the rewards associated with the arms. For example, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms. That is, we show that for every allocation rule and for every n, there is a configuration such that the regret at time n is at least 1 - ε times the regret of random guessing, where ε is any small positive constant.
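
To make the comparison with random guessing concrete, the following is a minimal sketch, not taken from the paper. It assumes Bernoulli rewards, and all function names are hypothetical. It estimates the regret of random guessing by simulation, compares it with the closed form n·|μ1 - μ2|/2, and runs a simple follow-the-leader rule (an illustrative baseline, not a rule analyzed in the paper) on the same configuration.

```python
import random


def simulate_regret(policy, means, horizon, rng):
    """Run one game on two Bernoulli arms and return the accumulated
    pseudo-regret: the sum over rounds of (best mean - mean of pulled arm)."""
    best = max(means)
    counts = [0, 0]      # number of pulls per arm
    sums = [0.0, 0.0]    # total observed reward per arm
    regret = 0.0
    for t in range(horizon):
        arm = policy(t, counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def random_guessing(t, counts, sums, rng):
    """Pick an arm uniformly at random, ignoring all past observations."""
    return rng.randrange(2)


def follow_the_leader(t, counts, sums, rng):
    """Illustrative baseline (an assumption, not from the paper): pull each
    arm once, then pull the arm with the higher empirical mean."""
    for arm in (0, 1):
        if counts[arm] == 0:
            return arm
    avg = [sums[a] / counts[a] for a in (0, 1)]
    if avg[0] == avg[1]:
        return rng.randrange(2)
    return 0 if avg[0] > avg[1] else 1


def average_regret(policy, means, horizon, runs=2000, seed=0):
    """Monte Carlo estimate of the expected regret at time `horizon`."""
    rng = random.Random(seed)
    total = sum(simulate_regret(policy, means, horizon, rng) for _ in range(runs))
    return total / runs


def random_guessing_regret(means, horizon):
    """Closed form: random guessing pulls the worse arm half the time."""
    return horizon * abs(means[0] - means[1]) / 2


if __name__ == "__main__":
    means, horizon = (0.5, 0.6), 100
    print("random guessing (exact):    ", random_guessing_regret(means, horizon))
    print("random guessing (simulated):", average_regret(random_guessing, means, horizon))
    print("follow the leader (simulated):", average_regret(follow_the_leader, means, horizon))
```

When the gap between the two means is small relative to the horizon, the adaptive rule has little data to separate the arms, which is the regime in which the abstract says no allocation rule can improve much on random guessing.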

Similar resources

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of pre...

Bounded Regret for Finite-Armed Structured Bandits

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases t...

Nearly Tight Bounds for the Continuum-Armed Bandit Problem

In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set. Here we consider the case when the set of strategies is a subset...

Finite-Time Regret Bounds for the Multiarmed Bandit Problem

We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log² T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a ...

Journal:
  • IEEE Trans. Automat. Contr.

Volume 45

Publication date: 2000