Mistake Bounds on Noise-Free Multi-Armed Bandit Game

Authors

  • Atsuyoshi Nakamura
  • David P. Helmbold
  • Manfred K. Warmuth
Abstract

We study the {0, 1}-loss version of adaptive adversarial multi-armed bandit problems with α (≥ 1) lossless arms. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
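As a rough illustration of where the K − α term comes from, consider a player that never replays an arm once it has seen that arm incur a 1-loss. Lossless arms are never discarded, and each mistake discards one of the at most K − α other arms, so such a player makes at most K − α mistakes. The sketch below is only an assumption-laden toy, not the strategy or the adversary analyzed in the paper: it uses a fixed (non-adaptive) loss sequence, and the names `always_lossy_sequence` and `play_elimination` are hypothetical.

```python
import random


def always_lossy_sequence(num_arms, lossless_arms, num_rounds):
    """Build a fixed loss table: lossless arms always give 0, all others always give 1.

    This is a simple non-adaptive stand-in for the adversary, chosen for illustration.
    """
    row = [0 if arm in lossless_arms else 1 for arm in range(num_arms)]
    return [list(row) for _ in range(num_rounds)]


def play_elimination(losses, rng):
    """Play one game and return the number of mistakes (1-losses).

    The player picks uniformly at random among arms it has not yet seen
    incur a 1-loss, and discards an arm as soon as it incurs a 1-loss.
    """
    num_arms = len(losses[0])
    alive = list(range(num_arms))
    mistakes = 0
    for round_losses in losses:
        arm = rng.choice(alive)
        if round_losses[arm] == 1:
            mistakes += 1
            alive.remove(arm)  # never play this arm again
    return mistakes


if __name__ == "__main__":
    K, T = 5, 50
    lossless = {0, 3}  # alpha = 2 lossless arms (hypothetical choice)
    table = always_lossy_sequence(K, lossless, T)
    print(play_elimination(table, random.Random(0)))
```

With K = 5 and α = 2, every run of this toy game ends with at most 3 mistakes, whatever the random choices are. The paper's result is sharper: it pins down the minimax expected number of mistakes against an adaptive adversary.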

Similar resources

Noise Free Multi-armed Bandit Game

We study the loss version of adversarial multi-armed bandit problems with one lossless arm. We show an adversary’s strategy that forces any player to suffer K − 1 − O(1/T) loss where K is the number of arms and T is the number of rounds.

Gap-free Bounds for Stochastic Multi-Armed Bandit

We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bo...

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

This paper is about two generalizations of the mistake bound model to online multiclass classification. In the standard model, the learner receives the correct classification at the end of each round, and in the bandit model, the learner only finds out whether its prediction was correct or not. For a set F of multiclass classifiers, let opt_std(F) and opt_bandit(F) be the optimal bounds for lea...

Anytime optimal algorithms in stochastic multi-armed bandits

We introduce an anytime algorithm for stochastic multi-armed bandit with optimal distribution free and distribution dependent bounds (for a specific family of parameters). The performances of this algorithm (as well as another one motivated by the conjectured optimal bound) are evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.

The Price of Differential Privacy for Online Learning

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal Õ(√T) regret bounds. In the full-information setting, our results demonstrate that ε-differential privacy may be ensured for free; in particular, the regret bounds scale as O(√T) + Õ(1/ε). For bandit linear optimization, and as a special c...

Publication date: 2017