UDC 519.244.3 An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
Author
Abstract
The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order N^{1/2} as N → ∞, where N is the control horizon, and provides estimates of the factor. For the Gaussian two-armed bandit with unit variances of one-step incomes and close expectations, we improve the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637N^{1/2} as N → ∞.
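The N^{1/2} order of the risk can be seen from the "close expectations" setting the abstract refers to. The sketch below assumes the standard normalization of the expectations (the parametrization with a constant c is an assumption, not spelled out in the abstract):

```latex
% Arms with unit variances and close expectations:
\[
  m_{1,2} \;=\; m \pm c\,N^{-1/2}, \qquad \sigma_1^2 = \sigma_2^2 = 1 .
\]
% If a strategy pulls the inferior arm $n \le N$ times, its regret is
\[
  R_N \;=\; n\,(m_{\max} - m_{\min}) \;=\; 2c\,n\,N^{-1/2} \;\le\; 2c\,\sqrt{N},
\]
% so over this family of bandits the worst-case regret is of order
% $\sqrt{N}$; the theorem sharpens the constant to approximately 0.637.
```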
Similar papers
Contributions to the Asymptotic Minimax Theorem for the Two-Armed Bandit Problem
The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order N^{1/2} as N → ∞, where N is the control horizon, and provides lower and upper estimates. It can be easily extended to the normal two-armed bandit. For the normal two-armed bandit, we generalize the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637N^{1/2} as N → ∞. Ke...
Minimax Lower Bounds for the Two-Armed Bandit Problem
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. Also, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable con gurations o...
On Explore-Then-Commit strategies
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main mess...
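The Explore-Then-Commit structure discussed above can be illustrated with a short Monte Carlo sketch. The rule below (a minimal illustration, not the strategy analysed in the paper) pulls each arm of a two-armed Gaussian bandit with unit variances a fixed number of times, then commits to the arm with the higher sample mean; the helper name `etc_regret` and all parameter values are hypothetical:

```python
import numpy as np

def etc_regret(gap, horizon, n_explore, n_runs=2000, seed=0):
    """Monte Carlo estimate of the expected regret of a simple
    Explore-Then-Commit rule on a two-armed Gaussian bandit with
    unit variances, where arm 1 is better than arm 0 by `gap`."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_runs):
        # Exploration phase: n_explore pulls of each arm.
        mean0 = rng.normal(0.0, 1.0, n_explore).mean()
        mean1 = rng.normal(gap, 1.0, n_explore).mean()
        commit = 1 if mean1 > mean0 else 0
        # Regret = (number of pulls of the suboptimal arm) * gap.
        pulls_bad = n_explore
        if commit == 0:  # committed to the wrong arm
            pulls_bad += horizon - 2 * n_explore
        total += pulls_bad * gap
    return total / n_runs

# Longer exploration lowers the chance of committing to the wrong arm,
# trading a larger exploration cost for a smaller commitment risk.
r_short = etc_regret(gap=0.2, horizon=10_000, n_explore=10)
r_long = etc_regret(gap=0.2, horizon=10_000, n_explore=200)
```

The point the paper makes is that even the best such fixed-phase trade-off remains suboptimal compared to fully sequential strategies.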
Finite-time lower bounds for the two-armed bandit problem
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. The finite-time lower bound allows us to derive conditions for the amount of time necessary to make any significant gain over a random guessing strategy. These bounds depend on the class of possible d...
Bandit Algorithms in Game Tree Search: Application to Computer Renju
The multi-armed bandit problem is to maximize a cumulative reward by playing arms sequentially without prior knowledge. Algorithms for this problem, such as UCT, have been successfully extended to computer Go programs and proved significantly effective by defeating professional players. The goal of the project is to implement a Renju AI based on Monte Carlo planning that is able to defeat the oldest k...