Arbitrary Side Observations in Bandit Problems
Abstract
A bandit problem with side observations is an extension of the traditional two-armed bandit problem, in which the decision maker has access to side information before deciding which arm to pull. In this paper, essential properties of the side observations that allow achievability results with respect to optimal regret are extracted and formalized. The sufficient conditions for good side information obtained here admit various types of random processes as special cases, including i.i.d. sequences, Markov chains, deterministic periodic sequences, etc. A simple necessary condition for optimal regret is given, providing further insight into the nature of bandit problems with side observations. A game-theoretic approach simplifies the analysis and justifies the viewpoint that the side observation serves as an index specifying different sub-bandit machines. © 2004 Elsevier Inc. All rights reserved.
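The setting in the abstract can be illustrated with a minimal simulation. The sketch below treats an i.i.d. binary side observation as an index into two sub-bandit machines, as in the game-theoretic viewpoint above, and runs a simple epsilon-greedy learner per sub-bandit. The reward means, exploration schedule, and function name are all illustrative assumptions, not taken from the paper.

```python
import random

def run_contextual_bandit(steps=20000, seed=0):
    """Sketch of a two-armed bandit with i.i.d. binary side observations.

    The side observation x is revealed before each pull and indexes a
    'sub-bandit': which arm is best depends on x, so a learner that
    conditions on x can drive the fraction of inferior pulls toward zero.
    Returns that fraction. All numeric choices are illustrative.
    """
    rng = random.Random(seed)
    # Bernoulli reward means, indexed as means[x][arm]; the best arm
    # flips with the side observation, so ignoring x is costly.
    means = {0: (0.7, 0.3), 1: (0.2, 0.8)}
    counts = {x: [0, 0] for x in (0, 1)}   # pulls per (context, arm)
    sums = {x: [0.0, 0.0] for x in (0, 1)}  # reward totals per (context, arm)
    inferior_pulls = 0
    for t in range(steps):
        x = rng.randint(0, 1)              # side observation, seen first
        eps = min(1.0, 10.0 / (t + 1))     # decaying exploration rate
        if rng.random() < eps:
            arm = rng.randint(0, 1)        # explore uniformly
        else:
            est = [sums[x][a] / counts[x][a] if counts[x][a] else 0.0
                   for a in (0, 1)]
            arm = 0 if est[0] >= est[1] else 1  # exploit within sub-bandit x
        reward = 1.0 if rng.random() < means[x][arm] else 0.0
        counts[x][arm] += 1
        sums[x][arm] += reward
        best = 0 if means[x][0] >= means[x][1] else 1
        if arm != best:
            inferior_pulls += 1
    return inferior_pulls / steps
```

Under these assumptions the fraction of inferior pulls stays small, which is the qualitative behavior the achievability results concern: good side information lets the decision maker sample the inferior arm rarely.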
Similar Articles
Bandit Problems with Arbitrary Side Observations
A bandit problem with side observations is an extension of the traditional two-armed bandit problem, in which the decision maker has access to side information before deciding which arm to pull. In this paper, the essential properties of the side observations that allow achievability results with respect to the minimal inferior sampling time are extracted and formulated. The sufficient conditio...
Performance Limitations in Bandit Problems with Side Observations
We consider a sequential adaptive allocation problem which is formulated as a traditional two-armed bandit problem but with one important modification: at each time step t, before selecting which arm to pull, the decision maker has access to a random variable Xt which provides information on the reward in each arm. Performance is measured as the fraction of times an inferior arm (generating low...
Efficient learning by implicit exploration in bandit problems with side observations
We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner’s action and a directed observat...
Complexity Constraints in Two-Armed Bandit Problems: An Example
This paper derives the optimal strategy for a two-armed bandit problem under the constraint that the strategy must be implemented by a finite automaton with an exogenously given, small number of states. The idea is to find learning rules for bandit problems that are optimal subject to the constraint that they must be simple. Our main results show that the optimal rule involves an arbitrary init...
Fractional Moments on Bandit Problems
Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments that represent this explore/exploit situation. We propose a learning algorithm for bandit problems based on fractional expectation of rewards acquired. The alg...