Index Policies for Discounted Bandit Problems with Availability Constraints
Abstract
A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
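The setting described in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's method: the index used here is the posterior mean of a Beta distribution, a hypothetical placeholder standing in for the Whittle index, and each arm is assumed to be available with a fixed probability rather than a state/action-dependent one. The function names `posterior_mean` and `run_index_policy` are illustrative only.

```python
import random


def posterior_mean(successes, failures):
    """Mean of a Beta(1 + successes, 1 + failures) posterior (uniform prior).

    Placeholder index only; this is NOT the Whittle index from the paper.
    """
    return (1 + successes) / (2 + successes + failures)


def run_index_policy(success_probs, avail_probs, horizon, discount,
                     index_fn=posterior_mean, seed=0):
    """Simulate an index policy on Bernoulli arms that are not always available.

    Assumption: arm i is available in each period independently with the
    constant probability avail_probs[i] (a simplification of the paper's
    state/action-dependent availability).
    Returns the total discounted reward and per-arm [successes, failures].
    """
    rng = random.Random(seed)
    n = len(success_probs)
    counts = [[0, 0] for _ in range(n)]  # [successes, failures] per arm
    total = 0.0
    weight = 1.0  # discount factor raised to the current period
    for _ in range(horizon):
        # Draw which arms can be played in this period.
        available = [i for i in range(n) if rng.random() < avail_probs[i]]
        if available:
            # Index policy: play the available arm with the largest index.
            arm = max(available, key=lambda i: index_fn(*counts[i]))
            reward = 1 if rng.random() < success_probs[arm] else 0
            counts[arm][0 if reward else 1] += 1
            total += weight * reward
        weight *= discount
    return total, counts


if __name__ == "__main__":
    total, counts = run_index_policy(
        success_probs=[0.3, 0.6], avail_probs=[0.9, 0.5],
        horizon=200, discount=0.95)
    print(total, counts)
```

Passing a different `index_fn` lets the same loop compare alternative index rules on identical random draws, since the seed fixes the availability and reward sequence only up to the arms actually chosen.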
Similar Resources
Optimal Index Policies for MDPs with a Constraint
Many controlled queueing systems possess simple index-type optimal policies when discounted, average, or finite-time cost criteria are considered. This structural result makes the computation of optimal policies relatively simple. Unfortunately, for constrained optimization problems, the index structure of the optimal policies is in general not preserved. As a result, computing optimal policie...
Index Policies for a Class of Discounted Restless Bandits
The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for which indexability is guaranteed. The strong p...
Computing a Classic Index for Finite-Horizon Bandits
This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon...
Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues
This paper presents a framework grounded on convex optimization and economics ideas to solve by index policies problems of optimal dynamic allocation of effort to a discrete-state (finite or countable) binary-action (work/rest) semi-Markov restless bandit project, elucidating issues raised by previous work. Its contributions include: (i) the concept of a restless bandit’s marginal productivity ...
Indexability of Restless Bandit Problems and Optimality of Index Policies for Dynamic Multichannel Access
We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexabil...