Pure Exploration of Multi-armed Bandit Under Matroid Constraints

Authors

  • Lijie Chen
  • Anupam Gupta
  • Jian Li
Abstract

We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given n stochastic arms with unknown reward distributions, as well as a matroid M over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of M with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-k arm identification problem, which have attracted significant attention in recent years. We study both the exact and PAC versions of Best-Basis, and provide algorithms with nearly optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-k arm identification problem and the combinatorial pure exploration problem when the combinatorial constraint is a matroid.
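To make the problem setup concrete, the following is a minimal illustrative sketch of a naive uniform-sampling baseline for Best-Basis, not the paper's algorithm: estimate each arm's mean by pulling it a fixed number of times, then greedily build a basis of the matroid by empirical weight. The function names, the independence oracle, and the sample budget are all assumptions made for illustration.

```python
import random

def best_basis_naive(arms, is_independent, samples_per_arm=2000):
    """Naive uniform-sampling baseline for the Best-Basis problem.

    arms: list of zero-argument callables, each returning one stochastic reward.
    is_independent: oracle taking a set of arm indices, True iff that set
        is independent in the matroid M.
    Returns a basis chosen greedily by empirical mean weight.
    Illustrative only -- not the nearly-optimal algorithm from the paper.
    """
    # Estimate each arm's mean reward by uniform sampling.
    means = [sum(pull() for _ in range(samples_per_arm)) / samples_per_arm
             for pull in arms]
    # Greedy scan in decreasing empirical mean, keeping arms that preserve
    # independence; for matroids, greedy is optimal for the empirical weights.
    basis = set()
    for i in sorted(range(len(arms)), key=lambda i: -means[i]):
        if is_independent(basis | {i}):
            basis.add(i)
    return basis

# Example: a uniform matroid of rank 2 (a set is independent iff it has
# at most 2 elements), so Best-Basis reduces to top-2 arm identification.
rng = random.Random(0)
true_means = [0.9, 0.2, 0.7, 0.1]
arms = [lambda m=m: rng.random() < m for m in true_means]  # Bernoulli arms
basis = best_basis_naive(arms, lambda s: len(s) <= 2)
```

With well-separated means and a generous sample budget, the greedy step recovers the top-2 arms {0, 2}; the paper's contribution is achieving this with far fewer, adaptively allocated samples.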


Similar Articles

Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes

Multi-Armed Bandit (MAB) problems can be naturally extended to Markov Decision Processes (MDP). We extend the Best Arm Identification problem to episodic fixed-horizon MDPs. Here, the goal of an agent interacting with the MDP is to reach a high confidence on the optimal policy in as few episodes as possible. We propose Posterior Sampling for Pure Exploration (PSPE), a Bayesian algorithm for pur...


On Interruptible Pure Exploration in Multi-Armed Bandits

Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate pe...


Cognitive Capacity and Choice under Uncertainty: Human Experiments of Two-armed Bandit Problems

The two-armed bandit problem, or more generally, the multi-armed bandit problem, has been identified as the underlying problem of many practical circumstances which involves making a series of choices among uncertain alternatives. Problems like job searching, customer switching, and even the adoption of fundamental or technical trading strategies of traders in financial markets can be formulate...


Improved Learning Complexity in Combinatorial Pure Exploration Bandits

We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit problem. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose for the fixed confidence and the fixed budget setting. We show that this complexity is never higher than the one in existing work and illustrate a number of configurat...


Generic Exploration and K-armed Voting Bandits

We study a stochastic online learning scheme with partial feedback where the utility of decisions is only observable through an estimation of the environment parameters. We propose a generic pure-exploration algorithm, able to cope with various utility functions from multi-armed bandits settings to dueling bandits. The primary application of this setting is to offer a natural generalization of ...



Publication date: 2016