Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback
Abstract
In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback (CPE-BL), where a learner is given a combinatorial action space X \subseteq {0,1}^d, and in each round the learner pulls an action x \in X and receives a random reward with expectation x^T \theta, where \theta \in \R^d is a latent and unknown environment vector. The objective is to identify the optimal action with the highest expected reward, using as few samples as possible. For CPE-BL, we design a polynomial-time adaptive algorithm, whose sample complexity matches the lower bound (within a logarithmic factor) for a family of instances and has a light dependence on \Delta_min (the smallest gap between the optimal action and sub-optimal actions). Furthermore, we propose a novel generalization of CPE-BL with flexible feedback structures, called combinatorial pure exploration with partial linear feedback (CPE-PL), which encompasses several families of sub-problems including full-bandit feedback, semi-bandit feedback, partial feedback and nonlinear reward functions. In CPE-PL, each pull of action x reports a random feedback vector with expectation M_x \theta, where M_x \in \R^{m_x \times d} is a transformation matrix for x, and gains a random (possibly nonlinear) reward related to x. For CPE-PL, we develop a polynomial-time algorithm, which simultaneously addresses limited feedback, a general reward function and a combinatorial action space (e.g., matroids, matchings and s-t paths), and provide its sample complexity analysis. Our empirical evaluation demonstrates that our algorithms run orders of magnitude faster than the existing ones, and that our CPE-BL algorithm is robust across different \Delta_min settings while our CPE-PL algorithm is the only one returning correct answers for nonlinear reward functions.
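To make the full-bandit feedback model concrete, the following is a minimal sketch of a naive uniform-sampling baseline on a tiny instance. It is not the algorithm proposed in the paper: it enumerates every action, so it is exponential in general and does not exploit the linear structure. The environment vector, the top-k action space, the Gaussian noise model and all function names are assumptions chosen for illustration.

```python
import itertools
import random


def make_actions(d, k):
    """All binary vectors in {0,1}^d with exactly k ones (an assumed top-k action space)."""
    actions = []
    for support in itertools.combinations(range(d), k):
        actions.append(tuple(1 if i in support else 0 for i in range(d)))
    return actions


def pull(x, theta, rng, noise_std):
    """Full-bandit feedback: a single noisy scalar with expectation x^T theta."""
    mean = sum(xi * ti for xi, ti in zip(x, theta))
    return mean + rng.gauss(0.0, noise_std)


def uniform_baseline(actions, theta, pulls_per_action, rng, noise_std=0.5):
    """Pull every action equally often and return the one with the best empirical mean."""
    best_action, best_mean = None, float("-inf")
    for x in actions:
        total = sum(pull(x, theta, rng, noise_std) for _ in range(pulls_per_action))
        mean = total / pulls_per_action
        if mean > best_mean:
            best_action, best_mean = x, mean
    return best_action


# Hypothetical environment vector; the learner only sees it through pull().
theta = [0.9, 0.8, 0.1, 0.2]
actions = make_actions(d=4, k=2)
guess = uniform_baseline(actions, theta, pulls_per_action=200, rng=random.Random(0))
print(guess)
```

In this toy instance the gap between the best action and the runner-up is large relative to the noise, so a fixed budget per action suffices. An adaptive method would instead concentrate samples where the gaps are small, which is the regime the \Delta_min dependence in the abstract refers to.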
Similar resources
Pure Exploration for Multi-Armed Bandit Problems
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast...
Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications
In online learning, a player chooses actions to play and receives reward and feedback from the environment with the goal of maximizing her reward over time. In this paper, we propose the model of combinatorial partial monitoring games with linear feedback, a model which simultaneously addresses limited feedback, infinite outcome space of the environment and exponentially large action space of t...
Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence
We consider the problem of near-optimal arm identification in the fixed confidence setting of the infinitely armed bandit problem when nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearl...
Combinatorial Pure Exploration of Multi-Armed Bandits
We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a decision class, which is a collection of subsets of arms with certain combinatorial structures such as size-K subsets, matchings, spanning trees or paths, etc. The CPE problem represents a rich cl...
Pure Exploration of Multi-armed Bandit Under Matroid Constraints
We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given n stochastic arms with unknown reward distributions, as well as a matroid M over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of M with the maximum total weight, using as f...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i8.16892