Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits
Authors
Abstract
We consider combinatorial semi-bandits over a set of arms X \subset \{0,1\}^d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O( d (\ln m)^2 (\ln T) / \Delta_{\min} ) after T rounds, where m = \max_{x \in X} 1^\top x. However, it has computational complexity O(|X|), which is typically exponential in d, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with asymptotic computational complexity O(\delta_T^{-1} poly(d)), where \delta_T is a function that vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over X can be solved up to a given approximation ratio, AESCB is implementable in polynomial time O(\delta_T^{-1} poly(d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
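To make the O(|X|) cost mentioned in the abstract concrete, here is a minimal Python sketch of a brute-force index maximization. It is an illustration under stated assumptions, not the paper's implementation: the arm set X is taken to be all subsets of exactly m items out of d, and the optimistic index is assumed to have the form "estimated reward plus the square root of a sum of per-item confidence terms". The names `brute_force_index_argmax`, `theta_hat`, `counts`, and `f_t` are hypothetical.

```python
import itertools
import math


def brute_force_index_argmax(theta_hat, counts, m, f_t):
    """Enumerate every m-subset of items and return the one with the largest index.

    Assumed index (illustrative only):
        sum_i theta_hat[i] + sqrt(0.5 * f_t * sum_i 1 / counts[i])
    where the sums run over the items in the subset.
    """
    d = len(theta_hat)
    best_x, best_val = None, -math.inf
    # This loop visits all C(d, m) candidate arms, which is the O(|X|) cost.
    for subset in itertools.combinations(range(d), m):
        mean = sum(theta_hat[i] for i in subset)
        bonus = math.sqrt(0.5 * f_t * sum(1.0 / counts[i] for i in subset))
        val = mean + bonus
        if val > best_val:
            best_x, best_val = subset, val
    return best_x, best_val


# Toy instance: item 2 has a low estimate but has been observed only once.
theta_hat = [0.9, 0.8, 0.5, 0.4]
counts = [100, 100, 1, 100]
best_x, best_val = brute_force_index_argmax(theta_hat, counts, m=2, f_t=math.log(50))
num_candidates = math.comb(len(theta_hat), 2)
```

In this toy instance the under-sampled item 2 is selected because its confidence term dominates. The enumeration already visits C(d, m) subsets, which grows quickly with d; this is the kind of cost that an approach based on budgeted linear maximization is meant to avoid.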
Related resources
Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the res...
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
= \tilde{O}( K \sqrt{d n \min\{\ln(L), d\}} ). (11) We now outline the proof of Theorem 3, which is based on (Russo & Van Roy, 2013; Dani et al., 2008). Let H_t denote the "history" (i.e., all the available information) by the start of episode t. Note that from the Bayesian perspective, conditioning on H_t, \theta^* and \theta_t are i.i.d. drawn from N(\bar{\theta}_t, \Sigma_t) (see (Russo & Van Roy, 2013)). This is because that con...
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
• the agent knows a generalization matrix \Phi \in \mathbb{R}^{L \times d} s.t. \bar{w} = E_P[w_t] is "close" to span[\Phi]
• such models are available in many cases
Performance Metrics
At each time t, choosing A_t \in \mathcal{A} can be challenging, since the combinatorial optimization problem \max_{A \in \mathcal{A}} \sum_{e \in A} w(e) can be NP-hard. We assume the agent uses a combinatorial optimization algorithm ORACLE to choose A_t, where ORACLE can be an approx...
Thompson Sampling for Combinatorial Semi-Bandits
We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of O(m \log T / \Delta_{\min}) for TS under general CMAB, where m is the number of arms, T is the time horizon, and \Delta_{\min} is the minimum gap between the expect...
Bypassing Combinatorial Protections: Polynomial-Time Algorithms for Single-Peaked Electorates
For many election systems, bribery (and related) attacks have been shown NP-hard using constructions on combinatorially rich structures such as partitions and covers. This paper shows that for voters who follow the most central political-science model of electorates (single-peaked preferences), those hardness protections vanish. By using single-peaked preferences to simplify combinatorial coverin...
Journal
Journal title: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Year: 2021
ISSN: 2476-1249
DOI: https://doi.org/10.1145/3447387