Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Authors

Abstract

Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings where players have full access to all arms and receive no reward when pulling the same arm; hence, all players solve the same bandit problem with the goal of maximizing their cumulative reward. However, these settings neglect several important factors in many real-world applications, where players have limited access to a dynamic local subset of arms (i.e., an arm could sometimes be "walking" and not accessible to the player). To this end, this paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues. The goal is still to maximize the cumulative reward; however, players can only pull arms from their local subset and only collect the reward if no other players pull the same arm. We adopt an Upper Confidence Bound (UCB) approach to deal with the exploration-exploitation tradeoff and employ distributed optimization techniques to properly handle collisions. By carefully integrating these two techniques, we propose a decentralized algorithm with a near-optimal regret guarantee that can be easily implemented to obtain competitive empirical performance.
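To make the setting concrete, below is a minimal self-contained simulation in which each player runs plain UCB1 over whichever arms are locally available ("walking" arms drop in and out at random) and forfeits the reward on a collision. The arm count, the random-availability model, and the zero-reward collision rule are illustrative assumptions; this sketch is not the paper's decentralized algorithm, which additionally coordinates players through distributed optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, T = 8, 3, 5000            # arms, players, rounds (illustrative sizes)
mu = rng.uniform(0.1, 0.9, K)   # hidden Bernoulli arm means

counts = np.zeros((N, K))       # per-player pull counts for UCB1
sums = np.zeros((N, K))         # per-player reward sums

for t in range(1, T + 1):
    # "Walking" availability: each player sees a random local subset of arms.
    avail = [rng.choice(K, size=4, replace=False) for _ in range(N)]

    picks = []
    for p in range(N):
        ucb = np.full(K, -np.inf)           # unavailable arms are never chosen
        for a in avail[p]:
            if counts[p, a] == 0:
                ucb[a] = np.inf             # force initial exploration
            else:
                mean = sums[p, a] / counts[p, a]
                ucb[a] = mean + np.sqrt(2 * np.log(t) / counts[p, a])
        picks.append(int(np.argmax(ucb)))

    for p, a in enumerate(picks):
        collided = picks.count(a) > 1       # colliding players get no reward
        r = 0.0 if collided else float(rng.random() < mu[a])
        counts[p, a] += 1
        sums[p, a] += r
```

Because colliding players observe a zero reward, the UCB indices of contested arms shrink and players spread out implicitly; the paper's algorithm handles this coordination explicitly rather than as a side effect.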


Similar Articles

Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits

In this paper, we introduce a multi-agent multi-armed bandit-based model for ad hoc teamwork with expensive communication. The goal of the team is to maximize the total reward gained from pulling arms of a bandit over a number of epochs. In each epoch, each agent decides whether to pull an arm and hence collect a reward, or to broadcast the reward it obtained in the previous epoch to the team a...
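The blurb is cut off before the team's actual strategy, so the sketch below only illustrates the epoch structure it describes: each agent either pulls an arm or broadcasts its previous observation to the team. The broadcast rule (share a "surprising" reward) and all constants are hypothetical placeholders, not the paper's method.

```python
import random

ARMS = [0.3, 0.6, 0.8]                    # hidden Bernoulli means (illustrative)

class Agent:
    """Toy agent that, each epoch, either pulls an arm or broadcasts."""
    def __init__(self):
        self.counts = [0] * len(ARMS)
        self.sums = [0.0] * len(ARMS)
        self.last = None                  # (arm, reward) from the previous epoch

    def mean(self, a):
        return self.sums[a] / self.counts[a] if self.counts[a] else 0.0

    def observe(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

    def step(self):
        # Hypothetical rule: spend the epoch broadcasting (instead of pulling)
        # if the previous reward deviated a lot from the arm's running mean.
        if self.last and abs(self.last[1] - self.mean(self.last[0])) > 0.5:
            msg, self.last = self.last, None
            return "broadcast", msg
        if random.random() < 0.1:         # epsilon-greedy pull
            arm = random.randrange(len(ARMS))
        else:
            arm = max(range(len(ARMS)), key=self.mean)
        reward = 1.0 if random.random() < ARMS[arm] else 0.0
        self.observe(arm, reward)
        self.last = (arm, reward)
        return "pull", (arm, reward)

team = [Agent() for _ in range(3)]
for _ in range(100):
    actions = [(agent, *agent.step()) for agent in team]
    for agent, act, payload in actions:
        if act == "broadcast":            # teammates learn from the message
            for other in team:
                if other is not agent:
                    other.observe(*payload)
```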


Stochastic Multi-armed Bandits in Constant Space

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all K arms. We give an algorithm using O(1) words of space with regret ...
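To illustrate what constant space means here, the following naive tournament streams over the arms while storing only the current champion's index and empirical mean, i.e., O(1) words beyond the stream itself. It is an explore-then-commit baseline for intuition only, not the paper's algorithm or its regret guarantee.

```python
import random

def pull(arm_mean):
    """Draw a Bernoulli reward from a hidden arm mean."""
    return 1.0 if random.random() < arm_mean else 0.0

def constant_space_best_arm(arm_means, budget_per_arm=200):
    """Stream over arms, keeping only the champion's index and mean.
    Illustrative O(1)-space baseline, not the paper's method."""
    champ_idx, champ_mean = None, -1.0
    for i, m in enumerate(arm_means):
        est = sum(pull(m) for _ in range(budget_per_arm)) / budget_per_arm
        if est > champ_mean:
            champ_idx, champ_mean = i, est
    return champ_idx

print(constant_space_best_arm([0.2, 0.5, 0.8, 0.4]))   # usually prints 2
```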


Bounded regret in stochastic multi-armed bandits

We study the stochastic multi-armed bandit problem when one knows the value μ⋆ of an optimal arm, as well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bound...
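A quick way to see why knowing μ⋆ enables bounded regret: once an arm's upper confidence bound drops below μ⋆, it is provably suboptimal and can be discarded forever, so exploration stops after finitely many mistakes. The deterministic elimination sketch below illustrates this; it is a generic stand-in, not the paper's randomized policy.

```python
import math, random

def play_with_known_mu_star(means, mu_star, T=10000):
    """Eliminate any arm whose UCB falls below the known optimal value
    mu_star (illustrative sketch, not the paper's randomized policy)."""
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    active = set(range(K))
    for t in range(1, T + 1):
        a = min(active, key=lambda i: counts[i])  # least-pulled active arm
        r = 1.0 if random.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
        ucb = sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
        if ucb < mu_star and len(active) > 1:
            active.discard(a)                     # provably suboptimal: drop it
    return active

print(play_with_known_mu_star([0.2, 0.5, 0.9], mu_star=0.9))  # typically {2}
```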


Robust Risk-Averse Stochastic Multi-armed Bandits

We study a variant of the standard stochastic multi-armed bandit problem when one is not interested in the arm with the best mean, but instead in the arm maximising some coherent risk measure criterion. Further, we are studying the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB to solve this problem, together with a high probabil...
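As intuition for the risk-averse objective, a common coherent risk measure is CVaR: the mean of the worst α-fraction of rewards. The sketch below scores each arm by its empirical CVaR plus a UCB-style exploration bonus; it is a hedged illustration in the spirit of such an index, not the paper's RA-UCB or its high-probability analysis.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.1):
    """Mean of the worst alpha-fraction of observed rewards (lower tail),
    the quantity a risk-averse player wants to maximize."""
    x = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()

def risk_averse_ucb(arms, T=2000, alpha=0.1):
    """UCB-style loop over reward samplers, ranking arms by empirical
    CVaR plus an exploration bonus (illustrative index, not RA-UCB)."""
    hist = [[arm()] for arm in arms]              # one pull per arm to start
    for t in range(len(arms) + 1, T + 1):
        i = max(range(len(arms)),
                key=lambda j: empirical_cvar(hist[j], alpha)
                + np.sqrt(2 * np.log(t) / len(hist[j])))
        hist[i].append(arms[i]())
    return [len(h) for h in hist]

rng = np.random.default_rng(0)
pulls = risk_averse_ucb([lambda: rng.normal(0.5, 1.0),    # low-variance arm
                         lambda: rng.normal(0.6, 3.0)])   # higher mean, riskier
print(pulls)   # the low-variance arm should attract most pulls under CVaR
```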


Contextual Multi-Armed Bandits

We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions ...
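A standard baseline for this Lipschitz contextual setting is uniform discretization: quantize the context space into bins and run an independent UCB1 instance per bin, letting the Lipschitz condition bound the approximation error within each bin. The hidden payoff function and all sizes below are illustrative assumptions; this baseline is not the paper's algorithm.

```python
import math, random

def contextual_ucb(T=5000, n_bins=10, K=3):
    """Uniform-discretization baseline: one UCB1 instance per context bin."""
    def reward(ctx, arm):
        # Hypothetical payoff, Lipschitz in the context ctx for each arm.
        base = [0.3, 0.5, 0.7][arm]
        return 1.0 if random.random() < min(1.0, base * (0.5 + ctx)) else 0.0

    counts = [[0] * K for _ in range(n_bins)]
    sums = [[0.0] * K for _ in range(n_bins)]
    total = 0.0
    for t in range(1, T + 1):
        ctx = random.random()                     # context arrives in [0, 1]
        b = min(int(ctx * n_bins), n_bins - 1)    # its bin

        def score(a):                             # UCB1 index within bin b
            if counts[b][a] == 0:
                return float("inf")
            return sums[b][a] / counts[b][a] + math.sqrt(2 * math.log(t) / counts[b][a])

        a = max(range(K), key=score)
        r = reward(ctx, a)
        counts[b][a] += 1
        sums[b][a] += r
        total += r
    return total / T

print(contextual_ucb())   # average reward of the discretized policy
```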



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i9.26251