Safe Linear Thompson Sampling With Side Information

نویسندگان

چکیده

The design and performance analysis of bandit algorithms in the presence stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider linear stochastic problem under additional unknown that need to be satisfied at each round. For problem, present analyze a new safe algorithm based on Thompson Sampling (TS). Our shows that, with high probability, chooses actions are round achieve cumulative regret order O (d 3/2 log xmlns:xlink="http://www.w3.org/1999/xlink">1/2 d ·T T). Remarkably, matches bound provided by [1], [2] for standard TS absence constraints. Also, our highlights how inherently randomized nature selection rule suffices properly expand set access particular, compare behavior alternative algorithms, which typically require distinct rounds randomization dedicated learning

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order e O(d3/2 p T ) as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and h...

متن کامل

Thompson Sampling for Linear-Quadratic Control Problems

We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the...

متن کامل

Thompson Sampling for Online Learning with Linear Experts

In this note, we present a version of the Thompson sampling algorithm for the problem of online linear generalization with full information (i.e., the experts setting), studied by Kalai and Vempala, 2005. The algorithm uses a Gaussian prior and time-varying Gaussian likelihoods, and we show that it essentially reduces to Kalai and Vempala’s Follow-thePerturbed-Leader strategy, with exponentiall...

متن کامل

Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the stateof-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...

متن کامل

Thompson Sampling for Contextual Bandits with Linear Payoffs

The following lemma is implied by Theorem 1 in Abbasi-Yadkori et al. (2011): Lemma 7. (Abbasi-Yadkori et al., 2011) Let (F ′ t; t ≥ 0) be a filtration, (mt; t ≥ 1) be an R-valued stochastic process such that mt is (F ′ t−1)-measurable, (ηt; t ≥ 1) be a real-valued martingale difference process such that ηt is (F ′ t)-measurable. For t ≥ 0, define ξt = ∑t τ=1mτητ and Mt = Id + ∑t τ=1mτm T τ , wh...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: IEEE Transactions on Signal Processing

سال: 2021

ISSN: ['1053-587X', '1941-0476']

DOI: https://doi.org/10.1109/tsp.2021.3089822