Sequential Monte Carlo Bandits

Authors

Michael Cherkassky

Luke Bornn

Abstract

In this paper we propose a flexible and efficient framework for handling multi-armed bandits, combining sequential Monte Carlo algorithms with hierarchical Bayesian modeling techniques. The framework naturally encompasses restless bandits, contextual bandits, and other bandit variants under a single inferential model. Despite the model's generality, we propose efficient Monte Carlo algorithms to make inference scalable, based on recent developments in sequential Monte Carlo methods. Through two simulation studies, the framework is shown to outperform other empirical methods, while also naturally scaling to more complex problems that existing approaches cannot handle. Additionally, we successfully apply our framework to online video-based advertising recommendation, and show its increased efficacy compared with current state-of-the-art bandit algorithms.

Similar resources

On Interruptible Pure Exploration in Multi-Armed Bandits

Interruptible pure exploration in multi-armed bandits (MABs) is a key component of Monte-Carlo tree search algorithms for sequential decision problems. We introduce Discriminative Bucketing (DB), a novel family of strategies for pure exploration in MABs, which allows for adapting recent advances in non-interruptible strategies to the interruptible setting, while guaranteeing exponential-rate pe...

Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats

This paper was motivated by the problem of developing an optimal strategy for exploring a large oil and gas field in the North Sea. Where should we drill first? Where do we drill next? The problem resembles a classical multiarmed bandit problem, but probabilistic dependence plays a key role: outcomes at drilled sites reveal information about neighboring targets. Good exploration strategies will...

Evaluating Quasi-Monte Carlo (QMC) algorithms in blocks decomposition of de-trended

The lengths of the minimal and maximal blocks affect the variance and bias of de-trended fluctuation analysis, which relates the logarithm of the fluctuation function to the logarithm of the scale. Using Quasi-Monte Carlo (QMC) simulation and Cholesky decomposition, the pair of minimal and maximal block lengths that minimizes the sum of mean squared errors in the Hurst exponent is found.

Predictive Adaptation of Hybrid Monte Carlo with Bayesian Parametric Bandits

This paper introduces a novel way of adapting the Hybrid Monte Carlo (HMC) algorithm using parametric bandits with nonlinear features. HMC is a powerful Markov chain Monte Carlo (MCMC) method, but it requires careful tuning of its hyper-parameters. We propose a Bayesian parametric bandit approach to carry out the adaptation of the hyper-parameters while the Markov chain progresses. We also intr...

Upper Confidence Trees and Billiards for Optimal Active Learning

This paper focuses on Active Learning (AL) with bounded computational resources. AL is formalized as a finite horizon Reinforcement Learning problem, and tackled as a single-player game. An approximate optimal AL strategy based on tree-structured multi-armed bandit algorithms and billiard-based sampling is presented together with a proof of principle of the approach. Mots-clés : Apprentissage ac...

Journal: CoRR

Volume: abs/1310.1404

Publication date: 2013
