The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

نویسندگان

  • John Langford
  • Tong Zhang
چکیده

We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales asO(T S) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey on Contextual Multi-armed Bandits

4 Stochastic Contextual Bandits 6 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption . . . . 6 4.1.1 LinUCB/SupLinUCB . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.1.2 LinREL/SupLinREL . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1.3 CofineUCB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.1.4 Thompson Sampling with Linear Payoffs...

متن کامل

A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...

متن کامل

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits

We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.

متن کامل

Avoiding the Exploration-Exploitation Tradeoff in Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the explorationexploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in many practical settings where exploration may be prohibitively costly or unethical (e.g., cli...

متن کامل

Mostly Exploration-Free Algorithms for Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the explorationexploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007