Linear Contextual Bandits with Global Constraints and Objective
Authors
Abstract
We consider the linear contextual bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a vector that depends linearly on the context of that arm. The global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. This problem turns out to be a common generalization of classic linear contextual bandits (linContextual) [8, 17, 1], bandits with concave rewards and convex knapsacks (BwCR) [4], and the online stochastic convex programming (OSCP) problem [5]. We present algorithms with near-optimal regret bounds for this problem. Our bounds compare favorably to results on the unstructured version of the problem [6, 12], where the relation between the contexts and the outcomes could be arbitrary, but the algorithm only competes against a fixed set of policies. We combine techniques from the work on linContextual, BwCR and OSCP in a nontrivial manner while also tackling new difficulties that are not present in any of these special cases.
Author affiliation: Microsoft Research.
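The setting described in the abstract can be written compactly as follows. This is only a sketch of the problem statement: the symbols $T$ (number of rounds), $x_t(a)$ (context of arm $a$ in round $t$), $v_t(a)$ (outcome vector), $W_*$ (unknown weight matrix), $S$ (convex set) and $f$ (concave objective) are notation assumed here for illustration and may differ from the paper's own.

```latex
% Linear outcome model (assumed notation): the expected outcome vector
% of arm a in round t is linear in that arm's context.
\[
  \mathbb{E}\bigl[\, v_t(a) \mid x_t(a) \,\bigr] = W_*^{\top} x_t(a)
\]
% Global objective and constraint on the average outcome vector
% over the chosen arms a_1, ..., a_T.
\[
  \max_{a_1, \dots, a_T} \; f\!\left( \frac{1}{T} \sum_{t=1}^{T} v_t(a_t) \right)
  \quad \text{subject to} \quad
  \frac{1}{T} \sum_{t=1}^{T} v_t(a_t) \in S
\]
```

Under this notation, both the constraint and the objective depend only on the time-averaged outcome vector, which is what makes them global rather than per-round.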
Similar resources
Linear Contextual Bandits with Knapsacks
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation. In each round, the outcome of pulling an arm is a reward as well as a vector of resource consumptions. The expected values of these outcomes depend linearly on the context of that arm. The budget/capacity constraints require that the total consumption doesn’t exceed the budget for each ...
Resourceful Contextual Bandits
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits (arbitrary policy sets, Dudik et al. (2011)) and ...
Contextual Bandits with Global Constraints and Objective
We consider the contextual version of a multi-armed bandit problem with global convex constraints and concave objective function. In each round, the outcome of pulling an arm is a context-dependent vector, and the global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. The learning agent competes with an...
A Survey on Contextual Multi-armed Bandits
4 Stochastic Contextual Bandits; 4.1 Stochastic Contextual Bandits with Linear Realizability Assumption; 4.1.1 LinUCB/SupLinUCB; 4.1.2 LinREL/SupLinREL; 4.1.3 CofineUCB; 4.1.4 Thompson Sampling with Linear Payoffs...
Provably Optimal Algorithms for Generalized Linear Contextual Bandits
Contextual bandits are widely used in Internet services from news recommendation to advertising, and to Web search. Generalized linear models (logistical regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses on contextual bandits so far are on linear bandits. In this work, we propose ...
Journal title:
- CoRR
Volume: abs/1507.06738
Publication date: 2015