Lecture 9: (Semi-)bandits and experts with linear costs (part I)
Authors
Abstract
In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within the unit hypercube: a ∈ [0,1]^d. The action costs c_t(a) are linear in the vector a, namely c_t(a) = a · v_t for some weight vector v_t ∈ R^d which is the same for all actions but depends on the current time step. This problem is useful and challenging under full feedback as well as under bandit feedback; further, we will consider an intermediate regime called semi-bandit feedback. The plan for today is as follows:
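As a quick illustration of this cost model and of what the learner observes under each feedback regime, here is a minimal Python sketch; the 0/1 action vector and the exact form of the semi-bandit observation are illustrative assumptions, not definitions taken from the lecture.

```python
import numpy as np

# Minimal sketch of the linear cost model c_t(a) = a . v_t for a in [0,1]^d,
# and of what is revealed under full, semi-bandit, and bandit feedback.
# (Illustrative assumption: the action is a 0/1 vector and semi-bandit
# feedback reveals v_t on the coordinates the action uses.)
d = 4
rng = np.random.default_rng(0)

v_t = rng.uniform(size=d)                     # weight vector for round t (hidden from the learner)
a = rng.integers(0, 2, size=d).astype(float)  # chosen action, here a 0/1 point of [0,1]^d

cost = a @ v_t                                # incurred cost c_t(a) = a . v_t

full_feedback = v_t              # full feedback: the whole weight vector v_t is revealed
semi_bandit_feedback = a * v_t   # semi-bandit: v_t revealed only on the chosen coordinates
bandit_feedback = cost           # bandit feedback: only the scalar cost is revealed
```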
Similar Resources
Lecture 7: Full feedback and adversarial rewards (part I)
A real-life example is the investment problem. Each morning, we choose a stock to invest in. At the end of the day, we observe not only the price of our chosen stock but the prices of all stocks. Based on this kind of “full” feedback, we decide which stock to invest in the next day. A motivating special case of “bandits with full feedback” can be framed as a question-answering problem with experts...
G: Bandits, Experts and Games, 10/10/16, Lecture 6: Lipschitz Bandits
Motivation: similarity between arms. In various bandit problems, we may have information on similarity between arms, in the sense that ‘similar’ arms have similar expected rewards. For example, arms can correspond to “items” (e.g., documents) with feature vectors, and similarity can be expressed as some notion of distance between feature vectors. Another example would be the dynamic pricing pro...
G: Bandits, Experts and Games, 09/12/16, Lecture 4: Lower Bounds (ending); Thompson Sampling
Here ε is a parameter to be adjusted in the analysis. Recall that K is the number of arms. We considered a “bandits with predictions” problem, and proved that it is impossible to make an accurate prediction with high probability if the time horizon is too small, regardless of which bandit algorithm we use to explore and make the prediction. In fact, we proved it for at least a third of problem ins...
CSC 2411 - Linear Programming and Combinatorial Optimization, Lecture 9: Semi-Definite Programming, Combinatorial Optimization
This lecture consists of two main parts. In the first, we revisit Semi-Definite Programming (SDP). We show its equivalence to Vector Programming, prove that it has efficient membership and separation oracles, and finally state a theorem that shows why the Ellipsoid method can be used to obtain an approximate solution of a semi-definite program. In the second part, we make a first approach to Combinatori...
Lecture 9: Linear Bandits (Part II)
There exists an ellipsoidal confidence region for w, as described in the following theorem. Theorem 1. ([2], Theorem 2) Assuming ‖w‖ ≤ √d and ‖x_t‖ ≤ √d, with probability 1 − δ we have w ∈ C_t, where C_t = { z : ‖z − ŵ_t‖_{M_t} ≤ 2√(d log(Td/δ)) }. For any x ∈ A, we define UCB_{x,t} = max_{z ∈ C_t} z·x; this upper-bounds w·x whenever w ∈ C_t (which holds with high probability). At each time, the UCB algorithm then simply picks the bandit with...
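To make this rule concrete, here is a minimal Python sketch of such a confidence-ellipsoid UCB algorithm (a LinUCB/OFUL-style implementation). The reward model, the helper pull, and the synthetic usage at the end are assumptions for illustration; the sketch uses the standard closed form max_{z ∈ C_t} z·x = ŵ_t·x + β‖x‖_{M_t^{-1}} to compute the upper confidence bound.

```python
import numpy as np

def lin_ucb(actions, T, pull, delta=0.05):
    """Sketch of a confidence-ellipsoid UCB rule for linear bandits.

    actions : (K, d) array of feature vectors x in A
    pull(x) : returns a noisy reward whose mean is w . x for the unknown w
    """
    K, d = actions.shape
    M = np.eye(d)                                    # regularized design matrix M_t
    b = np.zeros(d)                                  # running sum of x_s * y_s
    beta = 2.0 * np.sqrt(d * np.log(T * d / delta))  # radius from Theorem 1

    for t in range(T):
        M_inv = np.linalg.inv(M)
        w_hat = M_inv @ b                            # least-squares estimate of w
        # max_{z in C_t} z . x  =  w_hat . x + beta * ||x||_{M_t^{-1}}
        widths = np.sqrt(np.sum((actions @ M_inv) * actions, axis=1))
        ucb = actions @ w_hat + beta * widths
        x = actions[np.argmax(ucb)]                  # play the arm with the largest UCB
        y = pull(x)                                  # observe its noisy linear reward
        M += np.outer(x, x)                          # shrink the confidence ellipsoid
        b += y * x

# Example usage on synthetic data: 10 arms in dimension 3, hidden weights w_true.
rng = np.random.default_rng(1)
w_true = rng.normal(size=3)
A = rng.uniform(size=(10, 3))
lin_ucb(A, T=500, pull=lambda x: x @ w_true + rng.normal(scale=0.1))
```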