Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
Authors
Abstract
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
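The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an upper-confidence Frank-Wolfe loop might look like, not the authors' exact method. It assumes the decision variable is the vector of pull proportions over a finite set of arms, that pulling an arm returns a noisy observation of the matching gradient coordinate, and that the confidence width is `sqrt(2 log t / n)` with a Frank-Wolfe step size of `1/t`. The names `upper_confidence_frank_wolfe`, `sample_feedback`, and `grad_from_means` are hypothetical.

```python
import math
import random


def upper_confidence_frank_wolfe(sample_feedback, num_arms, horizon, grad_from_means=None):
    """Illustrative sketch: return the pull proportions after `horizon` rounds.

    sample_feedback(arm) -> noisy observation for that arm (assumed interface).
    grad_from_means(p, means) -> estimated gradient of the loss at p; if None,
    the loss is treated as linear and the empirical means are the gradient.
    """
    counts = [0] * num_arms      # number of pulls per arm
    means = [0.0] * num_arms     # running mean of observations per arm
    p = [0.0] * num_arms         # current proportions (the Frank-Wolfe iterate)

    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1          # initialisation: pull every arm once
        else:
            grad = grad_from_means(p, means) if grad_from_means else means
            # Optimistic (lower-confidence) gradient estimate; the width is an assumption.
            scores = [
                grad[i] - math.sqrt(2.0 * math.log(t) / counts[i])
                for i in range(num_arms)
            ]
            # Linear minimisation over the simplex picks a single vertex (arm).
            arm = min(range(num_arms), key=scores.__getitem__)

        obs = sample_feedback(arm)
        counts[arm] += 1
        means[arm] += (obs - means[arm]) / counts[arm]

        # Frank-Wolfe update toward the chosen vertex with step size 1/t,
        # which keeps p equal to the empirical pull frequencies.
        p = [pi + ((1.0 if i == arm else 0.0) - pi) / t for i, pi in enumerate(p)]

    return p


if __name__ == "__main__":
    rng = random.Random(0)
    true_losses = [0.2, 0.8]     # made-up values for illustration only
    proportions = upper_confidence_frank_wolfe(
        lambda arm: true_losses[arm] + rng.gauss(0.0, 0.1),
        num_arms=2,
        horizon=2000,
    )
    print(proportions)
```

With a linear loss, as in the usage example, the loop reduces to a standard upper-confidence-bound bandit strategy. Passing a `grad_from_means` callback is one way to cover a global loss that is not a cumulative sum.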
Similar resources
Bandit Optimization with Upper-Confidence Frank-Wolfe
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. T...
Fast Algorithm for Logistic Bandit
We study a logistic bandit problem and propose an algorithm that enjoys fast update. In our problem, each round the learner first chooses an arm from a decision set, in which each arm is associated with a feature vector. Then, she receives a reward, which is binary and is generated by a logistic function. Our algorithm for the problem can be seen as a marriage between stochastic gradient descen...
A stochastic bandit algorithm for scratch games
Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees; they are easy to implement and efficient in practice. We consider a new bandit setting, called “scratch-games”, where arm budgets are limited and rewards are drawn without rep...
Fast Stochastic Frank-Wolfe Algorithms for Nonlinear SVMs
The high computational cost of nonlinear support vector machines has limited their usability for large-scale problems. We propose two novel stochastic algorithms to tackle this problem. These algorithms are based on a simple and classic optimization method: the Frank-Wolfe method, which is known to be fast for problems with a large number of linear constraints. Formulating the nonlinear SVM pro...
Bandits with concave rewards and convex knapsacks
In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon. This model subsumes the classic multi-armed bandit (MAB) model, and the Bandits with Knapsacks (BwK) model of Badanidiyuru et al. [2013]. We also consider an ...