Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient
Authors
Abstract
This work focuses on the dynamic regret of online convex optimization, which compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. Assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds on the dynamic regret under true and noisy gradient feedback, which are optimal in light of the presented lower bounds. The key to our analysis is a regularity metric that measures the temporal changes in the clairvoyant's minimizers, which we refer to as path variation. First, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we can achieve an optimal dynamic regret. Second, we present a lower bound for noisy gradient feedback and then show that we can achieve optimal dynamic regrets under stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under two-point bandit feedback matches what is achieved with full information.
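For concreteness, the two quantities the abstract refers to can be written out as below. This follows the notation standard in this literature; the paper's exact definitions may differ in the choice of norm or constants.

```latex
% Dynamic regret against the per-step minimizers of the loss sequence:
\mathrm{Regret}_T^{d} \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} \min_{x \in \Omega} f_t(x),
\qquad x_t^{\ast} \in \operatorname*{arg\,min}_{x \in \Omega} f_t(x).

% Path variation: the total movement of the clairvoyant's minimizers,
% the regularity metric in which the bounds above are expressed.
\mathcal{V}_T \;=\; \sum_{t=2}^{T} \bigl\lVert x_t^{\ast} - x_{t-1}^{\ast} \bigr\rVert_2 .
```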
Similar papers
Perishability of Data: Dynamic Pricing under Varying-Coefficient Models
We consider a firm that sells a large number of products to its customers in an online fashion. Each product is described by a high dimensional feature vector, and the market value of a product is assumed to be linear in the values of its features. Parameters of the valuation model are unknown and can change over time. The firm sequentially observes a product’s features and can use the historic...
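In symbols, the model sketched above can be written as follows; the notation here is illustrative rather than the paper's own:

```latex
% Time-varying linear valuation: feature vector x_t, unknown coefficients theta_t.
v_t \;=\; \langle \theta_t, x_t \rangle + z_t,
\qquad
\text{a sale occurs at posted price } p_t \text{ iff } v_t \ge p_t,
```

where z_t is idiosyncratic noise and the drift of theta_t over time is what makes past data "perishable" for estimating current valuations.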
Learning rotations with little regret
We describe online algorithms for learning a rotation from pairs of unit vectors in R^n. We show that the expected regret of our online algorithm, compared to the best fixed rotation chosen offline over T iterations, is O(√(nT)). We also give a lower bound that proves that this expected regret bound is optimal within a constant factor. This resolves an open problem posed in COLT 2008. Our online a...
Dynamical Models and Tracking Regret in Online Convex Programming
This paper describes a new online convex optimization method which incorporates a family of candidate dynamical models and establishes novel tracking regret bounds that scale with the comparator’s deviation from the best dynamical model in this family. Previous online optimization methods are designed to have a total accumulated loss comparable to that of the best comparator sequence, and exist...
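A minimal sketch of the idea, assuming a Euclidean setup, quadratic tracking losses, and a single linear dynamics model `Phi` (all of these are illustrative assumptions; the paper works with mirror descent and a whole family of candidate models):

```python
import numpy as np

def dynamic_gradient_step(x, grad, Phi, eta, radius=1.0):
    """One update of gradient descent composed with a dynamics model:
    take a gradient step, advance by the candidate dynamics Phi,
    then project back onto a Euclidean ball of the given radius."""
    x_tilde = x - eta * grad          # standard online gradient step
    x_next = Phi @ x_tilde            # advance by the dynamical model
    norm = np.linalg.norm(x_next)
    if norm > radius:                 # projection onto the feasible ball
        x_next *= radius / norm
    return x_next

# Illustrative use: track a slowly rotating target theta_t.
angle = 0.01
Phi = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])  # assumed dynamics
theta = np.array([1.0, 0.0])
x = np.zeros(2)
for t in range(1000):
    grad = x - theta                  # gradient of f_t(x) = 0.5*||x - theta||^2
    x = dynamic_gradient_step(x, grad, Phi, eta=0.1)
    theta = Phi @ theta               # the comparator follows the dynamics
```

When the comparator really does follow `Phi`, the learner only has to correct for noise rather than for the comparator's motion, which is the intuition behind tracking-regret bounds that scale with the deviation from the best dynamical model.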
Dynamic Pricing in High-dimensions
We study the pricing problem faced by a firm that sells a large number of products, described via a wide range of features, to customers that arrive over time. This is motivated in part by the prevalence of online marketplaces that allow for real-time pricing. We propose a dynamic policy, called Regularized Maximum Likelihood Pricing (RMLP), that obtains asymptotically optimal revenue. Our poli...
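A toy estimate-then-price loop in the spirit of the policy described above. Everything here, from the logistic purchase model to the helper names `fit_valuation_model` and `greedy_price` and the price grid, is an illustrative assumption, not the paper's actual RMLP procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_valuation_model(X, prices, sold):
    """L1-regularized logistic fit of purchase outcomes, assuming a buyer
    purchases when theta.x plus logistic noise exceeds the posted price.
    (A faithful model would pin the price coefficient to the noise scale;
    for illustration we learn it freely.)"""
    Z = np.hstack([X, -prices.reshape(-1, 1)])
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(Z, sold)
    return clf

def greedy_price(clf, x, grid):
    """Pick the grid price maximizing estimated revenue p * P(sale | p)."""
    Z = np.hstack([np.tile(x, (len(grid), 1)), -grid.reshape(-1, 1)])
    revenue = grid * clf.predict_proba(Z)[:, 1]
    return grid[np.argmax(revenue)]
```

The L1 penalty is what exploits sparsity in high dimensions: with many features but few that matter, regularized maximum likelihood keeps the estimation error, and hence the revenue loss, under control.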
Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., l1-regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational si...
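As a concrete instance of the accelerated-gradient template for a non-smooth composite objective, here is a short FISTA-style routine for l1-regularized least squares. This is a standard construction given as a sketch, not the specific method proposed in the paper, and the function names are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (handles the non-smooth term)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iters=500):
    """Accelerated proximal gradient for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ y - b)           # gradient of the smooth part at y
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The momentum step is what lifts the O(1/k) rate of plain proximal gradient to O(1/k^2) on the smooth part, while the proximal (soft-thresholding) step keeps the update as cheap as an ordinary gradient method despite the non-smooth regularizer.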