regret minimization

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

2015

Gergely Neu

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on ...

Adaptive Power Allocation and Control in Time-Varying Multi-Carrier MIMO Networks

Journal: CoRR, 2015

Ioannis Stiakogiannakis, Panayotis Mertikopoulos, Corinne Touati

In this paper, we examine the fundamental trade-off between radiated power and achieved throughput in wireless multi-carrier, multiple-input and multiple-output (MIMO) systems that vary with time in an unpredictable fashion (e.g. due to changes in the wireless medium or the users’ QoS requirements). Contrary to the static/stationary channel regime, there is no optimal power allocation profile t...

The adversarial stochastic shortest path problem with unknown transition probabilities

2012

Gergely Neu, András György, Csaba Szepesvári

We consider online learning in a special class of episodic Markovian decision processes, namely, loop-free stochastic shortest path problems. In this problem, an agent has to traverse through a finite directed acyclic graph with random transitions while maximizing the obtained rewards along the way. We assume that the reward function can change arbitrarily between consecutive episodes, and is e...

Myopic regret avoidance: Feedback avoidance and learning in repeated decision making

2016

Jochen Reb, Terry Connolly, John Schaubroeck

Decision makers can become trapped by myopic regret avoidance in which rejecting feedback to avoid short-term outcome regret (regret associated with counterfactual outcome comparisons) leads to reduced learning and greater long-term regret over continuing poor decisions. In a series of laboratory experiments involving repeated choices among uncertain monetary prospects, participants primed with...

Learning the distribution with largest mean: two bandit frameworks

Journal: CoRR, 2017

Emilie Kaufmann, Aurélien Garivier

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean amon...

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

2012

Ofer Dekel, Ambuj Tewari, Raman Arora

Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of p...

Logarithmic Regret Algorithms for Strongly Convex Repeated Games

2007

Shai Shalev-Shwartz, Yoram Singer

Many problems arising in machine learning can be cast as a convex optimization problem, in which a sum of a loss term and a regularization term is minimized. For example, in Support Vector Machines the loss term is the average hinge-loss of a vector over a training set of examples and the regularization term is the squared Euclidean norm of this vector. In this paper we study an algorithmic fra...

An Analysis of Chaining in Multi-Label Classification

2012

Krzysztof Dembczynski, Willem Waegeman, Eyke Hüllermeier

The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minim...

Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm

Journal: Electronic Colloquium on Computational Complexity (ECCC), 2010

Sascha Geulen, Berthold Vöcking, Melanie Winkler

Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased and this price function might vary from step to step. The decision maker has a buffer of bounded size for storing units of the commodity that can be used to satisfy demands at later points in time. We seek an algorithm decidin...

Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games

2015

Viliam Lisý, Marc Lanctot, Michael H. Bowling

Online search in games has been a core interest of artificial intelligence. Search in imperfect information games (e.g., Poker, Bridge, Skat) is particularly challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling, an online search variant of Monte Carlo Counterfactual Regret Minimization, which preserves its convergence to Nash eq...
