Search results for: regret minimization

Number of results: 37822

2015
Gergely Neu

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on ...

Journal: CoRR 2015
Ioannis Stiakogiannakis Panayotis Mertikopoulos Corinne Touati

In this paper, we examine the fundamental trade-off between radiated power and achieved throughput in wireless multi-carrier, multiple-input and multiple-output (MIMO) systems that vary with time in an unpredictable fashion (e.g. due to changes in the wireless medium or the users’ QoS requirements). Contrary to the static/stationary channel regime, there is no optimal power allocation profile t...

2012
Gergely Neu András György Csaba Szepesvári

We consider online learning in a special class of episodic Markovian decision processes, namely, loop-free stochastic shortest path problems. In this problem, an agent has to traverse through a finite directed acyclic graph with random transitions while maximizing the obtained rewards along the way. We assume that the reward function can change arbitrarily between consecutive episodes, and is e...

2016
Jochen Reb Terry Connolly John Schaubroeck

Decision makers can become trapped by myopic regret avoidance in which rejecting feedback to avoid short-term outcome regret (regret associated with counterfactual outcome comparisons) leads to reduced learning and greater long-term regret over continuing poor decisions. In a series of laboratory experiments involving repeated choices among uncertain monetary prospects, participants primed with...

Journal: CoRR 2017
Emilie Kaufmann Aurélien Garivier

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean amon...

2012
Ofer Dekel Ambuj Tewari Raman Arora

Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of p...

2007
Shai Shalev-Shwartz Yoram Singer

Many problems arising in machine learning can be cast as a convex optimization problem, in which a sum of a loss term and a regularization term is minimized. For example, in Support Vector Machines the loss term is the average hinge-loss of a vector over a training set of examples and the regularization term is the squared Euclidean norm of this vector. In this paper we study an algorithmic fra...

2012
Krzysztof Dembczynski Willem Waegeman Eyke Hüllermeier

The idea of classifier chains has recently been introduced as a promising technique for multi-label classification. However, despite being intuitively appealing and showing strong performance in empirical studies, still very little is known about the main principles underlying this type of method. In this paper, we provide a detailed probabilistic analysis of classifier chains from a risk minim...

Journal: Electronic Colloquium on Computational Complexity (ECCC) 2010
Sascha Geulen Berthold Vöcking Melanie Winkler

Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased and this price function might vary from step to step. The decision maker has a buffer of bounded size for storing units of the commodity that can be used to satisfy demands at later points in time. We seek for an algorithm decidin...

2015
Viliam Lisý Marc Lanctot Michael H. Bowling

Online search in games has been a core interest of artificial intelligence. Search in imperfect information games (e.g., Poker, Bridge, Skat) is particularly challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling, an online search variant of Monte Carlo Counterfactual Regret Minimization, which preserves its convergence to Nash eq...
