No Regret and the Minimax Theorem of Games
Abstract
H(p) = Σ_{i=1}^{n} p_i log(1/p_i). To do this, we must bound the two terms on the right-hand side of the bound above.

Step 1: Bounding the Range of the Regularizer. We begin by deriving upper and lower bounds on the entropy function H(p). The lower bound is easy: since 0 ≤ p_i ≤ 1 for all i, each term p_i log(1/p_i) ≥ 0, and hence H(p) ≥ 0. (Remember that we define 0 log(1/0) to be 0 by convention.) As we discussed before, H(p) = 0 is achieved when p puts all of its weight on a single expert. To upper bound H(p), we can use Jensen's inequality applied to the concave function log: H(p) = Σ_{i=1}^{n} p_i log(1/p_i) ≤ log(Σ_{i=1}^{n} p_i · (1/p_i)) = log n.
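The two bounds 0 ≤ H(p) ≤ log n can be sanity-checked numerically at the two extreme distributions mentioned above. The sketch below (plain Python with the standard library; the function name entropy is our own) evaluates H at a point mass and at the uniform distribution over n experts.

```python
import math

def entropy(p):
    # H(p) = sum_i p_i * log(1/p_i), using the convention 0 * log(1/0) = 0.
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)

n = 4
point_mass = [1.0, 0.0, 0.0, 0.0]  # all weight on a single expert
uniform = [1.0 / n] * n            # weight spread evenly over n experts

# Lower bound: a point mass achieves H(p) = 0.
assert entropy(point_mass) == 0.0
# Upper bound (the Jensen step): the uniform distribution achieves H(p) = log n.
assert abs(entropy(uniform) - math.log(n)) < 1e-12
```

Note that the upper bound log n is tight exactly at the uniform distribution, which is what makes entropy a natural regularizer over the probability simplex.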
Similar Articles
Robustness in portfolio optimization based on minimax regret approach
Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of these pieces of research attempt to make the Markowitz’s primary portfolio selection model more realistic or seek to solve the model for obtaining fairly optimum portfolios. An efficient frontier in the ...
CS 364A: Algorithmic Game Theory, Lecture #18: From External Regret to Swap Regret and the Minimax Theorem
Last lecture we proved that coarse correlated equilibria (CCE) are tractable, in a satisfying sense: there are simple and computationally efficient learning procedures that converge quickly to the set of CCE. Of course, if anything in our equilibrium hierarchy (Figure 1) was going to be tractable, it was going to be CCE, the biggest set. The good researcher is never satisfied and always seeks s...
Online Learning with Variable Stage Duration
We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the obser...
Learning, Regret Minimization, and Equilibria
Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an unknown strategy. In this chapter we describe learning algorithms with strong guarantees for settings of this type, along with connections to game-theoretic equilibria when all players in a system are...
Abstracts - Workshop on Algorithmic Challenges in Machine Learning
The difficulty of an online learning problem is typically measured by its minimax regret. If the minimax regret grows sublinearly with the number of online rounds (denoted by T), we say that the problem is learnable. Until recently, we recognized only two classes of online learning problems: problems whose minimax regret grows at a slow rate of O(√T), and unlearnable problems with linear ...