No Regret and the Minimax Theorem of Game

Author

  • Jennifer Wortman Vaughan
Abstract

$H(\vec{p}) = \sum_{i=1}^n p_i \log \frac{1}{p_i}$. To do this, we must bound the two terms on the right-hand side of the bound above.

Step 1: Bounding the Range of the Regularizer. We begin by deriving upper and lower bounds on the entropy function $H(\vec{p})$. The lower bound is easy: since $0 \le p_i \le 1$ for all $i$, each term satisfies $p_i \log(1/p_i) \ge 0$, and so $H(\vec{p}) \ge 0$. (Remember that we define $0 \log(1/0)$ to be $0$ by convention.) As we discussed before, $H(\vec{p}) = 0$ is achieved when $\vec{p}$ puts all of its weight on a single expert. To upper bound $H(\vec{p})$, we can use Jensen's inequality applied to the concave function $\log$. We get
$$H(\vec{p}) = \sum_{i=1}^n p_i \log \frac{1}{p_i} \le \log\left(\sum_{i=1}^n p_i \cdot \frac{1}{p_i}\right) = \log n.$$
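The two extremes of the bound $0 \le H(\vec{p}) \le \log n$ can be checked numerically; a minimal sketch in Python (the helper name `entropy` is my own, not from the notes):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = sum_i p_i * log(1/p_i), with 0*log(1/0) = 0."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

n = 4
point_mass = [1.0, 0.0, 0.0, 0.0]   # all weight on a single expert
uniform = [1.0 / n] * n             # weight spread evenly over n experts

# Lower bound: H = 0 is attained at a point mass.
print(entropy(point_mass))  # 0.0
# Upper bound: H = log n is attained at the uniform distribution.
print(entropy(uniform))     # log(4) ≈ 1.386
```

The uniform distribution is exactly the case where Jensen's inequality holds with equality, which is why it attains the upper bound $\log n$.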


Similar articles

Robustness in portfolio optimization based on minimax regret approach

Portfolio optimization is one of the most important issues for effective and economic investment, and the literature contains plenty of research addressing it. Most of this research attempts to make Markowitz's original portfolio selection model more realistic, or seeks to solve the model to obtain nearly optimal portfolios. An efficient frontier in the ...


CS 364A: Algorithmic Game Theory, Lecture #18: From External Regret to Swap Regret and the Minimax Theorem

Last lecture we proved that coarse correlated equilibria (CCE) are tractable, in a satisfying sense: there are simple and computationally efficient learning procedures that converge quickly to the set of CCE. Of course, if anything in our equilibrium hierarchy (Figure 1) was going to be tractable, it was going to be CCE, the biggest set. The good researcher is never satisfied and always seeks s...


Online Learning with Variable Stage Duration

We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the obser...


4 Learning, Regret Minimization, and Equilibria

Many situations involve repeatedly making decisions in an uncertain environment: for instance, deciding what route to drive to work each day, or repeated play of a game against an opponent with an unknown strategy. In this chapter we describe learning algorithms with strong guarantees for settings of this type, along with connections to game-theoretic equilibria when all players in a system are...


Abstracts - Workshop on Algorithmic Challenges in Machine Learning

The difficulty of an online learning problem is typically measured by its minimax regret. If the minimax regret grows sublinearly with the number of online rounds (denoted by T), we say that the problem is learnable. Until recently, we recognized only two classes of online learning problems: problems whose minimax regret grows at a slow rate of $O(\sqrt{T})$, and unlearnable problems with linear ...




Journal:

Volume   Issue 

Pages  -

Publication date: 2011