Search results for: regret minimization

Number of results: 37822

2007
Martin Zinkevich Michael Johanson Michael H. Bowling Carmelo Piccione

Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, whi...

Journal: CoRR 2013
Alessandro Chiesa Silvio Micali Zeyuan Allen Zhu

We relate the strategy sets that a player ends up with after refining his own strategies according to two very different models of rationality: namely, utility maximization and regret minimization. arXiv:1403.6394v1 [cs.GT] 25 Mar 2014

2014
Sudipto Guha Kamesh Munagala

The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multiarmed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying reward distributions of the arms, at each time instant, the policy plays an arm with probability equal to the probability that this arm has largest mean reward conditioned on the current posterior di...
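The policy described in this abstract can be illustrated with a minimal sketch. It assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm, neither of which is stated in the snippet; the function names `thompson_choose` and `run_thompson` are hypothetical. Drawing one sample per arm from its posterior and playing the largest sample selects each arm with probability equal to its posterior probability of having the largest mean.

```python
import random


def thompson_choose(successes, failures, rng):
    """Pick an arm by sampling each arm's Beta posterior (assumed Beta(1, 1) prior)."""
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Playing the arm with the largest posterior sample is Thompson Sampling.
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_means, horizon, seed=0):
    """Simulate Bernoulli arms (an assumption) and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        arm = thompson_choose(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

For example, `run_thompson([0.1, 0.9], 1000)` returns a list of two pull counts that sum to 1000, with most pulls expected to go to the second arm.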

2011
Eyal Gofer Yishay Mansour

In this work, we extend the applicability of regret minimization to pricing financial instruments, following the work of [10]. More specifically, we consider pricing a type of exotic option called a fixed-strike lookback call option. A fixed-strike lookback call option has a known expiration time, at which the option holder has the right to receive the difference between the maximal price of a ...

2013
Todd W. Neller Marc Lanctot

In 2000, Hart and Mas-Colell introduced the important game-theoretic algorithm of regret matching. Players reach equilibrium play by tracking regrets for past plays, making future plays proportional to positive regrets. The technique is not only simple and intuitive; it has sparked a revolution in computer game play of some of the most difficult bluffing games, including clear domination of ann...
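The rule summarized in this abstract, playing each action in proportion to its positive cumulative regret, can be sketched as follows. The function names are hypothetical, and the uniform fallback when no regret is positive is a common convention assumed here rather than something the snippet states.

```python
def regret_matching_strategy(cumulative_regrets):
    """Return a mixed strategy proportional to the positive cumulative regrets."""
    positives = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positives)
    n = len(cumulative_regrets)
    if total > 0:
        return [p / total for p in positives]
    # Assumed convention: play uniformly when no action has positive regret.
    return [1.0 / n] * n


def update_regrets(cumulative_regrets, action_utilities, played):
    """Add, for each action, how much better it would have done than the played one."""
    realized = action_utilities[played]
    return [
        r + (u - realized)
        for r, u in zip(cumulative_regrets, action_utilities)
    ]
```

A player would call `update_regrets` after every round with the utility each action would have earned, then call `regret_matching_strategy` to obtain the next round's mixed strategy.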

2010
Hariharan Narayanan Alexander Rakhlin

We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies.

2010
Yishay Mansour Ghila Castelnuovo Ran Roth

1 Regret Minimization. In this lecture, our goal is to build a strategy with good performance when dealing with repeated games. Let us start with a simple model of regret. In this model a player performs a partial optimization on his actions. Following each action he updates his belief and selects the next actions, dependent on the outcome. We will also show that for a family of games, socially...

Journal: CoRR 2011
David Tolpin Solomon Eyal Shimony

UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the accumulated regret. However, MCTS differs from MAB in that only the final choice, rather than all arm pulls, brings a reward; that is, the simple regret, as opposed to the cumulative regret, must be minimized. This ongoing work a...
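The distinction this abstract draws between cumulative and simple regret can be made concrete with a short sketch. The UCB1 index used here (empirical mean plus sqrt(2 ln t / n)) is the standard textbook form and is assumed rather than taken from the paper; all function names are hypothetical.

```python
import math


def ucb1_choose(means, counts, t):
    """Pick the arm with the largest UCB1 index; unpulled arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    indices = [
        m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)
    ]
    return max(range(len(indices)), key=indices.__getitem__)


def cumulative_regret(true_means, pulled_arms):
    """Total expected loss over every pull, which UCB is designed to keep small."""
    best = max(true_means)
    return sum(best - true_means[a] for a in pulled_arms)


def simple_regret(true_means, recommended_arm):
    """Expected loss of the final recommendation only, ignoring earlier pulls."""
    return max(true_means) - true_means[recommended_arm]
```

A sampling sequence can have large cumulative regret and still end with zero simple regret, provided the final recommended arm is the best one.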

2014
Miranda Emery Mark C. Wilson

The game-theoretic solution concept Iterated Regret Minimization (IRM) was introduced recently by Halpern and Pass. We give the first application of IRM to simultaneous voting games. We study positional scoring rules in detail and give theoretical results demonstrating the bias of IRM toward sincere voting. We present comprehensive simulation results of the effect on social welfare of IRM compa...
