Search results for: regret minimization
Number of results: 37822
Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, whi...
We relate the strategy sets that a player ends up with after refining his own strategies according to two very different models of rationality: namely, utility maximization and regret minimization. arXiv:1403.6394v1 [cs.GT] 25 Mar 2014
The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multiarmed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying reward distributions of the arms, at each time instant, the policy plays an arm with probability equal to the probability that this arm has largest mean reward conditioned on the current posterior di...
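As a rough illustration of the policy described in this snippet, the sketch below samples one mean per arm from the current posterior and plays the arm with the largest sample, which selects each arm with the posterior probability that it is the best one. It is a minimal sketch under assumptions not taken from the abstract: Bernoulli rewards, independent uniform Beta(1, 1) priors, and a hypothetical function name `thompson_sampling`.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling (assumed model, for illustration only).

    true_means: hidden success probability of each arm (used only to simulate rewards).
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.8], horizon=2000))
```

With a large gap between the arms, most plays should concentrate on the better arm as the posterior sharpens.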
In this work, we extend the applicability of regret minimization to pricing financial instruments, following the work of [10]. More specifically, we consider pricing a type of exotic option called a fixed-strike lookback call option. A fixed-strike lookback call option has a known expiration time, at which the option holder has the right to receive the difference between the maximal price of a ...
In 2000, Hart and Mas-Colell introduced the important game-theoretic algorithm of regret matching. Players reach equilibrium play by tracking regrets for past plays, making future plays proportional to positive regrets. The technique is not only simple and intuitive; it has sparked a revolution in computer game play of some of the most difficult bluffing games, including clear domination of ann...
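The rule of "making future plays proportional to positive regrets" can be sketched for rock-paper-scissors. This is a simplified illustration, not the setting of the paper: a single learner updates against a fixed, hypothetical opponent mix using expected payoffs, rather than two players learning in self-play. The names `PAYOFF`, `strategy_from_regrets`, and `regret_matching` are introduced here for illustration.

```python
# Payoff to the learner: PAYOFF[my_action][opponent_action],
# with actions ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


def regret_matching(opponent, iterations):
    """Return the learner's average strategy against a fixed opponent mix."""
    n = len(PAYOFF)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent mix.
        action_values = [
            sum(PAYOFF[a][o] * opponent[o] for o in range(n)) for a in range(n)
        ]
        # Expected payoff of the strategy actually played this round.
        expected = sum(strategy[a] * action_values[a] for a in range(n))
        for a in range(n):
            # Regret: how much better action a would have done than what was played.
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


if __name__ == "__main__":
    # Hypothetical opponent that over-plays rock.
    print(regret_matching([0.5, 0.25, 0.25], iterations=1000))
```

Against an opponent that over-plays rock, only paper accumulates positive regret, so the average strategy moves almost entirely onto paper.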
We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies.
1 Regret Minimization. In this lecture, our goal is to build a strategy with good performance when dealing with repeated games. Let us start with a simple model of regret. In this model a player performs a partial optimization on his actions. Following each action he updates his belief and selects the next actions, dependent on the outcome. We will also show that for a family of games, socially...
UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), is based on UCB, a sampling policy for the Multi-armed Bandit Problem (MAB) that minimizes the accumulated regret. However, MCTS differs from MAB in that only the final choice, rather than all arm pulls, brings a reward, that is, the simple regret, as opposed to the cumulative regret, must be minimized. This ongoing work a...
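For context on the UCB policy this snippet refers to, here is a minimal sketch of the standard UCB1 index rule, which targets cumulative regret. It is not the modified sampling scheme the paper goes on to propose. Bernoulli rewards, the exploration term sqrt(2 ln t / n), and the function name `ucb1` are assumptions made for this illustration.

```python
import math
import random


def ucb1(true_means, horizon, seed=0):
    """UCB1 on simulated Bernoulli arms; returns the play count of each arm."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    totals = [0.0] * k  # sum of rewards observed per arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Play every arm once so each index is well defined.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        totals[arm] += reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(ucb1([0.2, 0.8], horizon=2000))
```

Because every pull counts toward cumulative regret, UCB1 quickly stops sampling arms that look poor. The snippet's point is that when only the final choice is rewarded, this conservative exploration is not necessarily the right trade-off.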
The game-theoretic solution concept Iterated Regret Minimization (IRM) was introduced recently by Halpern and Pass. We give the first application of IRM to simultaneous voting games. We study positional scoring rules in detail and give theoretical results demonstrating the bias of IRM toward sincere voting. We present comprehensive simulation results of the effect on social welfare of IRM compa...