regret minimization

Search results for: regret minimization

Number of results: 37822

Robust Pricing in Discrete Time

2015

Ying Liu Leonard N. Stern

We consider the pricing problem faced by a monopolist who sells a product to a population of consumers over a discrete number of periods. Customers are heterogeneous in both the willingness-to-pay for the product and the arrival time during the selling season. We assume that the seller knows only the support of the customers’ valuations and do not make any other distributional assumptions about...

MS&E 336 Lecture 15: Calibration

2007

Ramesh Johari

Calibration is a concept that tries to formalize a notion of quality for forecasters. For example, suppose a weatherman predicts each day whether it will rain or be sunny. Typically forecasters will predict such events in terms of probabilities, i.e., “There is a 30% chance of rain.” Given only the outcome that day, it is impossible to judge the quality of such a forecast. However, if we c...
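As a rough illustration of the idea in this abstract (not taken from the lecture notes themselves), the sketch below groups days by the forecast probability and compares each forecast with the fraction of those days on which it actually rained. The function name `calibration_table` and the sample data are hypothetical; a forecaster is usually called well calibrated when the two numbers in each group are close over a long run of days.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each forecast probability to (observed rain frequency, number of days).

    forecasts: probabilities of rain announced each day (hypothetical data).
    outcomes:  1 if it rained that day, 0 otherwise.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")

    # Collect the outcomes of all days that shared the same forecast.
    buckets = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        buckets[p].append(1 if rained else 0)

    # Empirical frequency of rain within each forecast group.
    return {p: (sum(v) / len(v), len(v)) for p, v in sorted(buckets.items())}


# Made-up example: ten days with a 30% forecast, five days with an 80% forecast.
forecasts = [0.3] * 10 + [0.8] * 5
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] + [1, 1, 1, 0, 1]

table = calibration_table(forecasts, outcomes)
for p, (freq, n) in table.items():
    print(f"forecast {p:.0%}: rained on {freq:.0%} of {n} days")
```

A single day's outcome says nothing about a 30% forecast, but the grouped frequencies do: in this made-up data the forecasts match the observed frequencies in both groups.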

Sequential Shortest Path Interdiction with Incomplete Information

Journal: Decision Analysis, 2016

Juan Sebastian Borrero, Oleg A. Prokopyev, Denis Sauré

We study sequential interdiction when the interdictor has incomplete initial information about the network, and the evader has complete knowledge of the network, including its structure and arc costs. In each time period, the interdictor blocks at most k arcs from the network observed up to that period, after which the evader travels along a shortest path between two (fixed) nodes in the interd...

On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities

2017

Alexander Rakhlin, Karthik Sridharan

We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed regret bounds), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers for martingales), and (iii) in-expectation bounds for the supremum. By virtue of the equivalence, we prove exponential ...

Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning

2017

Noam Brown, Tuomas Sandholm

Iterative algorithms such as Counterfactual Regret Minimization (CFR) are the most popular way to solve large zero-sum imperfect-information games. In this paper we introduce Best-Response Pruning (BRP), an improvement to iterative algorithms such as CFR that allows poorly-performing actions to be temporarily pruned. We prove that when using CFR in zero-sum games, adding BRP will asymptotically...

Approximating Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret Minimization

2011

Todd W. Neller, Steven Hnath

Using the bluffing dice game Dudo as a challenge domain, we abstract information sets using imperfect recall of actions. Even with such abstraction, the standard Counterfactual Regret Minimization (CFR) algorithm proves impractical for Dudo, with the number of recursive visits to the same abstracted information sets increasing exponentially with the depth of the game graph. By holding strategie...

Multi-agent Learning and the Reinforcement Gradient

2011

Michael Kaisers, Daan Bloembergen, Karl Tuyls

The number of proposed reinforcement learning algorithms appears to be ever-growing. This article tackles the diversification by showing a persistent principle in several independent reinforcement learning algorithms that have been applied to multi-agent settings. While their learning structure may look very diverse, algorithms such as Gradient Ascent, Cross learning, variations of Q-learning a...

The Value Function with Regret Minimization Algorithm for Solving the Nash Equilibrium of Multi-Agent Stochastic Game

Journal: International Journal of Computational Intelligence Systems, 2021

Stochastic Convex Optimization

2009

Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, Karthik Sridharan

For supervised classification problems, it is well known that learnability is equivalent to uniform convergence of the empirical risks and thus to learnability by empirical minimization. Inspired by recent regret bounds for online convex optimization, we study stochastic convex optimization, and uncover a surprisingly different situation in the more general setting: although the stochastic conv...

Blackwell Approachability and No-Regret Learning are Equivalent

2011

Jacob D. Abernethy, Peter L. Bartlett, Elad Hazan

We consider the celebrated Blackwell Approachability Theorem for two-player games with vector payoffs. Blackwell himself previously showed that the theorem implies the existence of a “no-regret” algorithm for a simple online learning problem. We show that this relationship is in fact much stronger, that Blackwell’s result is equivalent to, in a very strong sense, the problem of regret minimizati...
