Multi-Agent Planning with Baseline Regret Minimization
نویسندگان
چکیده
We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It guarantees to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge to the policy with minimum baseline regret. Experimental results on common benchmark problems confirm the benefits of the algorithm compared with the state-of-the-art approaches.
منابع مشابه
Inference-based Decision Making in Games
Background: Reinforcement learning in complex games has traditionally been the domain of valueor policy iteration algorithms, resulting from their effectiveness in planning in Markov decision processes, before algorithms based on regret minimization guarantees such as upper confidence bounds applied to trees (UCT) and counterfactual regret minimization were developed and proved to be very succe...
متن کاملHedging Under Uncertainty: Regret Minimization Meets Exponentially Fast Convergence
This paper examines the problem of multi-agent learning in N -person non-cooperative games. For concreteness, we focus on the socalled “hedge” variant of the exponential weights (EW) algorithm, one of the most widely studied algorithmic schemes for regret minimization in online learning. In this multi-agent context, we show that a) dominated strategies become extinct (a.s.); and b) in generic g...
متن کاملA Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity
In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...
متن کاملMulti-agent Learning and the Reinforcement Gradient
The number of proposed reinforcement learning algorithms appears to be ever-growing. This article tackles the diversification by showing a persistent principle in several independent reinforcement learning algorithms that have been applied to multi-agent settings. While their learning structure may look very diverse, algorithms such as Gradient Ascent, Cross learning, variations of Q-learning a...
متن کاملMulti-Agent Counterfactual Regret Minimization for Partial-Information Collaborative Games
We study the generalization of counterfactual regret minimization (CFR) to partialinformation collaborative games with more than 2 players. For instance, many 4-player card games are structured as 2v2 games, with each player only knowing the contents of their own hand. To study this setting, we propose a multi-agent collaborative version of Kuhn Poker. We observe that a straightforward applicat...
متن کامل