Learning Pareto-optimal Solutions in 2x2 Conflict Games

Authors

  • Stéphane Airiau
  • Sandip Sen
Abstract

Multiagent learning literature has investigated iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash Equilibrium strategy profiles. Such equilibrium configurations imply that no player has the motivation to unilaterally change its strategy. Often, in general sum games, a higher payoff can be obtained by both players if one chooses not to respond myopically to the other player. By developing mutual trust, agents can avoid immediate best responses that will lead to a Nash Equilibrium with lesser payoff. In this paper we experiment with agents who select actions based on expected utility calculations that incorporate the observed frequencies of the actions of the opponent(s). We augment these stochastically greedy agents with an interesting action revelation strategy that involves strategic declaration of one’s commitment to an action to avoid worst-case, pessimistic moves. We argue that in certain situations, such apparently risky action revelation can indeed produce better payoffs than a nonrevealing approach. In particular, it is possible to obtain Pareto-optimal Nash Equilibrium outcomes. We improve on the outcome efficiency of a previous algorithm and present results over the set of structurally distinct two-person two-action conflict games where the players’ preferences form a total order over the possible outcomes. We also present results on a large number of randomly generated payoff matrices of varying sizes and compare the payoffs of strategically revealing learners to payoffs at Nash equilibrium.
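The expected-utility selection rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `ExpectedUtilityAgent`, the `epsilon` exploration parameter (one possible reading of "stochastically greedy"), and the example payoff matrix are all assumptions made for illustration, and the action revelation step is not modeled.

```python
import random
from collections import Counter


class ExpectedUtilityAgent:
    """Hypothetical learner: estimates the opponent's mixed strategy from
    observed action frequencies and picks the action with the highest
    expected payoff, exploring with probability epsilon (an assumption)."""

    def __init__(self, payoff, epsilon=0.1, rng=None):
        # payoff[a][b] is this agent's payoff when it plays a
        # and the opponent plays b.
        self.payoff = payoff
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = Counter()  # observed opponent action counts

    def observe(self, opponent_action):
        """Record one observed opponent action."""
        self.counts[opponent_action] += 1

    def expected_utility(self, action):
        """Expected payoff of `action` under the empirical opponent frequencies."""
        n_opp = len(self.payoff[action])
        total = sum(self.counts.values())
        if total == 0:
            # No observations yet: assume a uniform opponent.
            probs = [1.0 / n_opp] * n_opp
        else:
            probs = [self.counts[b] / total for b in range(n_opp)]
        return sum(p * u for p, u in zip(probs, self.payoff[action]))

    def act(self):
        """Pick a random action with probability epsilon, else the greedy one."""
        actions = list(range(len(self.payoff)))
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=self.expected_utility)


# Illustrative row-player payoffs for a Prisoner's Dilemma
# (action 0 = cooperate, action 1 = defect); the values are assumed.
pd_payoff = [[3, 0], [5, 1]]
agent = ExpectedUtilityAgent(pd_payoff, epsilon=0.0)
for opponent_action in (0, 0, 0, 1):
    agent.observe(opponent_action)
greedy_action = agent.act()
```

In this example the purely greedy learner defects even though the opponent has mostly cooperated, which is the kind of myopic best response the abstract says mutual trust and action revelation are meant to avoid.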

Similar resources

Decentralized Computation of Pareto Optimal Pure Nash Equilibria of Boolean Games with Privacy Concerns

In Boolean games, agents try to reach a goal formulated as a Boolean formula. These games are attractive because of their compact representations. However, few methods are available to compute the solutions and they are either limited or do not take privacy or communication concerns into account. In this paper we propose the use of an algorithm related to reinforcement learning to address this ...


Pareto Efficiency and Approximate Pareto Efficiency in Routing and Load Balancing Games

We analyze the Pareto efficiency, or inefficiency, of solutions to routing games and load balancing games, focusing on Nash equilibria and greedy solutions to these games. For some settings, we show that the solutions are necessarily Pareto optimal. When this is not the case, we provide a measure to quantify the distance of the solution from Pareto efficiency. Using this measure, we provide upp...


Simple reinforcement learning agents: Pareto beats Nash in an algorithmic game theory study

Repeated play in games by simple adaptive agents is investigated. The agents use Q-learning, a special form of reinforcement learning, to direct learning of behavioral strategies in a number of 2×2 games. The agents are able to effectively maximize the total wealth extracted. This often leads to Pareto optimal outcomes. When the reward signals are sufficiently clear, Pareto optimal outcomes w...


Pareto-optimal Solutions for Multi-objective Optimal Control Problems using Hybrid IWO/PSO Algorithm

Heuristic optimization provides a robust and efficient approach for extracting approximate solutions of multi-objective problems because of its capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. The convergence rate and suitable diversity of solutions are of great importance for multi-objective evolutionary algorithms. The focu...


Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games

The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a mean...




Publication date: 2005