Partially Observed, Multi-objective Markov Games
نویسندگان
چکیده
The intent of this research is to generate a set of non-dominated policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite horizon, partially observed Markov game (POMG). The actions of the agents are selected simultaneously at each decision epoch. At each decision epoch, each agent knows: its past and present states, its past actions, and noise corrupted observations of the other agent’s past and present states. The actions of each agent are determined by the agent’s policy, which selects actions at each decision epoch based on these data. The leader considers multiple objectives in selecting its policy. The follower considers a single objective in selecting its policy with complete knowledge of and in response to the policy selected by the leader. This leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process (POMDP). This POMDP is used to determine the follower’s best response policy. A multi-objective genetic algorithm (MOGA) is used to create the next generation of leader policies based on the fitness measures of each leader policy in the current generation. Computing a fitness measure for a leader policy requires a value determination calculation, given the leader policy and the follower’s best response policy. The policies from which the leader can select a most preferred policy are the non-dominated policies of the final generation of leader policies created
منابع مشابه
Markov Games of Incomplete Information for Multi-Agent Reinforcement Learning
Partially observable stochastic games (POSGs) are an attractive model for many multi-agent domains, but are computationally extremely difficult to solve. We present a new model, Markov games of incomplete information (MGII) which imposes a mild restriction on POSGs while overcoming their primary computational bottleneck. Finally we show how to convert a MGII into a continuous but bounded fully ...
متن کاملA Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games
A central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Since such environments frequently include interaction over time with other agents with their own goals, reasoning about such interaction relies on sequential game-theoretic models such as extensive-form games or some of their succinct representations...
متن کاملA Value Iteration Algorithm for Partially Observed Markov Decision Process Multi-armed Bandits
A value iteration based algorithm is given for computing the Gittins index of a Partially Observed Markov Decision Process (POMDP) Multi-armed Bandit problem. This problem concerns dynamical allocation of efforts between a number of competing projects of which only one can be worked on at any time period. The active project evolves according to a finite state Markov chain and generates then a r...
متن کاملBayesian Model of Behaviour in Economic Games
Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of c...
متن کاملExploiting Agent and Type Independence in Collaborative Graphical Bayesian Games
Efficient collaborative decision making is an important challenge for multiagent systems. Finding optimal joint actions is especially challenging when each agent has only imperfect information about the state of its environment. Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type. However, representing and solving...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1404.4388 شماره
صفحات -
تاریخ انتشار 2013