Reward Design for Multi-Agent Reinforcement Learning with a Penalty Based on the Payment Mechanism
Authors
Abstract
In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies between self-interested agents. We take inspiration from the concept of mechanism design from game theory to modify how agents are rewarded in MARL algorithms. We defined a payment that reflects the negative contribution to other agents' valuation in the same manner as the Vickrey-Clarke-Groves (VCG) mechanism. We give each individual agent a reward signal that consists of two elements. The reward evaluated solely on the basis of individual behavior will follow a greedy and selfish policy, and the penalty will reflect social welfare. We call this scheme reward design based on the payment mechanism (RDPM). We experimented with RDPM in different scenarios. We show that RDPM can increase the utility among agents while other reward designs achieve far less, even in basic and simplistic problems. We finally analyze and discuss how RDPM affects the policy.
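The two-element reward signal described in the abstract can be illustrated with a minimal sketch. It assumes the textbook VCG (Clarke pivot) payment: an agent pays the welfare the other agents would have obtained without it, minus the welfare they obtain with it present. The abstract does not give the paper's exact formulation, so the function names `vcg_style_payments` and `shaped_rewards`, and the assumption that the counterfactual welfare values `others_welfare_without` are already available, are hypothetical and used only for illustration.

```python
from typing import List, Sequence


def vcg_style_payments(rewards: Sequence[float],
                       others_welfare_without: Sequence[float]) -> List[float]:
    """Clarke-pivot style payment for each agent (illustrative assumption).

    rewards[i]                 individual reward of agent i in the actual outcome
    others_welfare_without[i]  total reward the other agents would have received
                               had agent i been absent (assumed to be given)
    """
    total = sum(rewards)
    payments = []
    for r_i, without_i in zip(rewards, others_welfare_without):
        others_with_i = total - r_i  # welfare of the others with agent i present
        payments.append(without_i - others_with_i)
    return payments


def shaped_rewards(rewards: Sequence[float],
                   others_welfare_without: Sequence[float]) -> List[float]:
    """Individual reward minus the payment used as a penalty."""
    payments = vcg_style_payments(rewards, others_welfare_without)
    return [r - p for r, p in zip(rewards, payments)]


if __name__ == "__main__":
    # Hypothetical two-agent example: agent 0 takes most of a shared resource.
    rewards = [3.0, 1.0]
    # Without agent 0, agent 1 would earn 4.0; without agent 1, agent 0 would earn 3.0.
    others_welfare_without = [4.0, 3.0]
    print(vcg_style_payments(rewards, others_welfare_without))  # [3.0, 0.0]
    print(shaped_rewards(rewards, others_welfare_without))      # [0.0, 1.0]
```

In this toy case agent 0 lowers agent 1's reward by 3.0, so its penalty cancels its selfish gain, while agent 1 harms nobody and keeps its full reward. How the counterfactual welfare is estimated during learning is left open here.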
Similar resources
Plan-based reward shaping for multi-agent reinforcement learning
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...
Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning
Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko (Department of Computer Science, The University of York, UK). Reward shaping has been shown to significantly improve an agent’s performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be give...
Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs
Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in oth...
Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning
Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge, but if given sufficient time, a suitable state representation and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating doma...
Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning
A novel approach for the reward distribution in multi-agent reinforcement learning is proposed. The agent who gets a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the others, they may help the agent to get more reward. There may be some cases in which the agent gets more reward than it gave to the others. In this case, it is better for...
Journal
Journal title: Transactions of the Japanese Society for Artificial Intelligence
Year: 2021
ISSN: 1346-0714, 1346-8030
DOI: https://doi.org/10.1527/tjsai.36-5_ag21-h