A temporal difference method for multi-objective reinforcement learning
Authors
Abstract
Similar papers
Multi-Objective Reinforcement Learning
In multi-objective reinforcement learning (MORL) the agent is provided with multiple feedback signals when performing an action. These signals can be independent, complementary or conflicting. Hence, MORL is the process of learning policies that optimize multiple criteria simultaneously. In this abstract, we briefly describe our extensions to single-objective multi-armed bandits and reinforceme...
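To make the setting concrete, the sketch below shows one common MORL baseline: tabular Q-learning with a vector-valued reward that is scalarized by a fixed weight vector. This is an illustration only, not the method of the paper above; the environment interface (env.reset, env.step returning a reward vector) and all parameter names are assumptions for the example.

    # Illustrative sketch (not the paper's algorithm): tabular Q-learning with a
    # vector-valued reward and a fixed linear scalarization weight w.
    # env.reset()/env.step() and all parameter names are assumptions.
    import numpy as np

    def scalarized_q_learning(env, n_states, n_actions, n_objectives,
                              w, alpha=0.1, gamma=0.95, epsilon=0.1,
                              episodes=1000):
        # Q keeps one value estimate per objective for every state-action pair.
        Q = np.zeros((n_states, n_actions, n_objectives))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Greedy action is chosen on the scalarized (weighted) Q-values.
                if np.random.rand() < epsilon:
                    a = np.random.randint(n_actions)
                else:
                    a = np.argmax(Q[s] @ w)
                s_next, r_vec, done = env.step(a)   # r_vec is a reward vector
                a_next = np.argmax(Q[s_next] @ w)
                # Standard Q-learning update applied component-wise per objective.
                target = r_vec + gamma * Q[s_next, a_next] * (not done)
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q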
Multigrid Algorithms for Temporal Difference Reinforcement Learning
We introduce a class of Multigrid-based temporal difference algorithms for reinforcement learning with linear function approximation. Multigrid methods are commonly used to accelerate convergence of iterative numerical computation algorithms. The proposed Multigrid-enhanced TD(λ) algorithms allow accelerating the convergence of the basic TD(λ) algorithm while keeping essentially the same per-...
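For reference, a minimal sketch of the baseline that abstract builds on, plain TD(λ) policy evaluation with linear function approximation and accumulating eligibility traces, is given below; the Multigrid acceleration itself is not reproduced, and the feature map phi and the environment interface are assumptions.

    # Minimal sketch of standard TD(lambda) policy evaluation with linear
    # function approximation; phi(s) and env are assumed interfaces.
    import numpy as np

    def td_lambda(env, phi, n_features, alpha=0.01, gamma=0.99,
                  lam=0.9, episodes=500):
        theta = np.zeros(n_features)          # weights of the linear value function
        for _ in range(episodes):
            s, done = env.reset(), False
            z = np.zeros(n_features)          # eligibility trace
            while not done:
                # Transitions follow the fixed policy being evaluated.
                s_next, r, done = env.step()
                v = phi(s) @ theta
                v_next = 0.0 if done else phi(s_next) @ theta
                delta = r + gamma * v_next - v          # TD error
                z = gamma * lam * z + phi(s)            # accumulating traces
                theta += alpha * delta * z              # TD(lambda) update
                s = s_next
        return theta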
Why Multi-objective Reinforcement Learning?
We argue that multi-objective methods are underrepresented in RL research, and present three scenarios to justify the need for explicitly multi-objective approaches. Key to these scenarios is that although the utility the user derives from a policy — which is what we ultimately aim to optimize — is scalar, it is sometimes impossible, undesirable or infeasible to formulate the problem as single-...
Multi-Objective Deep Reinforcement Learning
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the fir...
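The sketch below only illustrates the notion of a convex coverage set used in that abstract: among a set of candidate value vectors, keep those that are optimal for at least one linear weighting of the objectives. It is a brute-force weight sweep for two objectives, an assumption-laden illustration, not the DOL or optimistic linear support algorithm.

    # Approximate convex coverage set (CCS) for two objectives via a weight sweep.
    # Brute-force illustration only; value_vectors and n_weights are assumed inputs.
    import numpy as np

    def approximate_ccs(value_vectors, n_weights=101):
        """value_vectors: array of shape (n_policies, 2), one value per objective."""
        V = np.asarray(value_vectors, dtype=float)
        ccs_indices = set()
        for w1 in np.linspace(0.0, 1.0, n_weights):
            w = np.array([w1, 1.0 - w1])            # candidate scalarization weight
            ccs_indices.add(int(np.argmax(V @ w)))  # best policy under this weight
        return V[sorted(ccs_indices)]

    # Example: the middle vector is never optimal for any weighting and is
    # therefore excluded from the (approximate) CCS.
    print(approximate_ccs([[1.0, 0.0], [0.4, 0.4], [0.0, 1.0]]))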
A Multi-Objective Deep Reinforcement Learning Framework
This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on a deep sea treasure environment indicate that the proposed approach is able to converge to the optimal Pareto solutions. ...
Journal
Journal title: Neurocomputing
Year: 2017
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2016.10.100