Fairness in Reinforcement Learning
Abstract
We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: even though fairness is consistent with the optimal policy, any learning algorithm satisfying fairness must take time exponential in the number of states to achieve a non-trivial approximation to the optimal policy. We then provide a provably fair polynomial-time algorithm under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
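To make the fairness constraint concrete, here is a minimal sketch that checks whether a given stochastic policy satisfies it. It assumes the long-term (discounted) action values are already known and stored in a plain table `q[state][action]`, and that the policy is a table `pi[state][action]` of action probabilities; both representations and the function name `is_fair` are hypothetical. The `alpha` tolerance is an illustrative relaxation that ignores preferences between actions whose values differ by at most `alpha`. It is not necessarily the paper's precise approximate notion.

```python
from typing import Dict, Hashable

# Hypothetical representations (not from the paper):
#   q[state][action]  -> long-term (discounted) value of taking that action
#   pi[state][action] -> probability that the policy picks that action
QTable = Dict[Hashable, Dict[Hashable, float]]
Policy = Dict[Hashable, Dict[Hashable, float]]


def is_fair(q: QTable, pi: Policy, alpha: float = 0.0) -> bool:
    """Return True if pi never puts more probability on an action a than on an
    action b whose value exceeds a's by more than alpha (alpha = 0 is the exact check)."""
    for s, q_s in q.items():
        for a, q_a in q_s.items():
            for b, q_b in q_s.items():
                # b is strictly better than a (beyond the tolerance alpha),
                # so the policy must not prefer a over b.
                if q_b > q_a + alpha and pi[s][a] > pi[s][b]:
                    return False
    return True


q = {"s0": {"left": 1.0, "right": 2.0}}
fair_pi = {"s0": {"left": 0.3, "right": 0.7}}
unfair_pi = {"s0": {"left": 0.7, "right": 0.3}}

print(is_fair(q, fair_pi))               # True
print(is_fair(q, unfair_pi))             # False
print(is_fair(q, unfair_pi, alpha=1.5))  # True: the value gap of 1.0 is within alpha
```

Checking a policy against known values is easy. The difficulty described above comes from learning, where the algorithm does not know these values in advance but must still never violate the constraint.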
Similar resources
Reinforcement Learning for Fair Dynamic Pricing
Unfair pricing policies have been shown to be one of the most negative perceptions customers can have concerning pricing, and may result in long-term losses for a company. Despite the fact that dynamic pricing models help companies maximize revenue, fairness and equality should be taken into account in order to avoid unfair price differences between groups of customers. This paper shows how to ...
Fair Learning in Markovian Environments
We initiate the study of fairness in reinforcement learning, where the actions of a learning algorithm may affect its environment and future rewards. Our fairness constraint requires that an algorithm never prefers one action over another if the long-term (discounted) reward of choosing the latter action is higher. Our first result is negative: despite the fact that fairness is consistent with ...
Dynamic Resource Allocation through Reinforcement Learning Approach in Multi-cell OFDMA Networks
In this paper, we present a distributed resource allocation algorithm for cellular OFDMA networks by adopting a Reinforcement Learning (RL) approach. We use an RL method that employs Growing Self-Organizing Maps to deal with the huge and continuous problem space. The goal of the algorithm is to maximize the network throughput in a fair manner. Indeed, the algorithm maximizes the throughput unti...
Cycle Time Optimization of Processes Using an Entropy-Based Learning for Task Allocation
Cycle time optimization could be one of the great challenges in business process management. Although there is much research on this subject, task similarities have been paid little attention. In this paper, a new approach is proposed to optimize cycle time by minimizing entropy of work lists in resource allocation while keeping workloads balanced. The idea of the entropy of work lists comes fr...
Do Wealth Differences Affect Fairness Considerations?
The influence of relative wealth on fairness considerations is analyzed in a series of ultimatum game experiments in which proposers and receivers are given large and widely unequal initial endowments. Subjects initially demonstrate a concern for fairness. With time however, the dynamics of behavior become at odds with both subgame perfection and fairness. Evidence of learning is detected for b...