Reinforcement Learning in 2-players Games
Abstract
The purpose of a reinforcement learning system is, in general, to learn an optimal policy. However, in two-player games such as Othello, it is important to acquire a penalty-avoiding policy. In this paper, we focus on forming penalty-avoiding policies based on the Penalty Avoiding Rational Policy Making algorithm [2]. When this algorithm is applied to large-scale problems, it runs into the curse of dimensionality. To overcome this in two-player games, we introduce several ideas and heuristics. We show that our learning player can always defeat the well-known Othello program KITTY.
Similar resources
Estimating the Experience-Weighted Attractions for the Migration-Emission Game
Players are unlikely to immediately play equilibrium strategies in complicated games or in games in which they do not have much experience playing. In these cases, players will need to learn to play equilibrium strategies. In laboratory experiments, subjects show systematic patterns of learning during a game. In psychological and economic models of learning, players tend to play a strategy more...
On the convergence of reinforcement learning
This paper examines the convergence of payoffs and strategies in Erev and Roth’s model of reinforcement learning. When all players use this rule, it eliminates iteratively dominated strategies, and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or if they are mixed and the game is...
Collective Learning in Games through Social Networks
This paper argues that combining social network communication and games can positively influence the learning behavior of players. We propose a computational model that combines features of social network learning (communication) and game-based learning (strategy reinforcement). The focus is on cooperative games, in which a coalition of players tries to achieve a common goal. We show that enric...
Inventing New Signals
A model of inventing new signals is introduced in the context of sender-receiver games with reinforcement learning. If the invention parameter is set to zero, it reduces to basic Roth-Erev learning applied to acts rather than strategies, as in Argiento et al. (2009). If every act is uniformly reinforced in every state, it reduces to the Chinese Restaurant Process, also known as the Hoppe-Pólya urn ...
Individual Differences in EWA Learning with Partial Payoff Information
We extend EWA learning to games in which only the set of possible foregone payoffs from unchosen strategies is known. We assume players estimate unknown foregone payoffs from a strategy by substituting the last payoff actually received from that strategy, or by clairvoyantly guessing the actual foregone payoff. Either assumption improves the predictive accuracy of EWA. Learning parameters are also es...