Monte-Carlo tree search and rapid action value estimation in computer Go
نویسندگان
چکیده
A new paradigm for search, based on Monte-Carlo simulation, has revolutionised the performance of computer Go programs. In this article we describe two extensions to the Monte-Carlo tree search algorithm, which significantly improve the effectiveness of the basic algorithm. When we applied these two extensions to the Go program MoGo, it became the first program to achieve dan (master) level at 9 × 9 Go. In this article we survey the Monte-Carlo revolution in computer Go, outline the key ideas that led to the success of MoGo and subsequent Go programs, and provide for the first time a comprehensive description, in theory and in practice, of this extended framework for Monte-Carlo tree search.
منابع مشابه
Generalized Rapid Action Value Estimation
Monte Carlo Tree Search (MCTS) is the state of the art algorithm for many games including the game of Go and General Game Playing (GGP). The standard algorithm for MCTS is Upper Confidence bounds applied to Trees (UCT). For games such as Go a big improvement over UCT is the Rapid Action Value Estimation (RAVE) heuristic. We propose to generalize the RAVE heuristic so as to have more accurate es...
متن کاملA new self-acquired knowledge process for Monte Carlo Tree Search
Computer Go is one of the most challenging field in Artificial Intelligence Game. In this area the use of Monte Carlo Tree Search has emerged as a very attractive research direction, where recent advances have been achieved and permitted to significantly increase programs efficiency. These enhancements result from combining tree search used to identify best next moves, and a Monte Carlo process...
متن کاملMemory-Augmented Monte Carlo Tree Search
This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online realtime search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information of a particular state. This memory is used to generate an approximate value estimation by combining the estimations of s...
متن کاملAchieving Master Level Play in 9 x 9 Computer Go
The UCT algorithm uses Monte-Carlo simulation to estimate the value of states in a search tree from the current state. However, the first time a state is encountered, UCT has no knowledge, and is unable to generalise from previous experience. We describe two extensions that address these weaknesses. Our first algorithm, heuristic UCT, incorporates prior knowledge in the form of a value function...
متن کاملAchieving Master Level Play in 9 × 9 Computer Go
The UCT algorithm uses Monte-Carlo simulation to estimate the value of states in a search tree from the current state. However, the first time a state is encountered, UCT has no knowledge, and is unable to generalise from previous experience. We describe two extensions that address these weaknesses. Our first algorithm, heuristic UCT, incorporates prior knowledge in the form of a value function...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Artif. Intell.
دوره 175 شماره
صفحات -
تاریخ انتشار 2011