Monte-Carlo Tree Search Parallelisation International Go Symposium 2012 Francois van Niekerk francoisvn@ml.sun.ac.za August 2012
Collaborators: Steve Kroon Gert-Jan van Rooyen Cornelia Inggs This work was partially supported by the National Research Foundation of South Africa.
Outline 1. Introduction 2. Background (Computer Go; Monte-Carlo Tree Search; Parallelisation) 3. Implementation 4. Testing and Results (Multi-Core Parallelisation; Cluster Parallelisation) 5. New Developments 6. Conclusions
Introduction • Top Go programs are currently about 5 dan on KGS. • Monte-Carlo Tree Search (MCTS) is the dominant Computer Go algorithm. • MCTS parallelisation is possible on multi-core and cluster systems.
Computer Go • Tree for moves and their follow-ups. • Exponential tree growth means brute-force is infeasible. • Evaluation function is used to avoid growing tree too far.
Classical Methods • Emulate humans with expert knowledge. • Difficult to assimilate new knowledge into an already large body. • Top strength in the single-digit kyu (SDK) range, far from professional level.
Monte-Carlo Tree Search • Monte-Carlo methods: stochastic simulations (playouts). • The winrate of playouts starting from a position is used as an estimate of the value of that position. • Playouts are used in a tree to form Monte-Carlo Tree Search (MCTS). • MCTS can be broken into four parts: selection, expansion, simulation and backpropagation.
Monte-Carlo Tree Search [Figure, built up over successive slides: an example tree with wins/visits at each node (root 4/9), stepping through Selection, Expansion, Simulation (playout, result W) and Backpropagation. A new leaf is added as 1/1, then the win is propagated up the selected path: 0/1 becomes 1/2, 3/5 becomes 4/6, and the root 4/9 becomes 5/10.]
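The four parts of MCTS can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the talk: the toy game (a pile of stones, take 1 to 3, whoever takes the last stone wins), the UCB1 selection rule, and the names Node, ucb1, playout and mcts. Unlike the figure, which counts wins from a single point of view, this sketch stores each node's wins from the perspective of the player who moved into it.

```python
import math
import random

class Node:
    """One tree node; wins/visits play the role of the w/n labels in the figure."""
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # stones left; the player to move takes 1-3
        self.parent = parent
        self.move = move              # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.wins = 0                 # wins for the player who moved INTO this node
        self.visits = 0

def ucb1(child, parent_visits, c=1.4):
    # Assumed selection rule (UCB1); the slides do not name one at this point.
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def playout(stones, rng):
    """Play random moves to the end; True if the player to move at the start wins."""
    turn = 0                          # 0 = player to move at the start of the playout
    winner = 1                        # with no stones left, the player to move has lost
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            winner = turn             # whoever takes the last stone wins
        turn ^= 1
    return winner == 0

def mcts(root_stones, iterations, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one new child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        mover_wins = playout(node.stones, rng)
        # 4. Backpropagation: update counts up to the root, flipping the
        #    perspective at each level because the players alternate.
        win_for_node = not mover_wins
        while node is not None:
            node.visits += 1
            node.wins += win_for_node
            win_for_node = not win_for_node
            node = node.parent
    return root

if __name__ == "__main__":
    root = mcts(5, 2000)
    for ch in sorted(root.children, key=lambda ch: ch.move):
        print(f"take {ch.move}: {ch.wins}/{ch.visits}")
```

The move normally played is the root child with the most visits, for example max(root.children, key=lambda ch: ch.visits).move.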
Parallelisation • Improve MCTS: improve algorithm or increase playouts. • Increasing number of playouts increases playing strength. • Increase playouts: increase thinking time or playout rate. • Parallelisation: use parallel hardware to increase playout rate and therefore strength. • Three parallelisation methods for MCTS: tree, leaf, and root.
Tree Parallelisation • Single shared tree. • Well-suited to shared-memory systems, such as multi-core systems.
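A rough sketch of the shared-tree idea follows, under loud assumptions: the tree is flattened to a root with three children to keep the example short, playouts are replaced by a biased coin flip (win_prob is a hypothetical stand-in), a single lock protects the statistics, and a virtual loss is added during selection so that concurrent threads are steered towards different children. The names SharedNode, SharedTree, worker and run are made up for this example and are not Oakfoam's. CPython threads show the sharing pattern only, not a real speedup.

```python
import math
import random
import threading

class SharedNode:
    def __init__(self, win_prob):
        self.win_prob = win_prob      # hypothetical stand-in for a real playout
        self.wins = 0
        self.visits = 0
        self.virtual_losses = 0       # playouts currently in flight through this node

class SharedTree:
    """Single tree shared by all threads (here only a root and its children)."""
    def __init__(self, win_probs):
        self.lock = threading.Lock()
        self.children = [SharedNode(p) for p in win_probs]
        self.visits = 0

    def select(self):
        with self.lock:
            def score(ch):
                n = ch.visits + ch.virtual_losses   # in-flight playouts count as losses
                if n == 0:
                    return float("inf")
                return ch.wins / n + math.sqrt(2 * math.log(self.visits + 1) / n)
            best = max(self.children, key=score)
            best.virtual_losses += 1                # discourage other threads from following
            return best

    def update(self, node, won):
        with self.lock:
            node.virtual_losses -= 1                # replace the virtual loss by the real result
            node.visits += 1
            node.wins += won
            self.visits += 1

def worker(tree, playouts, seed):
    rng = random.Random(seed)
    for _ in range(playouts):
        node = tree.select()
        won = rng.random() < node.win_prob          # the playout runs outside the lock
        tree.update(node, won)

def run(threads=4, playouts_per_thread=500):
    tree = SharedTree([0.3, 0.5, 0.7])
    pool = [threading.Thread(target=worker, args=(tree, playouts_per_thread, i))
            for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return tree
```

The lock is held only while statistics are read or written, so the expensive part (the playout) can proceed in parallel on a runtime without a global interpreter lock.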
Leaf Parallelisation • Master and slave nodes. • Only one tree, on the master. • Slaves are playout workers.
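The master/worker split might look roughly like the sketch below. It is illustrative only: the tree is reduced to the statistics of one selected leaf, the playout is a biased coin flip, a thread pool stands in for separate machines, and the names playout, leaf_parallel_step, stats and position are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def playout(position, seed):
    """Hypothetical playout worker: returns 1 for a win and 0 for a loss."""
    rng = random.Random(seed)
    return 1 if rng.random() < position["win_prob"] else 0

def leaf_parallel_step(stats, position, pool, n_workers, base_seed):
    """One master iteration: farm out playouts from the same leaf, then update."""
    # The master sends the selected leaf position to every worker.
    futures = [pool.submit(playout, position, base_seed + i) for i in range(n_workers)]
    results = [f.result() for f in futures]
    # Only the master touches the tree statistics.
    stats["wins"] += sum(results)
    stats["visits"] += len(results)
    return results

if __name__ == "__main__":
    leaf_stats = {"wins": 0, "visits": 0}
    leaf_position = {"win_prob": 0.6}     # assumed stand-in for a board position
    with ThreadPoolExecutor(max_workers=4) as pool:
        for step in range(10):
            leaf_parallel_step(leaf_stats, leaf_position, pool, 4, base_seed=step * 4)
    print(leaf_stats)
```

Because the master waits for all workers before updating, each step takes as long as the slowest playout in the batch.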
Root Parallelisation • Each execution node maintains a tree. • Each node performs MCTS. • Periodic sharing of information.
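The periodic sharing step could be sketched as below. The slides do not say what information is shared or how, so this assumes the simplest scheme: each node reports (wins, visits) for the moves at its root, and the reports are summed. The function names and the example numbers are invented for illustration.

```python
from collections import Counter

def merge_root_stats(per_node_stats):
    """Sum (wins, visits) per root move across all execution nodes."""
    wins, visits = Counter(), Counter()
    for stats in per_node_stats:
        for move, (w, n) in stats.items():
            wins[move] += w
            visits[move] += n
    return {move: (wins[move], visits[move]) for move in visits}

def best_move(merged):
    """Pick the move with the most combined visits."""
    return max(merged, key=lambda move: merged[move][1])

if __name__ == "__main__":
    # Hypothetical root statistics from two nodes, each running its own MCTS.
    node_a = {"D4": (30, 50), "Q16": (20, 50)}
    node_b = {"D4": (25, 40), "Q16": (40, 60)}
    merged = merge_root_stats([node_a, node_b])
    print(merged, best_move(merged))
```

In a real cluster the merged statistics would be sent back to every node at each sharing period, and how often to share is a tuning choice.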
Parallel Effect • Strength penalty for parallelisation. • Due to the change from sequential to parallel execution. • More pronounced if the playout updates are delayed, for example in root parallelisation on a cluster compared with multi-core parallelisation.
Implementation • Oakfoam is an open-source cross-platform MCTS engine for Computer Go. • Tree parallelisation for multi-core systems. • Root parallelisation for cluster systems.
Testing and Results • Test for playout rate increase. • If increase found, test for strength penalty. • If strength penalty found, test for overall strength increase.
Multi-Core Parallelisation Results [Figure: two panels, "Speedup on 9x9" and "Speedup on 19x19". Speedup (1 to 8) plotted against Cores (1, 2, 4, 8) for the series: Ideal, No additions, Virtual Loss, Lock-free, Both additions.]
Cluster Parallelisation Results [Figure: two panels, "Strength Comparison on 9x9" and "Strength Comparison on 19x19". Winrate vs. 1-Core [%] (50 to 100) plotted against Cores/Periods (1 to 16 in one panel, 1 to 64 in the other). Series: Baseline 10s/move, plus 10s/move and 2s/move runs with p = 0.05, 0.1 and 0.2 in one panel and p = 0.1 in the other.]
Overview of Results • Multi-core: tree parallelisation showed linear scaling up to eight cores (physical limit in these tests). • Cluster: root parallelisation for 19x19 showed scaling up to eight nodes, where the strength improvement matched that of an ideal four-core system.
New Developments • Pachi uses virtual wins and losses to improve cluster scaling. • Depth-First UCT changes MCTS from a best-first to a depth-first search. • Distributed UCT and Distributed Depth-First UCT use Transposition-table Driven Scheduling to break up the tree across nodes. • UCT-Treesplit uses Transposition-table Driven Scheduling to break up the MCTS work across nodes. • Of these, only virtual wins and losses have been applied to Computer Go so far.
Conclusions • MCTS is the dominant algorithm for Computer Go. • Parallelisation on multi-core systems scales well. • Parallelisation on cluster systems is possible, but there is still room for improvement. • Future of cluster parallelisation holds possibilities.
Thanks Thank you for taking time to listen to this talk. More information about this talk is available at: http://oakfoam.com/igs2012 . Please send any questions to: francoisvn@ml.sun.ac.za .