Neural Networks and their Application to Go
Anne-Marie Bausch
ETH, D-MATH
May 31, 2016
Table of Contents
1 Neural Networks
  Theory
  Training neural networks
  Problems
2 AlphaGo
  The Game of Go
  Policy Network
  Value Network
  Monte Carlo Tree Search
Perceptron
A perceptron is the most basic artificial neuron (developed in the 1950s and 1960s).
The input is X ∈ R^n, the w_1, …, w_n ∈ R are called weights, and the output is Y ∈ {0, 1}. Writing W · X := Σ_j w_j x_j, the output depends on some threshold value τ:
output = 0, if W · X ≤ τ,
output = 1, if W · X > τ.
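A minimal sketch of this threshold rule in Python (the function name and the use of NumPy are illustrative choices, not taken from the slides):

```python
import numpy as np

def perceptron(x, w, tau):
    """Basic perceptron: fires (outputs 1) only if the weighted sum exceeds the threshold tau."""
    return 1 if np.dot(w, x) > tau else 0
```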
Bias
Next, we introduce what is known as the perceptron's bias B,
B := −τ.
This gives us a new formula for the output:
output = 0, if W · X + B ≤ 0,
output = 1, if W · X + B > 0.

Example
NAND gate
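The NAND example can be reproduced with a concrete choice of parameters; the weights (−2, −2) and bias 3 below are one common choice, an assumption rather than something stated on the slide:

```python
import numpy as np

def perceptron_with_bias(x, w, b):
    """Perceptron with a bias B = -tau instead of an explicit threshold."""
    return 1 if np.dot(w, x) + b > 0 else 0

# NAND gate: weights (-2, -2) and bias 3 are one possible choice.
for a, b_in in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b_in), perceptron_with_bias(np.array([a, b_in]), np.array([-2, -2]), 3))
# -> 1, 1, 1, 0
```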
Sigmoid Neuron
Problem: A small change in the input can change the output a lot.
→ Solution: Sigmoid Neuron
Input X ∈ R^n
Output = σ(X · W + B) = (1 + exp(−X · W − B))^(−1) ∈ (0, 1), where σ(z) := 1 / (1 + exp(−z)) is called the sigmoid function.
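The same neuron with the smooth activation, sketched in Python (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), a smooth version of the step function."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    """Output lies in the open interval (0, 1) and varies smoothly with x, w and b."""
    return sigmoid(np.dot(w, x) + b)
```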
Neural Networks
Given an input X, as well as some training and testing data, we want to find a function f_{W,B} such that f_{W,B} : X → Y, where Y denotes the output.
How do we choose the weights and the bias?
Example: XOR Gate
(figure slide)
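The slide itself is a figure; as a hedged illustration of the point that a single perceptron cannot compute XOR but a small network can, XOR can be built from the NAND perceptron of the earlier slide. This is a standard construction and not necessarily the exact network shown in the figure:

```python
import numpy as np

def nand(a, b):
    """NAND as a single perceptron (weights -2, -2, bias 3)."""
    return 1 if np.dot([-2, -2], [a, b]) + 3 > 0 else 0

def xor(a, b):
    """XOR as a two-layer network of NAND perceptrons."""
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, xor(*pair))  # -> 0, 1, 1, 0
```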
Learning Algorithm
A learning algorithm chooses the weights and biases without intervention from the programmer.
Smoothness of σ:
Δoutput ≈ Σ_j (∂output/∂w_j) Δw_j + (∂output/∂B) ΔB
How to update weights and bias
How does the learning algorithm update the weights (and the bias)?
argmin_{W,B} ‖f_{W,B}(X) − Y‖²
→ One method to do this is gradient descent
→ Choose an appropriate learning rate!
Example: Digit Recognition (1990s) → YouTube Video
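A minimal sketch of gradient descent on this squared loss for a single sigmoid neuron; the function, its names, and the hyperparameters are illustrative assumptions, not code from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_neuron(xs, ys, learning_rate=0.5, epochs=1000):
    """Gradient descent on the squared loss ||f_{W,B}(x) - y||^2 for one sigmoid neuron.

    xs: inputs of shape (m, n); ys: targets of shape (m,).
    """
    m, n = xs.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        out = sigmoid(xs @ w + b)                  # forward pass
        grad_z = (out - ys) * out * (1 - out)      # chain rule through sigma
        w -= learning_rate * (xs.T @ grad_z) / m   # gradient step for the weights
        b -= learning_rate * grad_z.mean()         # gradient step for the bias
    return w, b
```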
Example
One image consists of 28×28 pixels, which explains why the input layer has 784 neurons.
3 main types of learning
Supervised Learning (SL)
→ Learning some mapping from inputs to outputs.
Example: Classifying digits
Unsupervised Learning (UL)
→ Given input and no output, what kinds of patterns can you find?
Example: Visual input is at first too complex, so the number of dimensions has to be reduced
Reinforcement Learning (RL)
→ The learning method interacts with its environment by producing actions a_1, a_2, … that produce rewards or punishments r_1, r_2, …
Example: Human learning
Why was there a recent boost in the employment of neural networks?
The evolution of neural networks stagnated because networks with more than 2 hidden layers proved too difficult to train. The main problems and their solutions are:
Huge amount of data required
→ Big Data
Number of weights (capacity of computers)
→ the capacity of computers improved (parallelism, GPUs)
Theoretical limits
→ Difficult (⇒ see next slide)
Theoretical Limits
Back-propagated error signals either shrink rapidly (exponentially in the number of layers) or grow out of bounds.
3 solutions:
(a) Unsupervised pre-training ⇒ facilitates subsequent supervised credit assignment through back-propagation (1991).
(b) LSTM-like networks (since 1997) avoid the problem through a special architecture.
(c) Today, fast GPU-based computers allow for propagating errors a few layers further down within reasonable time.
The Game of Go
Main rules
Origin: Ancient China, more than 2500 years ago
Goal: Gain the most points
White gets 6.5 points for moving second
Get points for territory at the end of the game
Get points for prisoners
→ A stone is captured if it has no more liberties (liberties are its "supply chains"); see the sketch after this slide
Not allowed to commit suicide
Ko rule: Not allowed to play a move that recreates the previous board position
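A minimal sketch of counting a group's liberties by flood fill; the board encoding and function name are assumptions made for illustration:

```python
def liberties(board, row, col):
    """Count the liberties of the group of stones containing (row, col).

    board: square 2D list with 0 = empty, 1 = black, 2 = white (an assumed encoding).
    """
    color = board[row][col]
    size = len(board)
    group, libs, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == 0:
                    libs.add((nr, nc))        # empty neighbour = one liberty
                elif board[nr][nc] == color:
                    stack.append((nr, nc))    # same-coloured stone joins the group
    return len(libs)  # the whole group is captured when this drops to 0
```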
End of Game
The game is over when both players have passed consecutively.
→ Prisoners are removed and points are counted!
AlphaGo
DeepMind was founded in 2010 as a startup in Cambridge
Google bought DeepMind for $500M in 2014
AlphaGo beat European Go champion Fan Hui (2-dan) in October 2015
AlphaGo beat Lee Sedol (9-dan), one of the best players in the world, in March 2016 (4 out of 5 games)
A victory of AI in Go was thought to be 10 years in the future
1920 CPUs and 280 GPUs were used during the match against Lee Sedol
→ This equals around $1M, not counting the electricity used for training and playing
Next game targeted by Google DeepMind: StarCraft
AlphaGo
Difficulty: The search space of future Go moves is larger than the number of particles in the known universe.
AlphaGo combines:
Policy Network
Value Network
Monte Carlo Tree Search (MCTS)
Policy Network Part 1
Multi-layered neural network
Supervised learning (SL)
Goal: Look at the board position and choose the next best move (it does not care about winning, just about the next move)
Is trained on millions of example moves made by strong human players on KGS (Kiseido Go Server)
It matches strong human players about 57% of the time (mismatches are not necessarily mistakes)
Policy Network Part 2
2 additional versions of the policy network: a stronger move picker and a faster move picker
The stronger version uses RL
→ trained more intensively by playing games to the end (it is trained by millions of training games against previous editions of itself; it does no reading, i.e., it does not try to simulate any future moves)
→ needed for creating enough training data for the value network
The faster version is called the "rollout network"
→ does not look at the entire board but at a smaller window around the previous move
→ about 1000 times faster!
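A hedged sketch of how a move could be chosen from a policy network's output, either greedily (strongest single move) or by sampling (as one might do in rollouts or self-play). This is purely illustrative; AlphaGo's real policy networks are deep convolutional networks and its move selection is driven by the tree search:

```python
import numpy as np

def pick_move(policy_probs, legal_mask, greedy=False):
    """Choose a move from a policy network's output distribution.

    policy_probs: probabilities over the 361 points of a 19x19 board;
    legal_mask: boolean array marking legal moves. Names are assumptions.
    """
    probs = policy_probs * legal_mask
    probs = probs / probs.sum()
    if greedy:
        return int(np.argmax(probs))                   # strongest single move
    return int(np.random.choice(len(probs), p=probs))  # sample a plausible move
```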