Deep Reinforcement Learning
Philipp Koehn
21 April 2020
Philipp Koehn Artificial Intelligence: Deep Reinforcement Learning 21 April 2020

Reinforcement Learning
● Sequence of actions
  – moves in chess
  – driving controls in car
● Uncertainty
  – moves by opponent
  – random outcomes (e.g., dice rolls, impact of decisions)
● Reward delayed
  – chess: win/loss at end of game
  – Pacman: points scored throughout game
● Challenge: find optimal policy for actions

Deep Learning
● Mapping input to output through multiple layers
● Weight matrices and activation functions

AlphaGo

Book
● Lecture based on the book Deep Learning and the Game of Go by Pumperla and Ferguson, 2019
● Hands-on introduction to game playing and neural networks
● Lots of Python code

go

Go
● Board game with white and black stones
● Stones may be placed anywhere
● If opponent's stones are surrounded, you can capture them
● Ultimately: you need to claim territory
● Player with most territory and captured stones wins

Go Board
● Starting board; the standard board is 19x19, but 9x9 and 13x13 are also played

Move 1
● First move: white

Move 2
● Second move: black

Move 3
● Third move: white

Move 7
● Situation after 7 moves, black's turn

Move 8
● Move by black: surrounded white stone in the middle

Capture
● White stone in middle is captured
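The capture rule can be sketched as a flood fill: a group of connected stones is captured when it has no empty neighboring points (called liberties in Go). This is a minimal sketch, not the lecture's or the book's code; the encoding (1 and -1 for the two colors, 0 for empty) and the function names are assumptions made for illustration.

```python
def group_and_liberties(board, row, col):
    """Flood-fill the group containing (row, col); return (stones, liberties)."""
    color = board[row][col]
    size = len(board)
    group, liberties, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == 0:
                    liberties.add((nr, nc))      # empty neighbor
                elif board[nr][nc] == color:
                    stack.append((nr, nc))       # same-colored stone joins the group
    return group, liberties


def remove_if_captured(board, row, col):
    """Remove the group at (row, col) if it has no liberties; return stones removed."""
    group, liberties = group_and_liberties(board, row, col)
    if liberties:
        return 0
    for r, c in group:
        board[r][c] = 0
    return len(group)


# Hypothetical 5x5 position: one stone (1) in the middle, surrounded by -1 stones.
board = [[0] * 5 for _ in range(5)]
board[2][2] = 1
for r, c in [(1, 2), (3, 2), (2, 1), (2, 3)]:
    board[r][c] = -1
captured = remove_if_captured(board, 2, 2)
print(captured, board[2][2])
```

The surrounding stones keep their own liberties, so calling `remove_if_captured` on any of them leaves the board unchanged.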

Final State
● Any further moves will not change outcome

Final State with Territory Marked
● Total score: number of squares in territory + number of captured stones

Why is Go Hard for Computers?
● Many moves possible
  – 19x19 board
  – 361 moves initially
  – games may last 300 moves
  ⇒ Huge branching factor in search space
● Hard to evaluate board positions
  – control of board most important
  – number of captured stones less relevant
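To get a rough sense of scale, the numbers above can be multiplied out. This sketch assumes a constant 361 choices per move for 300 moves, which overestimates real games (fewer points are free as the board fills), so it is an upper-bound illustration only.

```python
import math

BOARD_POINTS = 19 * 19   # 361 possible first moves
GAME_LENGTH = 300        # rough game length from the slide

# Number of decimal digits of 361**300, computed via logarithms
# so the huge number itself never has to be written out.
digits = math.floor(GAME_LENGTH * math.log10(BOARD_POINTS)) + 1
print(BOARD_POINTS, digits)
```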

game playing

Game Tree
● Recall: game tree to consider all possible moves

Alpha-Beta Search
● Explore game tree depth-first
● Exploration stops at win or loss
● Backtrack to other paths, note best/worst outcome
● Ignore paths with worse outcomes
● This does not work for a game tree with about $361^{300}$ states
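The depth-first search with pruning can be sketched on a game small enough to search completely. The toy game here is an assumption chosen for illustration (players alternately take 1 to 3 stones; whoever takes the last stone wins), and the function name is hypothetical; it is not taken from the lecture.

```python
def alphabeta(stones, maximizing, alpha=-1, beta=1):
    """Return +1 if the maximizing player wins with best play, else -1."""
    if stones == 0:
        # The previous player took the last stone and therefore won.
        return -1 if maximizing else 1
    if maximizing:
        value = -1
        for take in (1, 2, 3):
            if take > stones:
                break
            value = max(value, alphabeta(stones - take, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # remaining moves cannot change the result: prune
        return value
    value = 1
    for take in (1, 2, 3):
        if take > stones:
            break
        value = min(value, alphabeta(stones - take, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # prune
    return value


# In this toy game the player to move loses exactly when stones % 4 == 0.
print([alphabeta(n, True) for n in range(1, 9)])
```

The search visits every path to a terminal state unless it is pruned, which is feasible for a handful of stones but not for a 19x19 Go board.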

Evaluation Function for States
● Explore game tree up to some specified maximum depth
● Evaluate leaf states
  – informed by knowledge of game
  – e.g., chess: pawn count, control of board
● This does not work either due to
  – high branching factor
  – difficulty of defining evaluation function

monte carlo tree search

Monte Carlo Tree Search
[Figure: game tree, nodes annotated with win/loss counts; roll-out outcome: win]
● Explore depth-first randomly ("roll-out"), record win on all states along path

Monte Carlo Tree Search
[Figure: game tree, nodes annotated with win/loss counts; roll-out outcomes: loss, win]
● Pick existing node as starting point, execute another roll-out, record loss

Monte Carlo Tree Search
[Figure: game tree, nodes annotated with win/loss counts; roll-out outcomes: loss, win, win]
● Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search
[Figure: game tree, nodes annotated with win/loss counts; roll-out outcomes: loss, win, loss, win]
● Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search
[Figure: game tree, nodes annotated with win/loss counts; roll-out outcomes: loss, loss, win, loss, win]
● Increasingly, prefer to explore paths with high win percentage
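The roll-out-and-record loop from these slides can be sketched on a toy game. Everything named here is an assumption for illustration: the game (take 1 to 3 stones, last stone wins), the `stats` dictionary, and both function names. The sketch is also simplified in two ways: it records results only at the first-move nodes rather than along the whole path, and it picks the starting node uniformly at random instead of preferring high win percentages.

```python
import random


def rollout(stones, rng):
    """Play random moves to the end; return 1 if the player to move first wins."""
    player = 0  # 0 = the player to move at the start of the roll-out
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player


def evaluate_first_moves(stones, n_rollouts, seed=0):
    """Collect [wins, losses] for each legal first move from random roll-outs."""
    rng = random.Random(seed)
    stats = {take: [0, 0] for take in (1, 2, 3) if take <= stones}
    for _ in range(n_rollouts):
        take = rng.choice(list(stats))
        remaining = stones - take
        if remaining == 0:
            win = 1  # we took the last stone
        else:
            # The opponent moves next; we win when the opponent's roll-out is lost.
            win = 1 - rollout(remaining, rng)
        stats[take][0 if win else 1] += 1
    return stats


stats = evaluate_first_moves(stones=5, n_rollouts=300)
for take, (wins, losses) in stats.items():
    print(take, wins, losses)
```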

Monte Carlo Tree Search
● Which node to pick?
  $w + c \sqrt{\frac{\log N}{n}}$
  – N total number of roll-outs
  – n number of roll-outs for this node in the game tree
  – w winning percentage
  – c hyperparameter to balance exploration
● This is an inference algorithm
  – execute, say, 10,000 roll-outs
  – pick initial action with best win percentage w
  – can be improved by following rules based on well-known local shapes
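The node-selection score can be written directly from the definitions of w, c, N, and n. This is a sketch under assumptions: the default c = 1.4, the treatment of unvisited nodes (scored as infinity so they are tried first), and the `children` dictionary layout are illustrative choices, not values given in the lecture.

```python
import math


def uct_score(wins, n, total, c=1.4):
    """w + c * sqrt(log N / n), with w = wins / n."""
    if n == 0:
        return math.inf  # assumed convention: try unvisited nodes first
    return wins / n + c * math.sqrt(math.log(total) / n)


def select(children, c=1.4):
    """children maps a node name to (wins, roll-outs); return the best-scoring name."""
    total = sum(n for _, n in children.values())
    return max(children, key=lambda name: uct_score(*children[name], total, c))


children = {"a": (9, 10), "b": (1, 2)}
print(select(children, c=0.0))  # pure win percentage
print(select(children, c=1.4))  # exploration bonus included
```

With c = 0 the choice depends on win percentage alone; a larger c shifts the choice toward nodes with few roll-outs.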

action prediction with neural networks

Learning Moves
● We would like to learn actions of game playing agent
● Input state: board position
● Output action: optimal move

Learning Moves
Input (board position):
 0  0  0  0  0
 0  0  1  0  0
 0  0 -1  1  0
 0  0  1 -1  0
 0  0 -1 -1  0
Output (move):
 0  0  0  0  0
 0  0  0  0  0
 0  1  0  0  0
 0  0  0  0  0
 0  0  0  0  0
● Machine learning problem
● Input: 5x5 matrix
● Output: 5x5 matrix
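The input and output matrices can be built as arrays. This is a sketch: the board values are read from the slide's number grid (an interpretation of the extracted layout, with 1 and -1 for the two players and 0 for empty), and `encode_move` is a hypothetical helper name.

```python
import numpy as np

# Board position: 1 and -1 mark the two players' stones, 0 is an empty point.
board = np.array([
    [0, 0,  0,  0, 0],
    [0, 0,  1,  0, 0],
    [0, 0, -1,  1, 0],
    [0, 0,  1, -1, 0],
    [0, 0, -1, -1, 0],
])


def encode_move(row, col, size):
    """Return a size x size matrix with a single 1 at the chosen move."""
    target = np.zeros((size, size), dtype=int)
    target[row, col] = 1
    return target


target = encode_move(2, 1, size=5)

# Flattened versions, as a feed-forward network would consume them.
x = board.flatten()
y = target.flatten()
print(x.shape, y.shape, int(y.argmax()))
```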

Neural Networks
[Figure: board position and move flattened into input and output vectors]
● First idea: feed-forward neural network
  – encode board position in n × n sized vector
  – encode correct move in n × n sized vector
  – add some hidden layers
● Many parameters
  – input and output vectors have dimension 361 (19x19 board)
  – if hidden layers have same size → 361x361 weights for each
● Does not generalize well
  – same patterns on various locations of the board
  – has to learn moves for each location
  – consider everything moved one position to the right
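The parameter count follows from simple arithmetic. The number of hidden layers below is an assumption for illustration (the slide only says "some"), and bias terms are left out.

```python
BOARD_SIZE = 19
n = BOARD_SIZE * BOARD_SIZE          # 361 inputs and 361 outputs
weights_per_layer = n * n            # one full 361x361 weight matrix

HIDDEN_LAYERS = 2                    # assumed; not specified in the lecture
weight_matrices = HIDDEN_LAYERS + 1  # input->h1, h1->h2, h2->output
total_weights = weight_matrices * weights_per_layer

print(n, weights_per_layer, total_weights)
```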

Convolutional Neural Networks
[Figure: convolution of the board matrix with a 3x3 kernel]
● Convolutional kernel: here maps 3x3 matrix to 1x1 value
● Applied to all 3x3 regions of the original matrix
● Learns local features
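Applying a 3x3 kernel to every 3x3 region can be sketched with two loops. The kernel values here are made up (in a trained network they are learned), no padding is used, and, as in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
import numpy as np


def conv2d_valid(board, kernel):
    """Slide the kernel over every k x k region; each region yields one value."""
    k = kernel.shape[0]
    out_size = board.shape[0] - k + 1
    out = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            out[r, c] = np.sum(board[r:r + k, c:c + k] * kernel)
    return out


board = np.array([
    [0, 0,  0,  0, 0],
    [0, 0,  1,  0, 0],
    [0, 0, -1,  1, 0],
    [0, 0,  1, -1, 0],
    [0, 0, -1, -1, 0],
])
kernel = np.ones((3, 3))  # hypothetical kernel: sums the stones in each region

features = conv2d_valid(board, kernel)
print(features.shape)
print(features)
```

Because the same kernel is reused at every location, a pattern is detected wherever it appears on the board.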

Move Prediction with CNNs
[Figure: board input → Convolutional Layer → Convolutional Layer → Flatten → Feed-forward Layer → 5x5 move output]
● May use multiple convolutional kernels (of same size) → learn different local features
● Resulting values may be added or maximum value selected (max-pooling)
● May have several convolutional neural network layers
● Final layer: softmax prediction of move
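The final softmax step turns one score per board point into a probability distribution over moves. In this sketch the scores are random stand-ins for the output of the last feed-forward layer, and the 5x5 board size is assumed to match the earlier example.

```python
import numpy as np


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


rng = np.random.default_rng(0)
scores = rng.normal(size=25)           # stand-in for final-layer outputs (5x5 board)
probs = softmax(scores).reshape(5, 5)  # one probability per board point

print(round(float(probs.sum()), 6))
```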

Human Game Play Data
● Game records
  – sequence of moves
  – winning player
● Convert into training data for move prediction
  – one move at a time
  – prediction +1 for move if winner
  – prediction −1 for move if loser
● Learn winning moves, avoid losing moves
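Converting a game record into training examples can be sketched as a replay of the moves. The record format (a list of `(player, (row, col))` pairs with players 1 and -1), the function name, and the sample game are assumptions for illustration; captures are ignored to keep the sketch short.

```python
import numpy as np


def to_training_data(moves, winner, size=5):
    """Return one (board_before_move, move, label) example per move."""
    board = np.zeros((size, size), dtype=int)
    examples = []
    for player, (row, col) in moves:
        label = 1 if player == winner else -1   # +1 for winner's moves, -1 for loser's
        examples.append((board.copy(), (row, col), label))
        board[row, col] = player                # simplification: no captures applied
    return examples


# Hypothetical three-move record, won by player 1.
record = [(1, (2, 2)), (-1, (1, 2)), (1, (3, 3))]
examples = to_training_data(record, winner=1)
print([label for _, _, label in examples])
```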

Playing Go with Neural Move Predictor
● Greedy search
● Make prediction at each turn
● Select move with highest probability
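Greedy selection amounts to an argmax over the predicted probabilities. This sketch adds one assumption the slide does not spell out: occupied points are masked so the chosen move is always an empty point. Other legality rules are ignored, and `greedy_move` is a hypothetical name.

```python
import numpy as np


def greedy_move(probs, board):
    """Return (row, col) of the most probable move on an empty point."""
    masked = np.where(board == 0, probs, -np.inf)  # rule out occupied points
    flat_index = int(np.argmax(masked))
    return divmod(flat_index, board.shape[1])


board = np.zeros((3, 3), dtype=int)
board[1, 1] = 1                      # occupied point

probs = np.full((3, 3), 0.05)        # made-up predictor output
probs[1, 1] = 0.5                    # highest score, but the point is taken
probs[0, 2] = 0.2                    # best empty point

print(greedy_move(probs, board))
```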

reinforcement learning