
Deep Reinforcement Learning Philipp Koehn 21 April 2020 Philipp Koehn Artificial Intelligence: Deep Reinforcement Learning 21 April 2020

Reinforcement Learning
● Sequence of actions
  – moves in chess
  – driving controls in car
● Uncertainty
  – moves by opponent
  – random outcomes (e.g., dice rolls, impact of decisions)
● Reward delayed
  – chess: win/loss at end of game
  – Pacman: points scored throughout game
● Challenge: find optimal policy for actions

Deep Learning
● Mapping input to output through multiple layers
● Weight matrices and activation functions
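
The mapping through layers described above can be sketched in a few lines of plain Python. This is only an illustration: the weights, the layer sizes, and the choice of ReLU are assumptions, not values from the lecture.

```python
def dense(inputs, weights, biases, activation):
    """One layer: activation(W x + b), with W given as a list of rows."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(value):
    return max(0.0, value)


def identity(value):
    return value


# Hypothetical weights for a network with 2 inputs, 2 hidden units, 1 output.
hidden = dense([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], relu)
output = dense(hidden, [[2.0, 3.0]], [0.5], identity)
```

Each call to dense applies one weight matrix followed by one activation function; stacking calls gives the multi-layer mapping.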

AlphaGo

Book
● Lecture based on the book Deep Learning and the Game of Go by Pumperla and Ferguson, 2019
● Hands-on introduction to game playing and neural networks
● Lots of Python code

go

Go
● Board game with white and black stones
● Stones may be placed anywhere
● If opponent's stones are surrounded, you can capture them
● Ultimately: you need to claim territory
● Player with most territory and captured stones wins

Go Board
● Starting board: the standard board is 19x19, but 9x9 and 13x13 boards are also used

Move 1
● First move: white

Move 2
● Second move: black

Move 3
● Third move: white

Move 7
● Situation after 7 moves, black's turn

Move 8
● Move by black: surrounds white stone in the middle

Capture
● White stone in middle is captured
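
A minimal sketch of how a program might detect this capture: collect the connected group of stones and its empty neighbours ("liberties"); a group with no liberties is surrounded. The encoding (1 = white, -1 = black, 0 = empty), the function name, and the small example board are assumptions made for illustration.

```python
def group_and_liberties(board, row, col):
    """Return (stones, liberties) for the group containing board[row][col]."""
    color = board[row][col]
    size = len(board)
    stones, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == 0:
                    liberties.add((nr, nc))      # empty neighbour
                elif board[nr][nc] == color:
                    stack.append((nr, nc))       # same-coloured stone joins the group
    return stones, liberties


# Assumed encoding: 1 = white, -1 = black, 0 = empty.
board = [
    [0, 0, 0, 0, 0],
    [0, 0, -1, 0, 0],
    [0, -1, 1, -1, 0],
    [0, 0, -1, 0, 0],
    [0, 0, 0, 0, 0],
]
stones, liberties = group_and_liberties(board, 2, 2)
captured = len(liberties) == 0  # True here: the white stone has no empty neighbour
```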

Final State
● Any further moves will not change outcome

Final State with Territory Marked
● Total score: number of squares in territory + number of captured stones
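
The scoring rule can be sketched as follows. The simplification here is an assumption: an empty region counts as a player's territory only if all stones bordering it have that player's colour, and dead stones are not handled. The encoding (1 = white, -1 = black, 0 = empty) and the capture counts are hypothetical.

```python
def territory(board):
    """Count empty points enclosed by a single colour (simplified rule)."""
    size = len(board)
    seen = set()
    counts = {1: 0, -1: 0}
    for r in range(size):
        for c in range(size):
            if board[r][c] != 0 or (r, c) in seen:
                continue
            region, borders = [], set()
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < size and 0 <= nx < size:
                        if board[ny][nx] == 0:
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                        else:
                            borders.add(board[ny][nx])
            if len(borders) == 1:  # region touches only one colour
                counts[borders.pop()] += len(region)
    return counts


board = [
    [0, 1, 0, -1, 0],
    [0, 1, 0, -1, 0],
    [0, 1, 0, -1, 0],
    [0, 1, 0, -1, 0],
    [0, 1, 0, -1, 0],
]
captured = {1: 1, -1: 0}  # hypothetical: stones captured by each player
points = territory(board)
score = {color: points[color] + captured[color] for color in points}
```

The middle column borders both colours, so it is counted for neither player.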

Why is Go Hard for Computers?
● Many moves possible
  – 19x19 board
  – 361 moves initially
  – games may last 300 moves
  ⇒ Huge branching factor in search space
● Hard to evaluate board positions
  – control of board most important
  – number of captured stones less relevant
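
To get a feel for these numbers, the following back-of-the-envelope calculation multiplies out the figures from the slide. It is a crude upper bound, assuming that every point stays playable on every turn, which is not true in a real game.

```python
import math

board_points = 19 * 19   # 361 possible first moves
game_length = 300        # moves per game, as quoted on the slide

# Crude upper bound on the number of move sequences.
sequences = board_points ** game_length
digits = len(str(sequences))
exponent = game_length * math.log10(board_points)
```

Under this assumption the count is a number with several hundred decimal digits (exponent is its base-10 logarithm), far beyond anything that can be enumerated.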

game playing

Game Tree
[Figure: game tree branching over possible moves]
● Recall: game tree to consider all possible moves

Alpha-Beta Search
● Explore game tree depth-first
● Exploration stops at win or loss
● Backtrack to other paths, note best/worst outcome
● Ignore paths with worse outcomes
● This does not work for a game tree with about 361^300 states
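
A minimal sketch of alpha-beta search on a toy tree, to illustrate "ignore paths with worse outcomes". The tree representation (nested lists with numeric leaf outcomes) and the `visited` bookkeeping are assumptions made for this example.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Return the minimax value of node; leaves are numbers, inner nodes are lists."""
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # opponent would never allow this path
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:      # we already have a better option elsewhere
            break
    return best


tree = [[3, 5], [2, 9], [0, 1]]  # hypothetical outcomes, two moves deep
visited = []
value = alphabeta(tree, True, visited=visited)
```

In this toy tree, the leaves 9 and 1 are never evaluated: once the second and third branches are known to be worse than the first, the rest of each branch is skipped.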

Evaluation Function for States
● Explore game tree up to some specified maximum depth
● Evaluate leaf states
  – informed by knowledge of game
  – e.g., chess: pawn count, control of board
● This does not work either, due to
  – high branching factor
  – difficulty of defining evaluation function

monte carlo tree search

Monte Carlo Tree Search
[Figure: game tree with one roll-out ending in a win; node along the path labeled 1/0]
● Explore depth-first randomly ("roll-out"), record win on all states along path

Monte Carlo Tree Search
[Figure: game tree with counts at each node (1/1, 0/1, 1/0); roll-outs end in loss, win]
● Pick existing node as starting point, execute another roll-out, record loss

Monte Carlo Tree Search
[Figure: game tree with counts at each node; roll-outs end in loss, win, win]
● Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search
[Figure: game tree with counts at each node; roll-outs end in loss, win, loss, win]
● Pick existing node as starting point, execute another roll-out

Monte Carlo Tree Search
[Figure: game tree with counts at each node; roll-outs end in loss, loss, win, loss, win]
● Increasingly, prefer to explore paths with high win percentage
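
The bookkeeping in these slides can be sketched as follows: after each roll-out, the outcome is recorded on every node along the path back to the root. The class and function names are hypothetical, and the counts follow the slides' simplified wins/losses labels from a single player's perspective (full implementations usually alternate perspective by level).

```python
class Node:
    """A game-tree node that tracks roll-out outcomes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.wins = 0
        self.losses = 0

    def label(self):
        return f"{self.wins}/{self.losses}"


def record(leaf, won):
    """Record one roll-out outcome on all nodes from leaf up to the root."""
    node = leaf
    while node is not None:
        if won:
            node.wins += 1
        else:
            node.losses += 1
        node = node.parent


root = Node()
first = Node(parent=root)
second = Node(parent=root)
record(first, won=True)     # first roll-out ends in a win
record(second, won=False)   # second roll-out ends in a loss
```

After these two roll-outs the root carries one win and one loss, while each child carries only the outcome of its own roll-out.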

Monte Carlo Tree Search
● Which node to pick?
  $w + c \sqrt{\frac{\log N}{n}}$
  – N total number of roll-outs
  – n number of roll-outs for this node in the game tree
  – w winning percentage
  – c hyperparameter to balance exploration
● This is an inference algorithm
  – execute, say, 10,000 roll-outs
  – pick initial action with best win percentage w
  – can be improved by following rules based on well-known local shapes
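
A small sketch of the node-selection score, with w, n, N and c as defined on the slide. The statistics and the two values of c are made up for illustration, N is taken to be the sum of the roll-outs of the candidate nodes, and every candidate is assumed to have at least one roll-out (otherwise n would be zero).

```python
import math


def uct_score(w, n, N, c):
    """Win percentage plus an exploration bonus that shrinks as n grows."""
    return w + c * math.sqrt(math.log(N) / n)


def pick_node(stats, c):
    """stats maps a node name to (wins, rollouts); return the best-scoring name."""
    N = sum(rollouts for _, rollouts in stats.values())

    def score(name):
        wins, rollouts = stats[name]
        return uct_score(wins / rollouts, rollouts, N, c)

    return max(stats, key=score)


stats = {"A": (6, 8), "B": (1, 2)}    # hypothetical roll-out statistics
explore = pick_node(stats, c=1.0)     # large c favours the rarely visited node
exploit = pick_node(stats, c=0.1)     # small c favours the higher win percentage
```

With a large c the rarely visited node B is chosen despite its lower win percentage; with a small c the search sticks to A.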

action prediction with neural networks

Learning Moves
● We would like to learn actions of game playing agent
● Input state: board position
● Output action: optimal move

Learning Moves
Input (board position):     Output (move):
 0  0  0  0  0               0  0  0  0  0
 0  0  1  0  0               0  0  0  0  0
 0  0 -1  1  0               0  1  0  0  0
 0  0  1 -1  0               0  0  0  0  0
 0  0 -1 -1  0               0  0  0  0  0
● Machine learning problem
● Input: 5x5 matrix
● Output: 5x5 matrix
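
As a sketch of this encoding, the snippet below builds the input and output matrices and flattens them into vectors. The board values are read off the slide's 5x5 example row by row (an assumed reading of the figure); the helper names are hypothetical.

```python
def encode_move(row, col, size):
    """Return a size x size matrix with a single 1 at the chosen move."""
    matrix = [[0] * size for _ in range(size)]
    matrix[row][col] = 1
    return matrix


def flatten(matrix):
    """Concatenate the rows of a matrix into one vector."""
    return [value for row in matrix for value in row]


board = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, -1, 1, 0],
    [0, 0, 1, -1, 0],
    [0, 0, -1, -1, 0],
]
target = encode_move(2, 1, 5)   # move in third row, second column
x = flatten(board)              # network input, 25 values
y = flatten(target)             # network output, 25 values with a single 1
```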

Neural Networks
[Figure: board position and move, each flattened into a vector]
● First idea: feed-forward neural network
  – encode board position in n × n sized vector
  – encode correct move in n × n sized vector
  – add some hidden layers
● Many parameters
  – input and output vectors have dimension 361 (19x19 board)
  – if hidden layers have same size → 361x361 weights for each
● Does not generalize well
  – same patterns on various locations of the board
  – has to learn moves for each location
  – consider everything moved one position to the right
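
Two of these points can be checked with a few lines of Python: the weight count per fully connected layer, and what happens to the flattened input when a pattern moves one position to the right. The 5x5 pattern and helper names are made up for illustration.

```python
points = 19 * 19
weights_per_layer = points * points  # one weight per input-output pair


def shift_right(matrix):
    """Move every column one position to the right (last column is dropped)."""
    return [[0] + row[:-1] for row in matrix]


def nonzero_indices(matrix):
    flat = [value for row in matrix for value in row]
    return [i for i, value in enumerate(flat) if value != 0]


pattern = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, -1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
before = nonzero_indices(pattern)
after = nonzero_indices(shift_right(pattern))
```

The same local shape activates different input positions after the shift, so a fully connected layer has to learn it again with a separate set of weights.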

Convolutional Neural Networks
[Figure: 3x3 regions of the 5x5 board matrix, each mapped by the kernel to a single value]
● Convolutional kernel: here maps 3x3 matrix to 1x1 value
● Applied to all 3x3 regions of the original matrix
● Learns local features
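
A minimal sketch of applying one 3x3 kernel to all 3x3 regions of a 5x5 board. Assumptions: no padding (so the result is 3x3), a made-up kernel of all ones instead of learned weights, and a board taken from the earlier 5x5 example as read row by row.

```python
def convolve(matrix, kernel):
    """Slide the kernel over every k x k region and return the weighted sums."""
    k = len(kernel)
    out_size = len(matrix) - k + 1
    return [
        [
            sum(
                kernel[i][j] * matrix[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


board = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, -1, 1, 0],
    [0, 0, 1, -1, 0],
    [0, 0, -1, -1, 0],
]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # hypothetical; real kernels are learned
features = convolve(board, kernel)
```

The same nine kernel weights are reused at every location, which is why a local pattern is recognized regardless of where it appears on the board.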

Move Prediction with CNNs
[Figure: board matrix → Convolutional Layer → Convolutional Layer → Flatten → Feed-forward Layer → move matrix]
● May use multiple convolutional kernels (of same size) → learn different local features
● Resulting values may be added or maximum value selected (max-pooling)
● May have several convolutional neural network layers
● Final layer: softmax prediction of move
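
The final softmax step turns the raw scores of the last layer into a probability distribution over moves. A minimal sketch, with made-up scores for four candidate moves:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


scores = [0.1, 2.0, -1.0, 0.5]  # hypothetical outputs of the feed-forward layer
probabilities = softmax(scores)
```

The ordering of the scores is preserved, so the move with the highest raw score also has the highest probability.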

Human Game Play Data

Human Game Play Data
● Game records
  – sequence of moves
  – winning player
● Convert into training data for move prediction
  – one move at a time
  – prediction +1 for move if winner
  – prediction −1 for move if loser
● Learn winning moves, avoid losing moves
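
The conversion could look roughly like this. The record format (a list of (player, row, col) tuples with players encoded as 1 and -1) is an assumption, and captures are deliberately ignored to keep the sketch short.

```python
def training_examples(moves, winner, size):
    """Turn one game record into (board before move, move, label) triples."""
    board = [[0] * size for _ in range(size)]
    examples = []
    for player, row, col in moves:
        label = 1 if player == winner else -1   # +1 for winner's moves, -1 for loser's
        snapshot = [line[:] for line in board]  # copy of the position before the move
        examples.append((snapshot, (row, col), label))
        board[row][col] = player                # note: captures are not handled here
    return examples


record = [(1, 1, 1), (-1, 2, 2), (1, 0, 2)]     # hypothetical three-move game
examples = training_examples(record, winner=1, size=3)
labels = [label for _, _, label in examples]
```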

Playing Go with Neural Move Predictor
● Greedy search
● Make prediction at each turn
● Select move with highest probability
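
A sketch of the greedy choice: given a matrix of predicted move probabilities, pick the highest-scoring point. Restricting the choice to empty points is an added assumption (the slide does not say how illegal moves are handled), and the probabilities are made up.

```python
def greedy_move(probabilities, board):
    """Return (row, col) of the most probable move on an empty point."""
    best, best_p = None, -1.0
    for r, row in enumerate(probabilities):
        for c, p in enumerate(row):
            if board[r][c] == 0 and p > best_p:
                best, best_p = (r, c), p
    return best


probabilities = [
    [0.05, 0.10, 0.05],
    [0.10, 0.40, 0.10],
    [0.05, 0.10, 0.05],
]
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
occupied = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
move_on_empty = greedy_move(probabilities, empty)
move_on_occupied = greedy_move(probabilities, occupied)
```

When the most probable point is already taken, the choice falls to the best remaining empty point; ties go to the first point encountered.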

reinforcement learning
