An Introductory Tutorial on Implementing DRL Algorithms with DQN and TensorFlow
Tim Tse
May 18, 2018
Recap: The RL Loop
A Simplified View of the Implementation Steps for RL Algorithms

1. The environment (taken care of by OpenAI Gym)
2. The agent
3. A while loop that simulates the interaction between the agent and environment
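Step 1 is essentially handled by Gym. As a minimal sketch of the Gym API (CartPole-v0 is only an example task, and the random action stands in for the agent of step 2):

import gym

# Minimal sketch of the Gym environment API.
env = gym.make("CartPole-v0")

state = env.reset()                     # start an episode, observe the initial state
done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's action choice
    next_state, reward, done, info = env.step(action)
    state = next_state
env.close()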
Implementing the DQN Agent

◮ We wish to learn the state-action value function Q(s_t, a_t) for all s_t, a_t.
◮ Recall the recursive relationship Q(s_t, a_t) = r_t + γ max_{a'} Q(s_{t+1}, a').
◮ Using this relation, define the MSE loss function

    L(w) = (1/N) Σ_{i=1}^{N} ( r_t^i + γ max_{a'} Q_w̄(s_{t+1}^i, a') − Q_w(s_t^i, a_t^i) )^2

  where the first term, r_t^i + γ max_{a'} Q_w̄(s_{t+1}^i, a'), is the target and Q_w(s_t^i, a_t^i) is the current estimate, {(s_t^1, a_t^1, r_t^1, s_{t+1}^1), ..., (s_t^N, a_t^N, r_t^N, s_{t+1}^N)} are the training tuples, and γ ∈ [0, 1] is the discount factor.
◮ Parameterize Q(·, ·) using a function approximator with weights w.
◮ With “deep” RL our function approximator is an artificial neural network (so w denotes the weights of our ANN).
◮ For stability, the target weights w̄ are held constant during training.
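To make the loss concrete, here is a rough NumPy sketch of how the targets and the mean squared error could be formed from a batch of N transitions. The names q_net and target_net are hypothetical callables returning (N, num_actions) arrays of Q_w and Q_w̄ values respectively, and terminal-state handling is omitted to match the formula above:

import numpy as np

def dqn_mse_loss(q_net, target_net, states, actions, rewards, next_states, gamma=0.99):
    # Q_w(s_t, .) for every action, then pick out Q_w(s_t^i, a_t^i) for the actions taken.
    q_values = q_net(states)                                        # shape (N, num_actions)
    current_estimates = q_values[np.arange(len(actions)), actions]  # shape (N,)

    # Targets use the frozen target network w-bar and the max over next actions.
    next_q = target_net(next_states)                                # Q_w-bar(s_{t+1}, .)
    targets = rewards + gamma * next_q.max(axis=1)

    # Mean squared error L(w) over the batch.
    return np.mean((targets - current_estimates) ** 2)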
Translating the DQN Agent to Code...

Let's look at how we can do the following in TensorFlow:

1. Declare an ANN that parameterizes Q(s, a).
   ◮ E.g., our example ANN will have the structure state_dim-256-256-action_dim.
2. Specify a loss function to be optimized.
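Below is a rough TensorFlow 1.x-style sketch of these two steps. The placeholder names, the use of tf.layers.dense, and the Adam optimizer are my own implementation choices, and the target values r + γ max_{a'} Q_w̄(s', a') are assumed to be computed outside the graph (e.g., as in the NumPy sketch above) and fed in through a placeholder:

import tensorflow as tf

state_dim, action_dim = 4, 2   # example sizes; use the environment's actual dimensions

# --- Phase 1: build the computational graph ---
states  = tf.placeholder(tf.float32, [None, state_dim], name="states")
actions = tf.placeholder(tf.int32,   [None],            name="actions")
targets = tf.placeholder(tf.float32, [None],            name="targets")  # r + gamma * max_a' Q_wbar(s', a')

# Q-network with structure state_dim-256-256-action_dim.
hidden1  = tf.layers.dense(states, 256, activation=tf.nn.relu)
hidden2  = tf.layers.dense(hidden1, 256, activation=tf.nn.relu)
q_values = tf.layers.dense(hidden2, action_dim)              # Q_w(s, .) for every action

# Pick out Q_w(s_t, a_t) for the actions actually taken.
action_mask      = tf.one_hot(actions, action_dim)
current_estimate = tf.reduce_sum(q_values * action_mask, axis=1)

# MSE loss between the fixed targets and the current estimates, plus an update op.
loss     = tf.reduce_mean(tf.square(targets - current_estimate))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)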
Two Phases of Execution in TensorFlow

1. Building the computational graph.
   ◮ Specifying the structure of your ANN (i.e., which outputs connect to which inputs).
   ◮ Numerical computations are not being performed during this phase.
2. Running tf.Session().
   ◮ Numerical computations are being performed during this phase. For example,
      ◮ Initial weights are being populated.
      ◮ Tensors are being passed in and outputs are computed (forward pass).
      ◮ Gradients are being computed and back-propagated (backward pass).
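Continuing the graph-building sketch above, the second phase would look roughly like this, assuming batch_states, batch_actions, and batch_targets are NumPy arrays sampled from the cached training tuples:

# --- Phase 2: run numerical computations inside a session ---
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # initial weights are populated here

    # Forward pass: compute Q_w(s, .) for a batch of states.
    q_batch = sess.run(q_values, feed_dict={states: batch_states})

    # Backward pass: compute gradients of the loss and apply one update to w.
    loss_value, _ = sess.run([loss, train_op],
                             feed_dict={states: batch_states,
                                        actions: batch_actions,
                                        targets: batch_targets})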
Implementation Steps for RL Algorithms

1. The environment (taken care of by OpenAI Gym)
2. The agent
3. The logic that ties the agent and environment together
The Interaction Loop Between Agent and Environment

for e number of epochs do
    Initialize environment and observe initial state s;
    while epoch is not over do
        In state s, take action a with an exploration policy (e.g., ε-greedy) and receive next state s' and reward r as feedback;
        Update exploration policy;
        Cache training tuple (s, a, r, s');
        Update agent;
        s ← s';
    end
end

Algorithm 1: An example of one possible interaction loop between agent and environment.
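A rough Python translation of Algorithm 1 using Gym might look like the following. The agent object and its act, decay_epsilon, cache, and update methods, as well as num_epochs, are hypothetical names standing in for the DQN pieces described earlier:

import gym

env = gym.make("CartPole-v0")            # example environment

for epoch in range(num_epochs):
    state = env.reset()                  # initialize environment, observe initial state s
    done = False
    while not done:                      # "epoch is not over"
        action = agent.act(state)        # e.g., epsilon-greedy over Q_w(s, .)
        next_state, reward, done, _ = env.step(action)

        agent.decay_epsilon()            # update exploration policy
        agent.cache(state, action, reward, next_state)   # store training tuple (s, a, r, s')
        agent.update()                   # e.g., sample a batch from the cache and run train_op

        state = next_state               # s <- s'
env.close()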