Policy Gradient Prof. Kuan-Ting Lai 2020/5/22
Advantages of Policy-based RL • Previously we focused on approximating the value or action-value function: V_θ(s) ≈ V^π(s), Q_θ(s, a) ≈ Q^π(s, a) • Policy gradient methods instead parameterize the policy directly: π_θ(s, a) = P[a | s, θ]
3 Types of Reinforcement Learning • Value-based − Learn value function − Implicit policy • Policy-based − No value function − Learn policy directly • Actor-critic − Learn both value function and policy [Venn diagram: the Value-based (e.g. DQN) and Policy-based (e.g. Policy Gradient) families overlap at Actor-critic]
Lex Fridman, MIT Deep Learning, https://deeplearning.mit.edu/
Policy Objective Function • Goal: given a policy π_θ(s, a) with parameters θ, find the best θ • How to measure the quality of a policy? Use the value of the start state: J(θ) ≐ v_{π_θ}(s₀) = E[ Σ_a π_θ(a|s₀) q_{π_θ}(s₀, a) ]
Short Corridor with Switched Actions (Example 13.1 in Sutton & Barto)
Policy Optimization • Policy-based RL is an optimization problem that can be solved by: − Hill climbing − Simplex / amoeba / Nelder–Mead − Genetic algorithms − Gradient descent − Conjugate gradient − Quasi-Newton
Computing Gradients by Finite Differences • Estimate the k-th partial derivative of the objective function w.r.t. θ by perturbing θ by a small amount ε in the k-th dimension: ∂J(θ)/∂θ_k ≈ (J(θ + ε u_k) − J(θ)) / ε, where u_k is the unit vector with 1 in the k-th component and 0 elsewhere • Simple, noisy, inefficient, but sometimes works! • Works for all kinds of policies, even if the policy is not differentiable
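A minimal sketch of the finite-difference estimate, assuming J is any black-box estimator of the policy objective (e.g. the average return of a few rollouts under parameters theta); the function name and eps value are illustrative assumptions:

```python
import numpy as np

def finite_difference_gradient(J, theta, eps=1e-2):
    """Estimate the gradient of the objective J at parameters theta.

    Each partial derivative is approximated by perturbing theta by eps
    along the unit vector u_k and measuring the change in J.
    """
    grad = np.zeros_like(theta)
    J_theta = J(theta)                        # objective at the current parameters
    for k in range(len(theta)):
        u_k = np.zeros_like(theta)
        u_k[k] = 1.0                          # unit vector in the k-th dimension
        grad[k] = (J(theta + eps * u_k) - J_theta) / eps
    return grad
```

Note that one gradient estimate needs n + 1 evaluations of J for n parameters, which is why this approach is simple but inefficient and noisy.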
Score Function • Assume the policy π_θ is differentiable whenever it is non-zero • Likelihood ratio trick: ∇_θ π_θ(s, a) = π_θ(s, a) ∇_θ log π_θ(s, a) • The score function is ∇_θ log π_θ(s, a)
Softmax Policy • Weight actions using a linear combination of features: π_θ(s, a) ∝ exp(φ(s, a)ᵀθ) • The score function is then ∇_θ log π_θ(s, a) = φ(s, a) − E_{π_θ}[φ(s, ·)]
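A small sketch of the softmax policy and its score function, assuming a linear preference φ(s, a)ᵀθ and a feature matrix phi of shape (n_actions, n_features) for the current state (these names are assumptions for illustration):

```python
import numpy as np

def softmax_policy(theta, phi):
    """Action probabilities pi(a|s) proportional to exp(phi(s,a)·theta)."""
    prefs = phi @ theta
    prefs -= prefs.max()              # subtract max for numerical stability
    exp_prefs = np.exp(prefs)
    return exp_prefs / exp_prefs.sum()

def softmax_score(theta, phi, a):
    """Score function: grad_theta log pi(a|s) = phi(s,a) - E_pi[phi(s,.)]."""
    probs = softmax_policy(theta, phi)
    return phi[a] - probs @ phi
```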
Policy Gradient Theorem • Generalized policy gradient (proof in Sutton & Barto, p. 325): ∇J(θ) ∝ Σ_s μ(s) Σ_a q_π(s, a) ∇_θ π(a|s, θ)
Proof of Policy Gradient Theorem (2-1)
Proof of Policy Gradient Theorem (2-2)
REINFORCE: Monte Carlo Policy Gradient • REINFORCE update: θ_{t+1} ≐ θ_t + α G_t ∇_θ ln π(A_t | S_t, θ_t)
Pseudo Code of REINFORCE
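One possible realization of the REINFORCE pseudocode as a Python sketch, assuming a linear softmax policy with a feature function phi(s) of shape (n_actions, n_features) and a hypothetical gym-style environment whose step(a) returns (next_state, reward, done); the interface names are assumptions, not part of the original slide:

```python
import numpy as np

def reinforce(env, phi, n_features, n_actions,
              alpha=2e-4, gamma=1.0, n_episodes=1000):
    """Monte Carlo policy gradient (REINFORCE) with a linear softmax policy."""
    theta = np.zeros(n_features)
    for _ in range(n_episodes):
        # Generate one episode following pi(.|., theta).
        states, actions, rewards = [], [], []
        s, done = env.reset(), False
        while not done:
            feats = phi(s)                                # (n_actions, n_features)
            prefs = feats @ theta
            probs = np.exp(prefs - prefs.max())
            probs /= probs.sum()
            a = np.random.choice(n_actions, p=probs)
            s_next, r, done = env.step(a)
            states.append(s); actions.append(a); rewards.append(r)
            s = s_next

        # For each step t: theta <- theta + alpha * gamma^t * G_t * grad log pi
        G = 0.0
        for t in reversed(range(len(states))):
            G = rewards[t] + gamma * G                    # return from step t
            feats = phi(states[t])
            prefs = feats @ theta
            probs = np.exp(prefs - prefs.max())
            probs /= probs.sum()
            score = feats[actions[t]] - probs @ feats     # grad log pi(A_t|S_t)
            theta += alpha * (gamma ** t) * G * score
    return theta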
REINFORCE on Short Corridor
REINFORCE with Baseline • Include an arbitrary baseline function b(s) that does not depend on the action: ∇J(θ) ∝ Σ_s μ(s) Σ_a (q_π(s, a) − b(s)) ∇_θ π(a|s, θ) − The equation remains valid because the subtracted term is zero: Σ_a b(s) ∇_θ π(a|s, θ) = b(s) ∇_θ Σ_a π(a|s, θ) = b(s) ∇_θ 1 = 0
Gradient of REINFORCE with Baseline • Update rule: θ_{t+1} ≐ θ_t + α (G_t − b(S_t)) ∇_θ ln π(A_t | S_t, θ_t) • A natural baseline is a learned estimate of the state value, b(S_t) = v̂(S_t, w)
Baseline Can Help to Learn Faster
Actor-Critic Methods • REINFORCE with baseline does not bootstrap: its state-value function is used only as a baseline, not as a critic • Use the learned state-value function to bootstrap (one-step TD target R_{t+1} + γ v̂(S_{t+1}, w)) as well as to serve as the baseline → Actor-Critic
Policy Gradient for Continuing Problems • Continuing problems have no episode boundaries − Define performance as the average reward per time step: r(π) ≐ lim_{h→∞} (1/h) Σ_{t=1}^{h} E[R_t | S_0, A_{0:t−1} ∼ π] − Returns are measured relative to the average reward (differential return): G_t ≐ R_{t+1} − r(π) + R_{t+2} − r(π) + …
Actor-Critic with Eligibility Traces
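A sketch of episodic actor-critic with eligibility traces under the same assumptions as the REINFORCE sketch above: a linear softmax actor with action features phi(s) of shape (n_actions, d_theta), a linear critic v̂(s, w) = x(s)·w with state features x(s) of shape (d_w,), and a hypothetical gym-style env interface; all names are illustrative:

```python
import numpy as np

def actor_critic_traces(env, phi, x, n_actions, d_theta, d_w,
                        alpha_theta=1e-3, alpha_w=1e-2, gamma=1.0,
                        lam_theta=0.9, lam_w=0.9, n_episodes=1000):
    """Episodic actor-critic with eligibility traces (sketch)."""
    theta, w = np.zeros(d_theta), np.zeros(d_w)
    for _ in range(n_episodes):
        s, done = env.reset(), False
        z_theta, z_w = np.zeros(d_theta), np.zeros(d_w)   # eligibility traces
        I = 1.0                                           # accumulated discount
        while not done:
            feats = phi(s)
            prefs = feats @ theta
            probs = np.exp(prefs - prefs.max())
            probs /= probs.sum()
            a = np.random.choice(n_actions, p=probs)
            s_next, r, done = env.step(a)

            v_next = 0.0 if done else x(s_next) @ w
            delta = r + gamma * v_next - x(s) @ w          # TD error

            z_w = gamma * lam_w * z_w + x(s)               # critic trace
            score = feats[a] - probs @ feats               # grad log pi(a|s)
            z_theta = gamma * lam_theta * z_theta + I * score
            w += alpha_w * delta * z_w                     # critic update
            theta += alpha_theta * delta * z_theta         # actor update
            I *= gamma
            s = s_next
    return theta, w
```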
Policy Parameterization for Continuous Actions • Instead of computing probabilities for each of many (or infinitely many) actions, learn the statistics of a probability distribution • e.g. Gaussian policy: π(a | s, θ) ≐ (1 / (σ(s, θ) √(2π))) exp( −(a − μ(s, θ))² / (2σ(s, θ)²) ), with linear mean μ(s, θ) = θ_μᵀ x_μ(s) and log-linear standard deviation σ(s, θ) = exp(θ_σᵀ x_σ(s))
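A small sketch of sampling from and differentiating such a Gaussian policy with a linear mean and log-linear standard deviation; the shared feature vector x_s and the parameter names theta_mu, theta_sigma are assumptions for illustration:

```python
import numpy as np

def gaussian_policy_sample(theta_mu, theta_sigma, x_s):
    """Sample a continuous action from pi(a|s) = N(mu(s), sigma(s)^2)."""
    mu = theta_mu @ x_s                    # linear mean
    sigma = np.exp(theta_sigma @ x_s)      # log-linear std keeps sigma > 0
    return np.random.normal(mu, sigma), mu, sigma

def gaussian_score(a, mu, sigma, x_s):
    """Score functions grad log pi(a|s) used in the policy-gradient update."""
    grad_mu = (a - mu) / sigma**2 * x_s
    grad_sigma = ((a - mu)**2 / sigma**2 - 1.0) * x_s
    return grad_mu, grad_sigma
```

The log-linear parameterization of σ is a common choice because it keeps the standard deviation positive without any constraint on θ_σ.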
References 1. David Silver, "Lecture 7: Policy Gradient" 2. Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction," 2nd edition, Chapter 13, Nov. 2018