Advanced Policy Gradients CS 285 Instructor: Sergey Levine UC Berkeley
Recap: policy gradients. The RL anatomy loop: generate samples (i.e. run the policy), fit a model to estimate the return, improve the policy. In policy gradients the return estimate is the "reward to go"; we can also use function approximation here (e.g., a learned value function).
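For reference, the reward-to-go estimator these slides refer to can be sketched as follows (notation as in the earlier policy gradient lecture; \hat{Q}_{i,t} is the reward to go and b is an optional baseline):

\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t}) \big(\hat{Q}_{i,t} - b\big), \qquad \hat{Q}_{i,t} = \sum_{t'=t}^{T} r(s_{i,t'}, a_{i,t'})

Replacing \hat{Q}_{i,t} - b with a learned advantage estimate is the "function approximation" option noted on the slide.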
Why does policy gradient work? Same loop: generate samples (i.e. run the policy), fit a model to estimate the return, improve the policy. Look familiar? It has the same structure as policy iteration.
Policy gradient as policy iteration
Policy gradient as policy iteration: importance sampling
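The identity behind this slide (following the lecture and the TRPO analysis): the improvement of the new policy \pi_{\theta'} over the old \pi_\theta is the expected advantage of the old policy under the new policy's own distribution, and importance sampling moves the action expectation back to the old policy:

J(\theta') - J(\theta) = \mathbb{E}_{\tau \sim p_{\theta'}(\tau)}\Big[\sum_t \gamma^t A^{\pi_\theta}(s_t, a_t)\Big] = \sum_t \mathbb{E}_{s_t \sim p_{\theta'}(s_t)}\Big[ \mathbb{E}_{a_t \sim \pi_\theta(a_t \mid s_t)}\Big[ \frac{\pi_{\theta'}(a_t \mid s_t)}{\pi_\theta(a_t \mid s_t)} \gamma^t A^{\pi_\theta}(s_t, a_t) \Big] \Big]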
Ignoring the distribution mismatch? Why do we want this to be true? Is it true? And when?
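Concretely, "ignoring the mismatch" means replacing p_{\theta'}(s_t) with p_\theta(s_t) in the expression above, giving the surrogate

\bar{A}(\theta') = \sum_t \mathbb{E}_{s_t \sim p_\theta(s_t)}\Big[ \mathbb{E}_{a_t \sim \pi_\theta(a_t \mid s_t)}\Big[ \frac{\pi_{\theta'}(a_t \mid s_t)}{\pi_\theta(a_t \mid s_t)} \gamma^t A^{\pi_\theta}(s_t, a_t) \Big] \Big]

which can be estimated entirely from samples of the old policy. We want \bar{A}(\theta') \approx J(\theta') - J(\theta), so that maximizing \bar{A} with respect to \theta' improves the true objective.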
Bounding the Distribution Change
Bounding the distribution change: if the new policy stays close to the old one, the state marginals stay close too. Seem familiar? (It is the same argument as the distribution-shift bound from the imitation learning lecture.) Not a great bound, but a bound!
Bounding the distribution change. Proof based on: Schulman, Levine, Moritz, Jordan, Abbeel. "Trust Region Policy Optimization."
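The bound sketched on these slides: if \pi_{\theta'} is close to \pi_\theta in total variation, |\pi_{\theta'}(a_t \mid s_t) - \pi_\theta(a_t \mid s_t)| \le \epsilon for all s_t, then the state marginals satisfy

|p_{\theta'}(s_t) - p_\theta(s_t)| \le 2 \epsilon t

The error grows linearly in t, which is why it is "not a great bound, but a bound!"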
Bounding the objective value
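Sketch of how the state-marginal bound turns into a bound on the objective value (following the lecture): for any bounded f,

\mathbb{E}_{s_t \sim p_{\theta'}(s_t)}[f(s_t)] \ge \mathbb{E}_{s_t \sim p_\theta(s_t)}[f(s_t)] - 2 \epsilon t \max_{s_t} f(s_t)

Applying this with f(s_t) = \mathbb{E}_{a_t \sim \pi_\theta}\big[\tfrac{\pi_{\theta'}(a_t \mid s_t)}{\pi_\theta(a_t \mid s_t)} \gamma^t A^{\pi_\theta}(s_t, a_t)\big] shows that maximizing the surrogate under p_\theta maximizes a lower bound on the true improvement, up to an error term \sum_t 2 \epsilon t\, C with C \in O(T r_{\max}) (finite horizon) or O(r_{\max}/(1-\gamma)) (discounted), which is small when \epsilon is small.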
Where are we at so far?
Policy Gradients with Constraints
A more convenient bound: the KL divergence bounds the total variation divergence and has some very convenient properties that make it much easier to approximate!
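The convenient property in question is Pinsker's inequality: total variation is controlled by KL divergence,

|\pi_{\theta'}(a_t \mid s_t) - \pi_\theta(a_t \mid s_t)| \le \sqrt{\tfrac{1}{2} D_{\mathrm{KL}}\big(\pi_{\theta'}(a_t \mid s_t) \,\|\, \pi_\theta(a_t \mid s_t)\big)}

so constraining D_{\mathrm{KL}}(\pi_{\theta'} \| \pi_\theta) \le \epsilon is enough to make the bound above apply, and the KL, unlike total variation, is straightforward to evaluate and differentiate (e.g., in closed form for common policy classes).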
How do we optimize the objective? Maximize the importance-sampled objective subject to the KL constraint on the new policy.
How do we enforce the constraint? One option: form the Lagrangian and use dual gradient descent; the inner maximization over the new parameters can be done incompletely (for a few gradient steps) before updating the multiplier.
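A sketch of the dual gradient descent procedure referred to here (\lambda is the Lagrange multiplier and \alpha a step size; both symbols are my notation):

\mathcal{L}(\theta', \lambda) = \sum_t \mathbb{E}_{s_t \sim p_\theta(s_t)}\Big[ \mathbb{E}_{a_t \sim \pi_\theta(a_t \mid s_t)}\Big[ \frac{\pi_{\theta'}(a_t \mid s_t)}{\pi_\theta(a_t \mid s_t)} \gamma^t A^{\pi_\theta}(s_t, a_t) \Big] \Big] - \lambda \big( D_{\mathrm{KL}}(\pi_{\theta'} \| \pi_\theta) - \epsilon \big)

Repeat:
• maximize \mathcal{L}(\theta', \lambda) with respect to \theta' (can be incomplete, just a few gradient steps)
• update \lambda \leftarrow \lambda + \alpha \big( D_{\mathrm{KL}}(\pi_{\theta'} \| \pi_\theta) - \epsilon \big)

The multiplier grows when the constraint is violated and shrinks when it is slack.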
Natural Gradient
How (else) do we optimize the objective? Use a first-order Taylor approximation of the objective (a.k.a. linearization).
How do we optimize the objective? The gradient of the linearized objective, evaluated at the old parameters, is exactly the normal policy gradient! (See the policy gradient lecture for the derivation.)
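Written out (a sketch; \bar{A} is the surrogate objective defined earlier): the first-order expansion around the old parameters is

\bar{A}(\theta') \approx \bar{A}(\theta) + \nabla_{\theta'} \bar{A}(\theta')\big|_{\theta'=\theta}^{\top} (\theta' - \theta)

and since \nabla_{\theta'} \tfrac{\pi_{\theta'}(a \mid s)}{\pi_\theta(a \mid s)}\big|_{\theta'=\theta} = \nabla_\theta \log \pi_\theta(a \mid s), the gradient at \theta' = \theta is

\nabla_{\theta'} \bar{A}(\theta')\big|_{\theta'=\theta} = \sum_t \mathbb{E}_{s_t \sim p_\theta(s_t)}\Big[ \mathbb{E}_{a_t \sim \pi_\theta(a_t \mid s_t)}\big[ \gamma^t \nabla_\theta \log \pi_\theta(a_t \mid s_t) A^{\pi_\theta}(s_t, a_t) \big] \Big] = \nabla_\theta J(\theta)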
Can we just use the gradient then?
Can we just use the gradient then? Not quite: plain gradient ascent constrains the step in parameter space, while our constraint is a KL divergence between policies, so the two are not the same. The KL constraint can instead be approximated by its second-order Taylor expansion.
Can we just use the gradient then? Enforcing the (second-order) KL constraint leads to the natural gradient.
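The resulting update, written out (as in the natural gradient / TRPO derivation): approximate the KL constraint by its second-order expansion, whose Hessian is the Fisher information matrix \mathbf{F}, and solve the linear objective under the quadratic constraint in closed form:

D_{\mathrm{KL}}(\pi_{\theta'} \| \pi_\theta) \approx \tfrac{1}{2} (\theta' - \theta)^{\top} \mathbf{F} (\theta' - \theta), \qquad \mathbf{F} = \mathbb{E}_{\pi_\theta}\big[ \nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \big]

\theta' = \theta + \alpha\, \mathbf{F}^{-1} \nabla_\theta J(\theta), \qquad \alpha = \sqrt{ \frac{2 \epsilon}{ \nabla_\theta J(\theta)^{\top} \mathbf{F}^{-1} \nabla_\theta J(\theta) } }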
Is this even a problem in practice? Yes: it is essentially the same problem as the classic vanilla-gradient vs. natural-gradient example (image and figure from Peters & Schaal 2008).
Practical methods and notes
• Natural policy gradient
  • Generally a good choice to stabilize policy gradient training
  • See this paper for details: Peters, Schaal. Reinforcement learning of motor skills with policy gradients.
  • Practical implementation: requires efficient Fisher-vector products, a bit non-trivial to do without computing the full matrix (see the sketch below)
  • See: Schulman et al. Trust region policy optimization
• Trust region policy optimization
• Just use the IS objective directly
  • Use regularization to stay close to the old policy
  • See: Proximal policy optimization
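A minimal NumPy sketch of the Fisher-vector-product trick mentioned above, assuming per-sample score vectors g_i = \nabla_\theta \log \pi_\theta(a_i \mid s_i) are already available (replaced by random stand-in data here; the function names are illustrative, not from any particular library). Since F v = (1/N) \sum_i (g_i^T v) g_i, the product can be computed without ever forming F, and F^{-1} \nabla J is obtained with conjugate gradient:

import numpy as np

def fisher_vector_product(score_vectors, v, damping=1e-3):
    """Compute (F + damping * I) v without forming F = (1/N) sum_i g_i g_i^T."""
    # (score_vectors @ v) has shape (N,); weighting each g_i by (g_i . v) and averaging gives F v.
    Fv = score_vectors.T @ (score_vectors @ v) / score_vectors.shape[0]
    return Fv + damping * v

def conjugate_gradient(mat_vec, b, iters=20, tol=1e-10):
    """Solve A x = b for symmetric positive definite A, using only matrix-vector products."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at 0)
    p = r.copy()
    rs_old = r @ r
    for _ in range(iters):
        Ap = mat_vec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Stand-in data: per-sample score vectors grad log pi(a_i | s_i) and a vanilla policy gradient.
rng = np.random.default_rng(0)
N, d = 256, 10
scores = rng.normal(size=(N, d))
policy_grad = rng.normal(size=d)

# Natural gradient direction F^{-1} g via conjugate gradient (never forming F).
fvp = lambda v: fisher_vector_product(scores, v)
nat_grad = conjugate_gradient(fvp, policy_grad)

# Step size chosen so the quadratic KL estimate equals the trust-region radius eps.
eps = 0.01
step = np.sqrt(2 * eps / (nat_grad @ fvp(nat_grad)))
theta_update = step * nat_grad
print(theta_update)

In practice (e.g., TRPO implementations) the Fisher-vector product is usually computed with a double-backward pass through the KL divergence rather than by storing per-sample score vectors, but the conjugate-gradient structure is the same.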
Review
• Policy gradient = policy iteration
• Optimize the advantage under the new policy's state distribution
• Using the old policy's state distribution optimizes a bound, if the policies are close enough
• Results in a constrained optimization problem
• First-order approximation to the objective = gradient ascent
• Regular gradient ascent has the wrong constraint; use the natural gradient
• Practical algorithms
  • Natural policy gradient
  • Trust region policy optimization
(Alongside: the usual RL loop figure of generate samples (i.e. run the policy), fit a model to estimate return, improve the policy.)