Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning

Seungyul Han and Youngchul Sung
Dept. of Electrical Engineering, KAIST

ICML 2019, Long Beach, CA, USA
Jun. 12, 2019
Contributions

• Proximal policy optimization (PPO) [Schulman et al., 2017]: a stable on-policy RL algorithm.
• Limitations of PPO
  – PPO suffers from a vanishing gradient problem in high-dimensional tasks.
  – On-policy learning in PPO is sample-inefficient.
• To overcome these drawbacks, we propose
  1. Dimension-wise importance sampling weight clipping (DISC): solves the vanishing gradient problem.
  2. Off-policy generalization: reuses old samples to enhance sample efficiency.
Proximal Policy Optimization (PPO)

• PPO updates the policy parameter θ to maximize the importance-weighted advantage:

  J^{PPO}(\theta) = \frac{1}{M} \sum_{m=0}^{M-1} \min\{ \rho_m \hat{A}_m,\ \mathrm{clip}_\epsilon(\rho_m) \hat{A}_m \}
                  = \frac{1}{M} \sum_{m=0}^{M-1} \min\{ \kappa_m \rho_m,\ \kappa_m \mathrm{clip}_\epsilon(\rho_m) \} \kappa_m \hat{A}_m,    (1)

  – where ρ_m = π_θ(a_m | s_m) / π_{θ_i}(a_m | s_m) is the importance sampling (IS) weight,
  – Â_m is estimated by generalized advantage estimation (GAE) [Schulman et al., 2015],
  – and clip_ε(·) = clip(·, 1 − ε, 1 + ε), κ_m = sgn(Â_m).
• PPO updates θ when the IS weight is not clipped; otherwise, it does not update θ.
• The clipped IS weight enables stable policy updates.
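The following PyTorch sketch illustrates the clipped surrogate in Eq. (1). It is not the authors' implementation; the tensor names (log_p_new, log_p_old, adv) and the default ε are assumptions.

```python
import torch

def ppo_surrogate(log_p_new, log_p_old, adv, eps=0.2):
    """Clipped PPO surrogate of Eq. (1), returned as a scalar to maximize."""
    rho = torch.exp(log_p_new - log_p_old)                   # IS weight rho_m
    unclipped = rho * adv
    clipped = torch.clamp(rho, 1.0 - eps, 1.0 + eps) * adv   # clip_eps(rho_m) * A_m
    return torch.minimum(unclipped, clipped).mean()          # batch average
```

Samples for which the min selects the clipped branch contribute zero gradient to θ, which is the issue quantified on the next slide.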
The Vanishing Gradient Problem

• The gradient of clipped samples becomes zero, which reduces sample efficiency.
• A larger ρ′_t := |1 − ρ_t| + 1 produces more zero-gradient samples.
• For higher-dimensional tasks, ρ′_t is much larger than for lower-dimensional tasks.

Figure 1: Average ρ′_t (left) and the amount of gradient vanishing (right).
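A minimal NumPy sketch (an illustration under assumptions, not the paper's measurement code) of how the two quantities in Figure 1 can be estimated from a batch of samples:

```python
import numpy as np

def vanishing_stats(log_p_new, log_p_old, adv, eps=0.2):
    """Return (average rho'_t, fraction of zero-gradient samples) for a batch.

    A sample has zero PPO gradient when the min in Eq. (1) selects the clipped
    branch: rho > 1 + eps with A > 0, or rho < 1 - eps with A < 0.
    """
    rho = np.exp(log_p_new - log_p_old)            # total IS weight rho_t
    rho_dev = np.abs(1.0 - rho) + 1.0              # rho'_t = |1 - rho_t| + 1
    zero_grad = ((rho > 1.0 + eps) & (adv > 0)) | ((rho < 1.0 - eps) & (adv < 0))
    return rho_dev.mean(), zero_grad.mean()
```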
Dimension-Wise Clipping

• Clip the dimension-wise IS weight ρ_{t,d} := π_θ(a_{t,d} | s_t) / π_{θ_i}(a_{t,d} | s_t) instead of the total IS weight ρ_t.
• Add an IS weight loss J^{IS} = \frac{1}{2M} \sum_{m=0}^{M-1} (\log \rho_m)^2, which enables stable learning.
• DISC updates θ to maximize the dimension-wise importance-weighted advantage (see the sketch after this slide):

  J^{DISC} = \frac{1}{M} \sum_{m=0}^{M-1} \Big[ \prod_{d=0}^{D-1} \min\{ \kappa_m \rho_{m,d},\ \kappa_m \mathrm{clip}_\epsilon(\rho_{m,d}) \} \Big] \kappa_m \hat{A}_m - \alpha_{IS} J^{IS},    (2)

  where α_IS is an adaptive coefficient.
• Even if the dimension-wise IS weight is clipped in some dimensions, other dimensions remain unclipped.
• The policy is updated using the gradient of the unclipped dimensions.
  ⇒ Hence, the sample gradient of DISC does not vanish for most samples!
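A hedged PyTorch sketch of dimension-wise clipping in the spirit of Eq. (2): PPO-style clipping is applied to each action dimension and the per-dimension clipped ratios are multiplied before weighting by the advantage. The exact form in the paper and the adaptive handling of α_IS may differ; the fixed alpha_is value and tensor names here are assumptions.

```python
import torch

def disc_surrogate(log_p_new_d, log_p_old_d, adv, eps=0.2, alpha_is=0.1):
    """Dimension-wise clipped surrogate in the spirit of Eq. (2).

    log_p_new_d, log_p_old_d: (M, D) per-dimension log-probabilities, so the
    total IS weight is the product of the dimension-wise weights.
    """
    rho_d = torch.exp(log_p_new_d - log_p_old_d)     # (M, D) dimension-wise IS weights
    kappa = torch.sign(adv).unsqueeze(-1)            # (M, 1), kappa_m = sgn(A_m)
    # Per-dimension clipped ratio: min(rho_d, 1+eps) when A > 0, max(rho_d, 1-eps) when A < 0.
    # A clipped dimension becomes a constant factor (zero gradient for that dimension),
    # while the unclipped dimensions keep their gradients, as claimed on the slide above.
    factor_d = kappa * torch.minimum(kappa * rho_d,
                                     kappa * torch.clamp(rho_d, 1.0 - eps, 1.0 + eps))
    j_main = (torch.prod(factor_d, dim=-1) * adv).mean()
    # IS weight loss J_IS = (1/2M) * sum_m (log rho_m)^2 on the total IS weight.
    log_rho = (log_p_new_d - log_p_old_d).sum(dim=-1)
    j_is = 0.5 * (log_rho ** 2).mean()
    return j_main - alpha_is * j_is
```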
Off-Policy Generalization

• We want to reuse previous batches to further enhance sample efficiency.
• DISC reuses old batches that satisfy ρ′_{t,d} < 1 + ε_b to avoid too much clipping*.
• IS calibration is needed to estimate the advantage of the old samples.
• We combine GAE and V-trace [Espeholt et al., 2018] (GAE-V) to calibrate the IS weights.

Figure 2: The number of reused sample batches.

* Seungyul Han and Youngchul Sung, "AMBER: Adaptive Multi-Batch Experience Replay for Continuous Action Control," arXiv, Oct. 2018. https://arxiv.org/abs/1710.04423
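A minimal sketch of the batch-reuse rule stated above, assuming the criterion is the batch average of ρ′_{t,d}; the aggregation, the helper names, and the ε_b value are assumptions, not the authors' code.

```python
import numpy as np

def select_reusable_batches(old_batches, current_log_p_d, eps_b=0.4):
    """Return indices of old batches whose dimension-wise IS deviation stays small.

    old_batches: list of dicts holding per-dimension behavior log-probs under
    key 'log_p_old_d' (shape (T, D)); current_log_p_d(batch) returns the
    current policy's per-dimension log-probs for that batch.
    """
    reusable = []
    for i, batch in enumerate(old_batches):
        rho_d = np.exp(current_log_p_d(batch) - batch["log_p_old_d"])   # (T, D)
        rho_dev = np.abs(1.0 - rho_d) + 1.0                             # rho'_{t,d}
        if rho_dev.mean() < 1.0 + eps_b:    # reuse only lightly off-policy batches
            reusable.append(i)
    return reusable
```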
Evaluation

• Evaluation on MuJoCo [Todorov et al., 2012] tasks in OpenAI Gym [Brockman et al., 2016].

Figure 3: MuJoCo continuous control tasks.

Comparison with PPO baselines

Figure 4: Performance. Action dimensions - Ant: 8, Humanoid: 17, HumanoidStandup: 17.
Evaluation

Comparison with state-of-the-art RL algorithms

• DDPG [Lillicrap et al., 2015], TRPO [Schulman et al., 2015], ACKTR [Wu et al., 2017], Trust-PCL [Nachum et al., 2017], SQL [Haarnoja et al., 2017], TD3 [Fujimoto et al., 2018], SAC [Haarnoja et al., 2018].
• DISC achieves top-level performance in 5 of the 6 considered tasks.
• On HumanoidStandup, DISC achieves much higher performance than the other algorithms.

Figure 5: Max average return of DISC and other RL algorithms.
Conclusion

• DISC extends PPO with dimension-wise IS weight clipping and off-policy generalization.
• DISC solves the vanishing gradient problem and enhances sample efficiency.
• DISC achieves top-level performance compared to other state-of-the-art RL algorithms.
Thank you!

Poster Session: Jun. 12 (Wed), Pacific Ballroom #35