The Differentiable Cross-Entropy Method

The Differentiable Cross-Entropy Method, ICML 2020, Brandon Amos



  1. The Differentiable Cross-Entropy Method. ICML 2020. Brandon Amos (1) and Denis Yarats (1, 2). (1) Facebook AI Research, (2) New York University. brandondamos, denisyarats, bamos.github.io, cs.nyu.edu/~dy1042

  2. The cross-entropy method is a powerful optimizer. It is an iterative sampling-based optimizer that: (1) samples from the domain, (2) observes the function's values, and (3) updates the sampling distribution. Widely used in control and model-based RL.
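
The three steps on the slide above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the diagonal Gaussian sampling distribution, the elite count, the `cem` and `quadratic` names, and the test objective are all assumptions made for the example.

```python
import numpy as np

def cem(f, mu, sigma, n_samples=100, n_elite=10, n_iters=20, seed=0):
    """Minimize f with a vanilla cross-entropy method over a diagonal Gaussian."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_iters):
        # 1. Sample candidates from the current sampling distribution.
        x = mu + sigma * rng.standard_normal((n_samples, mu.size))
        # 2. Observe the function's values at the samples.
        values = np.array([f(xi) for xi in x])
        # 3. Refit the distribution to the n_elite lowest-valued samples.
        elite = x[np.argsort(values)[:n_elite]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-8
    return mu

def quadratic(x):
    # Hypothetical test objective with its minimum at (1, -2).
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

x_star = cem(quadratic, mu=[0.0, 0.0], sigma=[2.0, 2.0])
```

The hard `argsort` selection in step 3 is piecewise constant in the sample values, so it passes no useful gradient back to `f`.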

  3. Problem: CEM breaks end-to-end learning. A common learning pipeline, e.g. for control, is: (1) fit models with maximum likelihood, (2) run CEM on top of the learned models, and (3) hope CEM induces reasonable downstream performance. Objective mismatch issue: models are unaware of downstream performance. [Figure: objective mismatch diagram with labels Dynamics, Policy, Control, Environment, Trajectories, State Transitions, Reward, and Training: Maximum Likelihood.]

  4. The Differentiable Cross-Entropy Method (DCEM). Differentiate backwards through the sequence of samples, using differentiable top-k (LML) and reparameterization. Useful when a fixed point is hard to find, or when unrolling gradient descent hits a local optimum. A differentiable controller in the RL setting.

  5. This Talk. Method: the differentiable cross-entropy method (current). Applications: learning deep energy-based models; learning embedded optimizers; control.

  6. Foundation: The Implicit Function Theorem [Dini 1877, Dontchev and Rockafellar 2009]. Given $g(x, y)$ and $f(x) = y^\star$, where $y^\star \in \{y : g(x, y) = 0\}$, how can we compute $D_x f(x)$? The Implicit Function Theorem gives $D_x f(x) = -\left(D_y g(x, f(x))\right)^{-1} D_x g(x, f(x))$ under mild assumptions.
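
A small numerical check can make the formula concrete. The sketch below assumes a hypothetical scalar residual, y^3 + y - x, chosen only because its root is easy to find with Newton's method. It compares the implicit-function derivative with a central finite difference.

```python
def g(x, y):
    # Hypothetical residual; its root in y defines f(x) implicitly.
    return y**3 + y - x

def f(x, n_newton=50):
    """Solve g(x, y) = 0 for y with Newton's method, starting from y = 0."""
    y = 0.0
    for _ in range(n_newton):
        y -= g(x, y) / (3.0 * y**2 + 1.0)
    return y

def df_implicit(x):
    """Implicit function theorem: D_x f = -(D_y g)^{-1} D_x g at y = f(x)."""
    y = f(x)
    dg_dy = 3.0 * y**2 + 1.0
    dg_dx = -1.0
    return -dg_dx / dg_dy

x0 = 2.0
eps = 1e-6
# Central finite difference through the solver, for comparison.
df_numeric = (f(x0 + eps) - f(x0 - eps)) / (2.0 * eps)
```

At `x0 = 2.0` the root is `y = 1`, so the implicit derivative is `1 / (3 + 1) = 0.25`. No differentiation through the Newton iterations is needed.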

  7. Foundation: Differentiable top-k operations [constrained softmax, constrained sparsemax, Limited Multi-Label Projection]. Optimization perspective of the softmax: $y^\star = \operatorname{argmin}_y\; -y^\top x - H(y)$ subject to $0 \le y \le 1$, $1^\top y = 1$. Limited Multi-Label Projection: $y^\star = \operatorname{argmin}_y\; -y^\top x - H_b(y)$ subject to $0 \le y \le 1$, $1^\top y = k$.
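
The forward pass of the LML projection can be sketched as follows, reading $H_b$ as the binary entropy. Under that reading the optimality conditions give y = sigmoid(x + nu) for a scalar nu chosen so the entries sum to k. Finding nu by bisection, the bracket width, and the function names are assumptions of this sketch, and only the forward pass is shown, not the backward pass.

```python
import numpy as np

def sigmoid(z):
    # tanh form of the logistic function; avoids overflow for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def lml_projection(x, k, n_bisect=100):
    """Soft top-k: sigmoid(x + nu), with nu found so that the output sums to k."""
    x = np.asarray(x, dtype=float)
    # At nu = lo every entry is close to 0; at nu = hi every entry is close to 1.
    lo = -x.max() - 30.0
    hi = -x.min() + 30.0
    for _ in range(n_bisect):
        nu = 0.5 * (lo + hi)
        if sigmoid(x + nu).sum() < k:
            lo = nu
        else:
            hi = nu
    return sigmoid(x + 0.5 * (lo + hi))

scores = np.array([3.0, 1.0, 2.0, -1.0, 0.5])
soft = lml_projection(scores, k=2)          # smooth weights that sum to 2
hard = lml_projection(scores / 0.01, k=2)   # small temperature: near-binary top-2
```

Dividing the scores by a small temperature pushes the output toward the hard top-k indicator, which is the limit the DCEM slide refers to.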

  8. The Differentiable Cross-Entropy Method. In each iteration $t$, update a distribution $g_\phi$ with: (1) sample from the domain, $[X_{t,i}]_{i=1}^N \sim g_{\phi_t}(\cdot)$; (2) observe the function values, $v_{t,i} = f_\theta(X_{t,i})$; (3) compute the differentiable top-k, $\mathcal{I}_t = \Pi_{\mathcal{L}_k}(v_t / \tau)$; (4) update $\phi_{t+1}$ with maximum weighted likelihood. Finally, return $\mathbb{E}[g_{\phi_{T+1}}(\cdot)]$. Captures vanilla CEM when the soft top-k is hard. Composed of operations with informative derivatives. [Figure: the objective $f$ and successive sampling distributions $g_\phi$.]
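
A forward-only NumPy sketch of these iterations is given below. It is an illustration under stated assumptions rather than the authors' implementation: the sampling distribution is a diagonal Gaussian, the soft top-k is computed by bisection on a sigmoid offset, and the values are negated before the projection so that low function values receive large weights (a sign convention chosen here for minimization). In an autodiff framework the same operations would carry gradients; plain NumPy does not.

```python
import numpy as np

def soft_top_k(x, k, n_bisect=100):
    """LML-style weights: sigmoid(x + nu), with nu chosen so they sum to k."""
    lo, hi = -x.max() - 30.0, -x.min() + 30.0
    for _ in range(n_bisect):
        nu = 0.5 * (lo + hi)
        if np.sum(0.5 * (1.0 + np.tanh(0.5 * (x + nu)))) < k:
            lo = nu
        else:
            hi = nu
    return 0.5 * (1.0 + np.tanh(0.5 * (x + 0.5 * (lo + hi))))

def dcem_forward(f, mu, sigma, n_samples=100, k=10, n_iters=10, temp=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_iters):
        # Reparameterized samples: a deterministic function of (mu, sigma) and noise.
        x = mu + sigma * rng.standard_normal((n_samples, mu.size))
        v = np.array([f(xi) for xi in x])
        # Soft elite weights; values are negated so low values get high weight
        # (assumed sign convention for minimization).
        w = soft_top_k(-v / temp, k)
        # Weighted maximum likelihood fit of the diagonal Gaussian.
        mu = (w[:, None] * x).sum(axis=0) / w.sum()
        var = (w[:, None] * (x - mu) ** 2).sum(axis=0) / w.sum()
        sigma = np.sqrt(var) + 1e-8
    return mu

def quadratic(x):
    # Hypothetical test objective with its minimum at (1, -2).
    return float(np.sum((x - np.array([1.0, -2.0])) ** 2))

x_hat = dcem_forward(quadratic, mu=[0.0, 0.0], sigma=[2.0, 2.0])
# With a very small temperature the weights approach a hard top-k indicator.
w_hard = soft_top_k(-np.array([0.3, 0.1, 0.7, 0.2, 0.9]) / 1e-4, k=2)
```

The temperature trades off between the two regimes: small values behave like vanilla CEM, while larger values spread weight over more samples and keep the update smooth in the function values.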

  9. This Talk. Method: the differentiable cross-entropy method. Applications: learning deep energy-based models (current); learning embedded optimizers; control.

  10. Deep Structured Energy Models (SPENs/ICNNs) [Belanger and McCallum, 2016; Amos, Xu, and Kolter, 2017]. Key idea: model $p(x, y) \propto \exp\{-E_\theta(x, y)\}$, where $E_\theta$ is a deep energy model. Captures non-trivial structures in the output space, while also subsuming feed-forward models. Feedforward model: $E(x, y) = \|f(x) - y\|_2^2$. Predict with the optimization problem $\hat{y} = \operatorname{argmin}_y E_\theta(x, y)$. Learning can be done by unrolling optimization on $E_\theta$ using derivative information $\nabla_y E$.
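
To illustrate how the feedforward case fits the energy view, the sketch below predicts by running gradient descent in y on the feedforward energy from the slide and recovers f(x). The two-output model `f`, the step size, and the number of steps are hypothetical choices for the example.

```python
import numpy as np

def f(x):
    # Hypothetical feedforward model with two outputs.
    return np.array([np.sin(x), x**2])

def energy(x, y):
    # Feedforward energy: squared distance between f(x) and y.
    return float(np.sum((f(x) - y) ** 2))

def grad_y_energy(x, y):
    # Gradient of the energy with respect to y.
    return 2.0 * (y - f(x))

def predict(x, n_steps=100, lr=0.1):
    """Predict y by gradient descent on the energy, starting from y = 0."""
    y = np.zeros(2)
    for _ in range(n_steps):
        y = y - lr * grad_y_energy(x, y)
    return y

y_hat = predict(0.5)
```

For this convex energy the iterates contract toward `f(x)` at every step. A learned deep energy need not be convex in y, which is where the choice of inner optimizer starts to matter.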

  11. Unrolling gradient descent may learn bad energies. Unrolling optimizers lose the probabilistic interpretation and can overfit to the optimizer. In this regression setting, GD learns barriers on the energy surface while DCEM fits the data.

  12. This Talk. Method: the differentiable cross-entropy method. Applications: learning deep energy-based models; learning embedded optimizers (current); control.

  13. DCEM can exploit the solution space structure. $x^\star = \operatorname{argmin}_{x \in [0,1]^n} f(x)$. [Figure panels: Full Domain; Manifold of Optimal Solutions; Latent Manifold of Optimal Solutions.]
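
The idea of searching a low-dimensional latent space instead of the full domain can be illustrated with a toy problem. Everything in the sketch is hypothetical: the solutions are assumed to lie on a known line in a 50-dimensional space, the decoder is hand-written rather than learned, and the search is plain CEM over a scalar latent.

```python
import numpy as np

N_DIM = 50
# Hypothetical structure: optimal solutions lie along this unit direction.
DIRECTION = np.ones(N_DIM) / np.sqrt(N_DIM)

def decode(z):
    # Hand-written decoder from a 1-D latent to the full domain.
    return z * DIRECTION

def objective(x, target=3.0):
    # Hypothetical objective, minimized at x = target * DIRECTION.
    return float(np.sum((x - target * DIRECTION) ** 2))

def latent_cem(n_samples=50, n_elite=5, n_iters=15, seed=0):
    """Run CEM over the scalar latent z and decode the final mean."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 5.0
    for _ in range(n_iters):
        z = mu + sigma * rng.standard_normal(n_samples)
        values = np.array([objective(decode(zi)) for zi in z])
        elite = z[np.argsort(values)[:n_elite]]
        mu, sigma = elite.mean(), elite.std() + 1e-8
    return decode(mu)

x_best = latent_cem()
```

The search here touches one dimension instead of fifty. In the setting on the slide the decoder would be learned rather than written by hand.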

  14. This Talk. Method: the differentiable cross-entropy method. Applications: learning deep energy-based models; learning embedded optimizers; control (current).

  15. Should RL policies have a system dynamics model or not? [Figure: Policy, Neural Network(s), State, Action, System Dynamics, Future Plan.] Model-free RL: more general, doesn't make as many assumptions about the world; rife with poor data efficiency and learning stability issues. Model-based RL (or control): a useful prior on the world if it lies within your set of assumptions.

  16. Model Predictive Control. Known or learned from data.

  17. Differentiable Control via DCEM. A pure planning problem given (potentially non-convex) cost and dynamics: $\tau^\star_{1:H} = \operatorname{argmin}_{\tau_{1:H}} \sum_t C_\theta(\tau_t)$ (cost) subject to $x_1 = x_{\text{init}}$, $x_{t+1} = f_\theta(\tau_t)$ (dynamics), and $\underline{u} \le u \le \overline{u}$, where $\tau_t = \{x_t, u_t\}$. Idea: solve this optimization problem with DCEM and differentiate through it.
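
The planning problem above can be sketched on a toy system. The example uses vanilla CEM rather than DCEM, and the double-integrator dynamics, quadratic cost, horizon, and control bounds are all hypothetical choices. The dynamics constraint is enforced by rolling the model forward from the initial state, and the box constraint by clipping the sampled controls.

```python
import numpy as np

def dynamics(x, u, dt=0.1):
    # Hypothetical double integrator: x = (position, velocity), u = acceleration.
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u])

def cost(x, u):
    # Hypothetical per-step cost C(tau_t), with tau_t = (x_t, u_t).
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2

def rollout_cost(x_init, controls):
    """Sum the per-step cost along the trajectory induced by the controls."""
    x, total = np.asarray(x_init, dtype=float), 0.0
    for u in controls:
        total += cost(x, u)
        x = dynamics(x, u)
    return total

def cem_plan(x_init, horizon=20, u_min=-1.0, u_max=1.0,
             n_samples=200, n_elite=20, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(n_iters):
        # Sample control sequences and clip them to the control bounds.
        u = mu + sigma * rng.standard_normal((n_samples, horizon))
        u = np.clip(u, u_min, u_max)
        costs = np.array([rollout_cost(x_init, ui) for ui in u])
        elite = u[np.argsort(costs)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

x_init = [1.0, 0.0]
u_plan = cem_plan(x_init)
```

In a model predictive control loop, only the first control of `u_plan` would be applied before replanning from the next state.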

  18. Differentiable Control via DCEM. [Figure: A lot of data, Model (layers …, DCEM), Predictions, Loss.] What can we do with this now? Augment neural network policies in model-free algorithms with MPC policies. Fight objective mismatch by end-to-end learning dynamics. The cost can also be end-to-end learned! No longer need to hard-code in values. Caveat: control problems are often intractably high-dimensional, so we use embedded DCEM.

  19. DCEM fine-tunes highly non-convex controllers: sites.google.com/view/diff-cross-entropy-method

