1. AReS and MaRS - Adversarial and MMD-Minimizing Regression for SDEs

Gabriele Abbati*1, Philippe Wenk*2,3, Michael A. Osborne1, Andreas Krause2, Bernhard Schölkopf4, Stefan Bauer4
1 University of Oxford, 2 ETH Zürich, 3 Max Planck ETH Center for Learning Systems, 4 Max Planck Institute for Intelligent Systems
Thirty-sixth International Conference on Machine Learning (ICML 2019)

2. Stochastic Differential Equations in the Wild

[Figure: (a) Robotics (source: Athena robot, MPI-IS); (b) Atmospheric Modeling (source: Wikipedia); (c) Stock Markets (source: Yahoo Finance)]

3. Gradient Matching

ODE: given $f$ and $y$, infer $x$ and $\theta$:
$$\dot{x} = f(x, \theta), \qquad y = x + \epsilon \ \text{with}\ \epsilon \sim \mathcal{N}(0, \sigma_y)$$

SDE: given $f$ and $y$, infer $x$, $G$ and $\theta$:
$$dx = f(x, \theta)\,dt + G\,dW, \qquad y = x + \epsilon \ \text{with}\ \epsilon \sim \mathcal{N}(0, \sigma_y)$$

Integration-based methods: parameters → trajectory. Integration-free methods: trajectory → parameters.

4. Classic Gradient Matching - Model

(1) Gaussian process prior on states:
$$p(x \mid \phi) = \mathcal{N}(x \mid \mu, C_\phi), \qquad p(\dot{x} \mid x, \phi) = \mathcal{N}(\dot{x} \mid D x, A)$$

(2) ODE model:
$$p(\dot{x} \mid x, \theta, \gamma) = \mathcal{N}(\dot{x} \mid f(x, \theta), \gamma I)$$

[Graphical models with nodes $\phi$, $\sigma$, $y$, $x$, $\dot{x}$ for the GP prior and $\gamma$, $\theta$, $x$, $\dot{x}$ for the ODE model]
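To make the data-based model concrete, here is a minimal sketch (not the authors' code) of where $D$ and $A$ come from: for a GP with a differentiable kernel, the conditional $p(\dot{x} \mid x)$ is Gaussian with mean $Dx$ and covariance $A$ built from the kernel's derivative cross-covariances. An RBF kernel, a zero-mean prior, and all names below are illustrative assumptions.

```python
import numpy as np

def rbf_kernels(t, lengthscale=1.0, variance=1.0):
    """Kernel matrices on a 1D time grid for k(s, u) = v * exp(-(s-u)^2 / (2 l^2)):
    C   = cov(x(s), x(u))        (states)
    dC  = cov(xdot(s), x(u))     (derivative of k in its first argument)
    ddC = cov(xdot(s), xdot(u))  (mixed second derivative)
    """
    diff = t[:, None] - t[None, :]
    C = variance * np.exp(-0.5 * diff**2 / lengthscale**2)
    dC = -diff / lengthscale**2 * C
    ddC = (1.0 / lengthscale**2 - diff**2 / lengthscale**4) * C
    return C, dC, ddC

def derivative_moments(t, jitter=1e-6):
    """D and A such that p(xdot | x) = N(xdot | D x, A) under a zero-mean GP prior."""
    C, dC, ddC = rbf_kernels(np.asarray(t, dtype=float))
    C_inv = np.linalg.inv(C + jitter * np.eye(len(t)))
    D = dC @ C_inv                 # conditional mean map: E[xdot | x] = D x
    A = ddC - dC @ C_inv @ dC.T    # conditional covariance
    return D, A
```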

5. Classic Gradient Matching - Inference

- Calderhead, Girolami, and Lawrence (2009) and Dondelinger et al. (2013), product of experts:
$$p(\dot{x}) \propto p_\text{data}(\dot{x})\, p_\text{ODE}(\dot{x})$$
- Wenk et al. (2018), FGPGM, forced equality:
$$p(\dot{x}) \propto p_\text{data}(\dot{x}_\text{data})\, p_\text{ODE}(\dot{x}_\text{ODE})\, \delta(\dot{x}_\text{data} - \dot{x})\, \delta(\dot{x}_\text{ODE} - \dot{x})$$
- Wenk*, Abbati* et al. (2019), ODIN: ODEs as constraints

6. Stochastic Differential Equations

General SDE problem: given $f$ and $y$, infer $x$, $G$ and $\theta$:
$$dx = f(x, \theta)\,dt + G\,dW, \qquad y = x + \epsilon \ \text{with}\ \epsilon \sim \mathcal{N}(0, \sigma_y)$$

Example (double-well potential): given $f$ and $y$, infer $x$, $G$ and $\theta$:
$$dx = \theta_0\, x\, (\theta_1 - x^2)\,dt + G\,dw, \qquad y = x + \epsilon \ \text{with}\ \epsilon \sim \mathcal{N}(0, \sigma_y)$$

[Figure: a sample path of the double-well SDE over $t \in [0, 20]$]
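For concreteness, a minimal Euler-Maruyama simulation of the double-well example above; the drift parameters mirror the experiments later in the talk, while the diffusion level, step size, observation grid, and noise level are illustrative assumptions.

```python
import numpy as np

def simulate_double_well(theta0=0.1, theta1=4.0, G=0.5, x0=0.0,
                         T=20.0, dt=1e-3, seed=0):
    """Euler-Maruyama for dx = theta0 * x * (theta1 - x^2) dt + G dw."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = theta0 * x[i] * (theta1 - x[i] ** 2)
        x[i + 1] = x[i] + drift * dt + G * np.sqrt(dt) * rng.standard_normal()
    return np.linspace(0.0, T, n + 1), x

# Noisy observations y = x + eps on a coarse grid, as in the problem setup.
t, x = simulate_double_well()
y = x[::200] + 0.1 * np.random.default_rng(1).standard_normal(x[::200].shape)
```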

7. Stochastic Gradient Matching?

Problems:
- Both observation and process noise
- Stochastic sample paths
- Paths are not differentiable

[Figure: a sample path over $t \in [0, 20]$ illustrating the non-differentiable trajectory]

8. The Doss-Sussmann Transformation

General SDE problem: given $f$ and $y$, infer $x$, $G$ and $\theta$:
$$dx = f(x, \theta)\,dt + G\,dW, \qquad y = x + \epsilon \ \text{with}\ \epsilon \sim \mathcal{N}(0, \sigma_y)$$

Definition (Ornstein-Uhlenbeck process). A stochastic process $o$ defined by the equation
$$do = -o\,dt + G\,dW$$

We introduce the latent variable $z = x - o$ to get the stochastic gradients
$$dz(t) = \{ f(z(t) + o(t), \theta) + o(t) \}\,dt$$

Because the same increment $G\,dW$ drives both $x$ and $o$, it cancels in $z$, so each sample path of $z$ is differentiable.
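A small numerical sanity check (illustrative only) of the transformation: drive the SDE and the OU process with the same Brownian increments, and the noise cancels exactly in $z = x - o$, which then follows the random but differentiable ODE above. The double-well drift and all constants are assumptions for this demo.

```python
import numpy as np

def doss_sussmann_demo(theta0=0.1, theta1=4.0, G=0.5, T=20.0, dt=1e-3, seed=0):
    """Simulate dx = f(x) dt + G dW and do = -o dt + G dW with shared
    noise, plus the transformed ODE dz/dt = f(z + o) + o."""
    rng = np.random.default_rng(seed)
    f = lambda u: theta0 * u * (theta1 - u ** 2)  # double-well drift
    n = int(T / dt)
    x = np.zeros(n + 1); o = np.zeros(n + 1); z = np.zeros(n + 1)
    for i in range(n):
        dW = np.sqrt(dt) * rng.standard_normal()
        x[i + 1] = x[i] + f(x[i]) * dt + G * dW         # original SDE
        o[i + 1] = o[i] - o[i] * dt + G * dW            # OU process, same dW
        z[i + 1] = z[i] + (f(z[i] + o[i]) + o[i]) * dt  # noise-free ODE
    return x, o, z

x, o, z = doss_sussmann_demo()
print(abs(x - (z + o)).max())  # ~0: x decomposes as smooth z plus OU noise o
```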

11. A Novel Generative Model

Previous generative model: $Y = X + E$. New generative model: $Y = Z + O + E$.

Resulting observation marginal distribution:
$$p(\tilde{y} \mid \phi, G, \sigma) = \mathcal{N}\big(0,\ \underbrace{C_\phi}_{\text{Gaussian prior}} + \underbrace{B\,\Omega\,B^\top}_{\text{OU process}} + \underbrace{T}_{\text{obs. noise}}\big)$$
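A sketch of how this marginal can be used in practice: evaluate the Gaussian log marginal likelihood of the observations under the summed covariance, then optimize it with respect to $(\phi, G, \sigma)$. Taking $B = I$, $T = \sigma^2 I$, and a stationary OU covariance with unit mean reversion are simplifying assumptions here, not necessarily the paper's exact choices.

```python
import numpy as np

def ou_covariance(t, G):
    """Stationary OU covariance (unit mean reversion): (G^2/2) exp(-|ti - tj|)."""
    return 0.5 * G**2 * np.exp(-np.abs(t[:, None] - t[None, :]))

def log_marginal_likelihood(y, C_phi, Omega, B, sigma):
    """log N(y | 0, C_phi + B Omega B^T + sigma^2 I), computed via Cholesky."""
    S = C_phi + B @ Omega @ B.T + sigma**2 * np.eye(len(y))
    L = np.linalg.cholesky(S)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha               # data-fit term
            - np.log(np.diag(L)).sum()     # 0.5 * log det S
            - 0.5 * len(y) * np.log(2 * np.pi))
```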

13. A Tale of Two Graphical Models

SDE-based model:
$$p(\dot{z} \mid o, z, \theta) = \delta\big(\dot{z} - f(z + o, \theta) - o\big)$$

Data-based model:
$$p(\dot{z} \mid z, \phi) = \mathcal{N}(\dot{z} \mid D z, A)$$

When the two distributions match, $p(\dot{z} \mid o, z, \theta) \sim p(\dot{z} \mid z, \phi)$, we have a good estimate of $\theta$.

[Graphical models with nodes $o$, $G$, $\sigma$, $y$, $z$, $\dot{z}$, $\phi$ and $\theta$]
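Drawing "derivative samples" from the two conditionals is then straightforward; a sketch, with $f$, $D$, $A$ as defined earlier and all names illustrative. The delta density makes the SDE-based sample deterministic given $(o, z, \theta)$, while the data-based sample is a Gaussian draw.

```python
import numpy as np

def sample_zdot_sde(z, o, theta, f):
    """SDE-based model: p(zdot | o, z, theta) is a delta, so the 'sample'
    is simply zdot = f(z + o, theta) + o for each sampled OU path o."""
    return f(z + o, theta) + o

def sample_zdot_data(z, D, A, rng):
    """Data-based model: zdot ~ N(D z, A), the GP derivative conditional."""
    return rng.multivariate_normal(D @ z, A)
```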

15. Sample-based Parameter Inference

GP fit (with the $Y = Z + O + E$ noise model), then draw samples from $p_\text{SDE}$ and $p_\text{data}$:
$$\dot{z}_\text{SDE} \sim p(\dot{z} \mid o, z, \theta), \qquad \dot{z}_\text{data} \sim p(\dot{z} \mid z, \phi)$$

Iterative gradient-based optimization drives $\dot{z}_\text{SDE} \sim \dot{z}_\text{data}$:

(1) AReS (WGAN): $\theta \leftarrow -\nabla_\theta \, \frac{1}{M} \sum_{i=1}^{M} f_\omega\big(\dot{z}^{(i)}_\text{SDE}\big)$

(2) MaRS (MMD): $\theta \leftarrow \nabla_\theta \, \mathrm{MMD}^2_u\big[\dot{z}_\text{SDE}, \dot{z}_\text{data}\big]$
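For the MaRS branch, a minimal unbiased MMD² estimator between the two sample sets might look as follows; the RBF kernel with a fixed bandwidth is an assumption, and in practice the gradient with respect to $\theta$ flows through the SDE-model samples via automatic differentiation.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased MMD^2 between samples X (m, d) and Y (n, d), RBF kernel."""
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))   # E[k(x, x')]
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1)) # E[k(y, y')]
            - 2.0 * Kxy.mean())                           # cross term
```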

19. Samples during Training

[Figure: data-based vs. model-based samples of $\dot{z}(t)$ over $t$: (a) samples before training, (b) samples after training]

20. Experimental Results - Lotka-Volterra

$$dx_1(t) = [\theta_1 x_1(t) - \theta_2 x_1(t) x_2(t)]\,dt + G_{11}\,dw_1(t)$$
$$dx_2(t) = [-\theta_3 x_2(t) + \theta_4 x_1(t) x_2(t)]\,dt + G_{21}\,dw_1(t) + G_{22}\,dw_2(t)$$

Parameter estimates (mean ± standard deviation; "/" = not reported; the single diffusion estimate per row is shared by AReS and MaRS, which obtain $G$ from the same GP fit):

Ground truth    NPSDE           ESGF            AReS            MaRS
θ0 = 2          1.58 ± 0.71     2.04 ± 0.09     2.36 ± 0.18     2.00 ± 0.09
θ1 = 1          0.74 ± 0.31     1.02 ± 0.05     1.18 ± 0.9      1.00 ± 0.04
θ2 = 4          2.26 ± 1.51     3.87 ± 0.59     3.70 ± 0.51     3.97 ± 0.63
θ3 = 1          0.49 ± 0.35     0.96 ± 0.14     0.91 ± 0.14     0.98 ± 0.18
H1,1 = 0.05     /               0.03 ± 0.004    0.01 ± 0.03 (AReS/MaRS)
H1,2 = 0.03     /               0.02 ± 0.01     0.01 ± 0.01 (AReS/MaRS)
H2,1 = 0.03     /               0.02 ± 0.01     0.01 ± 0.01 (AReS/MaRS)
H2,2 = 0.09     /               0.09 ± 0.03     0.03 ± 0.02 (AReS/MaRS)

[Figure: Lotka-Volterra sample paths over $t \in [0, 2]$]
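For reference, the Lotka-Volterra drift and diffusion from the equations above, in code; the indices follow the equations (drift parameters θ1..θ4), and this is an illustrative sketch rather than the experiment code.

```python
import numpy as np

def lv_drift(x, theta):
    """Drift: dx1 = [th1*x1 - th2*x1*x2] dt, dx2 = [-th3*x2 + th4*x1*x2] dt."""
    th1, th2, th3, th4 = theta
    x1, x2 = x
    return np.array([th1 * x1 - th2 * x1 * x2,
                     -th3 * x2 + th4 * x1 * x2])

def lv_diffusion(G11, G21, G22):
    """Lower-triangular diffusion matrix multiplying (dw1, dw2)."""
    return np.array([[G11, 0.0],
                     [G21, G22]])
```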

21. Experimental Results - Double-Well Potential

$$dx(t) = \theta_0\, x\, (\theta_1 - x^2)\,dt + G\,dw(t)$$

Parameter estimates (mean ± standard deviation; "/" = not reported; the diffusion estimate is shared by AReS and MaRS):

Ground truth    NPSDE             VGPA           ESGF           AReS           MaRS
θ0 = 0.1        0.09 ± 7.00       0.05 ± 0.04    0.01 ± 0.03    0.09 ± 0.04    0.10 ± 0.05
θ1 = 4          3.36 ± 248.82     1.11 ± 0.66    0.11 ± 0.16    3.68 ± 1.34    3.85 ± 1.10
H = 0.25        /                 0.21 ± 0.09    0.00 ± 0.02    0.20 ± 0.05 (AReS/MaRS)

[Figure: double-well sample path over $t \in [0, 20]$]

22. Contributions

- We extend classical gradient matching to SDEs.
- We introduce a novel statistical framework combining the Doss-Sussmann transformation and Gaussian processes.
- We introduce a novel parameter inference scheme that leverages adversarial and moment-matching loss functions.
- We improve parameter inference accuracy in systems of SDEs.

23. Thank you

Come and catch us at poster #216.

Bonus round: check out our paper on classic gradient matching!
Wenk*, P., Abbati*, G., Bauer, S., Osborne, M. A., Krause, A., Schölkopf, B. (2019). ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems. arXiv preprint arXiv:1902.06278.
