  1. Deep Gaussian Processes with Importance-Weighted Variational Inference. Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth

  2. Problem setting

  3. Problem setting: bimodal density

  4. Problem setting: density changes with input

  5-7. Problem setting: skewness, e.g. bus arrival times, confounding variables

  8-12. A possible approach: attach a latent variable $w_n$ (one per data point) to each input, concatenate, and map through a neural network $f_\phi$:
        $y_n \sim \mathcal{N}(f_\phi([x_n, w_n]),\ \sigma^2), \quad w_n \sim \mathcal{N}(0, 1)$.
        [Figure: graphical model with a plate over the $N$ data points; fit showing training data and test samples.]
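Illustrative aside: a minimal numpy sketch of this generative model. The two-layer MLP standing in for $f_\phi$, its sizes, and the random weights are hypothetical stand-ins, not the architecture from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 100, 1, 16                     # points, input dim, hidden units

def f_phi(xw, params):
    # hypothetical two-layer MLP standing in for the neural network f_phi
    W1, b1, W2, b2 = params
    return np.tanh(xw @ W1 + b1) @ W2 + b2

params = (rng.standard_normal((D + 1, H)), np.zeros(H),
          rng.standard_normal((H, 1)), np.zeros(1))

x = rng.uniform(-2.0, 2.0, size=(N, D))  # inputs x_n
w = rng.standard_normal((N, 1))          # per-point latent, w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)      # concatenation [x_n, w_n]
sigma = 0.1
y = f_phi(xw, params) + sigma * rng.standard_normal((N, 1))  # y_n ~ N(f_phi([x_n, w_n]), sigma^2)
```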

  13-20. A possible approach, continued: samples from the fitted model $y_n \sim \mathcal{N}(f_\phi([x_n, w_n]),\ \sigma^2)$, $w_n \sim \mathcal{N}(0, 1)$ reveal the drawbacks: $f_\phi$ is a deterministic function, the model overfits, extrapolation is unreliable, and there is only a small number of examples per input $x_n$.

  21-24. Another possible approach: replace the network with a non-parametric GP prior,
        $y_n \sim \mathcal{N}(f([x_n, w_n]),\ \sigma^2), \quad w_n \sim \mathcal{N}(0, 1), \quad f \sim \mathcal{GP}(\mu, k)$.
        This gives better extrapolation, but the model underfits.
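Illustrative aside: a finite-dimensional prior sample from this single-layer model. The zero mean function and RBF kernel are illustrative choices, not stated in the slides.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # squared-exponential kernel k(a, b) on the concatenated inputs
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

rng = np.random.default_rng(1)
N = 100
x = rng.uniform(-2.0, 2.0, size=(N, 1))
w = rng.standard_normal((N, 1))               # w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)           # [x_n, w_n]
K = rbf(xw, xw) + 1e-6 * np.eye(N)            # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(N), K)   # finite-dimensional draw from f ~ GP(0, k)
y = f + 0.1 * rng.standard_normal(N)          # y_n ~ N(f([x_n, w_n]), sigma^2)
```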

  25-29. Our model: a deep GP with the latent variable concatenated at the input,
        $y_n \sim \mathcal{N}(f(g([x_n, w_n])),\ \sigma^2), \quad w_n \sim \mathcal{N}(0, 1), \quad f \sim \mathcal{GP}(\mu_1, k_1), \quad g \sim \mathcal{GP}(\mu_2, k_2)$.
        It extrapolates gracefully and achieves a better data fit.
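Illustrative aside: a prior sample from the two-layer composition $f(g([x_n, w_n]))$. The zero means and unit RBF hyperparameters for both layers are assumptions made for this sketch.

```python
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    # squared-exponential kernel (illustrative choice for both layers)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * sq_dists / ls ** 2)

rng = np.random.default_rng(2)
N = 100
x = rng.uniform(-2.0, 2.0, size=(N, 1))
w = rng.standard_normal((N, 1))                   # w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)               # [x_n, w_n]
g = rng.multivariate_normal(np.zeros(N), rbf(xw, xw) + 1e-6 * np.eye(N))  # inner layer g ~ GP(0, k_2)
h = g[:, None]                                    # inner outputs become the outer layer's inputs
f = rng.multivariate_normal(np.zeros(N), rbf(h, h) + 1e-6 * np.eye(N))    # outer layer f ~ GP(0, k_1) at g's outputs
y = f + 0.1 * rng.standard_normal(N)              # y_n ~ N(f(g([x_n, w_n])), sigma^2)
```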

  30-33. Contributions:
        • A new architecture: latent variables enter by concatenation, not addition
        • Importance-weighted variational inference, exploiting analytic results
        • An extensive empirical comparison on all 41 UCI regression datasets

  34-37. A few details: the latent variables $w_n \sim \mathcal{N}(0, 1)$ are handled by importance weighting (with a Gaussian proposal), and the GP layers $f \sim \mathcal{GP}(\mu_1, k_1)$, $g \sim \mathcal{GP}(\mu_2, k_2)$ by variational inference (a sparse GP posterior). Our approach exploits analytic results, leading to a tighter bound.
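Illustrative aside: the flavour of the importance-weighted bound on a toy 1-D model. The Gaussian prior, likelihood, and proposal below are illustrative, not the paper's DGP or its sparse-GP machinery. With $K = 1$ the estimator reduces to the standard ELBO; larger $K$ tightens the bound in expectation.

```python
import numpy as np
from scipy.special import logsumexp

def iw_bound(y, K, rng, n_mc=5000):
    # toy model: w ~ N(0, 1), y | w ~ N(w, 0.5**2); Gaussian proposal q(w) = N(0.5, 1)
    mu_q, s_q = 0.5, 1.0
    w = mu_q + s_q * rng.standard_normal((n_mc, K))
    log_lik = -0.5 * ((y - w) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))
    log_prior = -0.5 * w ** 2 - 0.5 * np.log(2 * np.pi)
    log_q = -0.5 * ((w - mu_q) / s_q) ** 2 - np.log(s_q * np.sqrt(2 * np.pi))
    log_weights = log_lik + log_prior - log_q             # log p(y, w_k) / q(w_k)
    # importance-weighted bound: E[ log (1/K) sum_k exp(log_weights_k) ]
    return (logsumexp(log_weights, axis=1) - np.log(K)).mean()

rng = np.random.default_rng(3)
for K in (1, 5, 25):
    print(f"K={K:2d}  bound={iw_bound(1.3, K, rng):.4f}")  # tightens (in expectation) as K grows
```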

  38-42. Results:
        • Latent variables in the DGP are highly beneficial
        • Sometimes depth is enough; sometimes latent variables are enough; some datasets need both
        • Importance-weighted VI outperforms VI

  43. Thanks for listening. Poster #218 • New architecture • Importance-weighted inference • 41 datasets
