

  1. FUNCTION SPACE DISTRIBUTIONS OVER KERNELS GREG BENTON, WESLEY MADDOX, JAYSON SALKEY, JULIO ALBINATI, ANDREW GORDON WILSON

  2. FUNCTIONAL KERNEL LEARNING: HIGH LEVEL IDEA
  ‣ Gaussian Process (GP): a stochastic process for which any finite collection of points is jointly normal
  ‣ A kernel function k(x, x′) describes the covariance:
    y(x) ∼ GP(μ(x), k(x, x′))

  3. FUNCTIONAL KERNEL LEARNING: HIGH LEVEL IDEA
    y(x) ∼ GP(μ(x), k(x, x′))
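A minimal Python sketch of the idea above: on any finite grid of inputs, a GP draw is just a sample from a multivariate normal whose covariance matrix is built from the kernel. The RBF kernel and input grid here are illustrative choices, not the paper’s model.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """k(x, x') = exp(-(x - x')^2 / (2 * lengthscale^2))."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale**2)

x = np.linspace(0, 5, 100)                      # finite collection of points
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))    # jitter for numerical stability
# any finite collection of points is jointly normal:
samples = np.random.multivariate_normal(mean=np.zeros(len(x)), cov=K, size=3)
```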

  4. FUNCTIONAL KERNEL LEARNING: OUTLINE
  ‣ Introduction
  ‣ Mathematical Foundation
  ‣ Model Specification
  ‣ Inference Procedure

  5. FUNCTIONAL KERNEL LEARNING: OUTLINE
  ‣ Introduction
  ‣ Experimental Results
    ‣ Recovery of known kernels
    ‣ Interpolation and extrapolation of real data

  6. FUNCTIONAL KERNEL LEARNING: OUTLINE
  ‣ Introduction
  ‣ Experimental Results
    ‣ Extension to multi-task time series
    ‣ Precipitation data

  7. FUNCTIONAL KERNEL LEARNING: BOCHNER’S THEOREM
  ‣ If k(x, x′) = k(τ) (a stationary kernel, with τ = x − x′), then we can represent k(τ) via its spectral density S(ω):
    k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω
  ‣ Learning the spectral representation of k(τ) is sufficient to learn the entire kernel

  8. FUNCTIONAL KERNEL LEARNING: BOCHNER’S THEOREM
  ‣ If k(x, x′) = k(τ) (a stationary kernel, with τ = x − x′), then we can represent k(τ) via its spectral density S(ω):
    k(τ) = ∫ℝ e^{2πiωτ} S(ω) dω
  ‣ Learning the spectral representation of k(τ) is sufficient to learn the entire kernel
  ‣ Assuming k(τ) is symmetric and the data are finitely sampled with spacing Δ, the reconstruction simplifies to:
    k(τ) = ∫_{[0, π/Δ)} cos(2πτω) S(ω) dω
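A small numerical sketch of the simplified reconstruction above, assuming an illustrative spectral density, sample spacing Δ, and grid sizes (none of these are the paper’s settings):

```python
import numpy as np

def kernel_from_spectral_density(tau, omega, S):
    """k(tau) = integral over [0, pi/Delta) of cos(2*pi*tau*omega) * S(omega) d(omega),
    approximated by trapezoidal quadrature on a frequency grid."""
    integrand = np.cos(2 * np.pi * np.outer(tau, omega)) * S[None, :]
    return np.trapz(integrand, omega, axis=1)

delta = 0.1                                   # assumed sample spacing of the data
omega = np.linspace(0, np.pi / delta, 500)    # frequency grid on [0, pi/Delta)
S = np.exp(-0.5 * omega**2)                   # illustrative (RBF-like) spectral density
tau = np.linspace(0, 10, 200)
k_tau = kernel_from_spectral_density(tau, omega, S)
```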

  9. FUNCTIONAL KERNEL LEARNING: GRAPHICAL MODEL
  Hyper-prior:        p(ϕ) = p(θ, γ)
  Latent GP:          g(ω) | θ ∼ GP(μ(ω; θ), k_g(ω, ω′; θ))
  Spectral density:   S(ω) = exp{g(ω)}
  Data GP:            f(x) | S(ω), γ ∼ GP(γ₀, k(τ; S(ω)))
  [Figure: plate diagram with frequencies ω_i, latent values g_i, and spectral densities s_i (plate i = 1, …, I); inputs x_n, function values f_n, and observations y_n (plate n = 1, …, N)]

  10. FUNCTIONAL KERNEL LEARNING
  Hyper-prior:        p(ϕ) = p(θ, γ)
  Latent GP:          g(ω) | θ ∼ GP(μ(ω; θ), k_g(ω, ω′; θ))
  Spectral density:   S(ω) = exp{g(ω)}
  Data GP:            f(x) | S(ω), γ ∼ GP(γ₀, k(τ; S(ω)) + γ₁ δ_{τ=0})

  11. FUNCTIONAL KERNEL LEARNING: LATENT MODEL
  ‣ Mean of the latent GP is the log of an RBF spectral density:
    μ(ω; θ) = θ̃₀ − ω² / (2θ̃₁²)
  ‣ Covariance is Matérn with ν = 1.5, plus a diagonal term:
    k_g(ω, ω′; θ) = (2^{1−ν} / Γ(ν)) (√(2ν) |ω − ω′| / θ̃₂)^ν K_ν(√(2ν) |ω − ω′| / θ̃₂) + θ̃₃ δ_{ω=ω′}
  ‣ θ̃ᵢ = softplus(θᵢ)
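A sketch of this latent mean and covariance in closed form. For ν = 1.5 the Matérn kernel reduces to (1 + √3·d/ℓ)·exp(−√3·d/ℓ), which avoids the Bessel function; the softplus positivity transform is an assumption, and all parameter values are illustrative.

```python
import numpy as np

def softplus(x):
    """Assumed positivity transform for the raw hyperparameters theta."""
    return np.log1p(np.exp(x))

def latent_mean(omega, theta0, theta1):
    """mu(omega; theta) = theta0 - omega^2 / (2 * theta1^2): log of an RBF spectral density."""
    return theta0 - omega**2 / (2.0 * theta1**2)

def latent_cov(omega1, omega2, theta2, theta3):
    """Matern covariance with nu = 1.5 (closed form) plus a diagonal term."""
    d = np.abs(omega1[:, None] - omega2[None, :])
    r = np.sqrt(3.0) * d / theta2      # sqrt(2 * nu) = sqrt(3) for nu = 1.5
    return (1.0 + r) * np.exp(-r) + theta3 * (d == 0.0)
```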

  12. FUNCTIONAL KERNEL LEARNING: INFERENCE
  ‣ Need to update the hyperparameters ϕ and the latent GP g(ω)
  ‣ Initialize g(ω) to the log-periodogram of the data
  ‣ Alternate (sketched below):
    ‣ Fix g(ω) and use Adam to update ϕ
    ‣ Fix ϕ and use elliptical slice sampling to draw samples of g(ω)
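A schematic sketch of this alternating loop, not the authors’ implementation (their GPyTorch code is linked in the conclusion). The objective below is a toy stand-in for the FKL marginal likelihood, and all shapes and names are illustrative.

```python
import torch

y = torch.randn(50)                              # observations (toy)
g = torch.randn(50)                              # latent g(omega) on a frequency grid
phi = torch.tensor([0.0], requires_grad=True)    # hyperparameters

def neg_log_marginal(g, phi, y):
    """Toy stand-in: in FKL this would be the data-GP marginal likelihood
    with the kernel reconstructed from S(omega) = exp(g(omega))."""
    return 0.5 * torch.sum((y - torch.exp(phi) * g) ** 2)

opt = torch.optim.Adam([phi], lr=0.01)
for step in range(100):
    # (1) fix g(omega); take an Adam step on phi
    opt.zero_grad()
    loss = neg_log_marginal(g, phi, y)
    loss.backward()
    opt.step()
    # (2) fix phi; redraw g(omega) by elliptical slice sampling
    #     (omitted here; a minimal ESS update is sketched after slide 28)
```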

  13. FUNCTIONAL KERNEL LEARNING: OUTLINE
  ‣ Introduction
  ‣ Experimental Results
    ‣ Recovery of known kernels
    ‣ Interpolation and extrapolation of real data

  14. FUNCTIONAL KERNEL LEARNING: DATA FROM A SPECTRAL MIXTURE KERNEL
  ‣ Generative kernel has a mixture of Gaussians as its spectral density

  15. FUNCTIONAL KERNEL LEARNING: DATA FROM A SPECTRAL MIXTURE KERNEL
  ‣ Generative kernel has a mixture of Gaussians as its spectral density (sketched below)
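For reference, a sketch of the one-dimensional spectral mixture kernel (Wilson & Adams, 2013), whose spectral density is a mixture of Gaussians; the weights, means, and variances below are illustrative, not the experiment’s values.

```python
import numpy as np

def spectral_mixture_kernel(tau, weights, means, variances):
    """k(tau) = sum_q w_q * exp(-2 * pi^2 * tau^2 * v_q) * cos(2 * pi * tau * mu_q)."""
    tau = np.asarray(tau, dtype=float)[:, None]
    terms = np.asarray(weights) \
        * np.exp(-2.0 * np.pi**2 * tau**2 * np.asarray(variances)) \
        * np.cos(2.0 * np.pi * tau * np.asarray(means))
    return terms.sum(axis=1)

tau = np.linspace(0, 5, 200)
k = spectral_mixture_kernel(tau, weights=[0.6, 0.4], means=[0.5, 1.5],
                            variances=[0.05, 0.02])
```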

  16. FUNCTIONAL KERNEL LEARNING: AIRLINE PASSENGER DATA

  17. FUNCTIONAL KERNEL LEARNING: OUTLINE
  ‣ Introduction
  ‣ Experimental Results
    ‣ Extension to multi-task time series
    ‣ Precipitation data

  18. FUNCTIONAL KERNEL LEARNING: MULTIPLE TIME SERIES
  ‣ Can ‘link’ multiple time series by sharing the latent GP across outputs
  ‣ Let g_t(ω) denote the t-th realization of the latent GP and f_t(x) the GP over the t-th time series
  Hyper-prior:              p(ϕ) = p(θ, γ)
  Latent GP:                g(ω) | θ ∼ GP(μ(ω; θ), k_g(ω, ω′; θ))
  t-th spectral density:    S_t(ω) = exp{g_t(ω)}
  GP for t-th task:         f_t(x) | S_t(ω), γ ∼ GP(γ₀, k(τ; S_t(ω)) + γ₁ δ_{τ=0})

  19. FUNCTIONAL KERNEL LEARNING: MULTIPLE TIME SERIES
  ‣ Can ‘link’ multiple time series by sharing the latent GP across outputs (sketched below)
  ‣ Let g_t(ω) denote the t-th realization of the latent GP and f_t(x) the GP over the t-th time series
  Hyper-prior:              p(ϕ) = p(θ, γ)
  Latent GP:                g(ω) | θ ∼ GP(μ(ω; θ), k_g(ω, ω′; θ))
  t-th spectral density:    S_t(ω) = exp{g_t(ω)}
  GP for t-th task:         f_t(x) | S_t(ω), γ ∼ GP(γ₀, k(τ; S_t(ω)) + γ₁ δ_{τ=0})
  ‣ Test this on data from USHCN: daily precipitation values from the continental US
  ‣ Inductive bias: yearly precipitation for climatologically similar regions should have similar covariance and similar spectral densities
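A self-contained sketch of the shared-latent-GP construction: each task draws its own realization g_t(ω) from one shared latent GP, giving a per-task spectral density and kernel. For brevity the latent covariance here is an RBF rather than the model’s Matérn, and all grids and parameters are illustrative.

```python
import numpy as np

omega = np.linspace(0, 10, 300)
mu = 1.0 - omega**2 / 8.0                             # shared latent mean mu(omega; theta)
d = omega[:, None] - omega[None, :]
Kg = np.exp(-0.5 * d**2) + 1e-5 * np.eye(len(omega))  # shared latent covariance (RBF stand-in)
L = np.linalg.cholesky(Kg)

tau = np.linspace(0, 5, 100)
kernels = []
for t in range(3):                                    # three linked time series
    g_t = mu + L @ np.random.randn(len(omega))        # t-th latent realization g_t(omega)
    S_t = np.exp(g_t)                                 # t-th spectral density
    integrand = np.cos(2 * np.pi * np.outer(tau, omega)) * S_t[None, :]
    kernels.append(np.trapz(integrand, omega, axis=1))  # k(tau; S_t(omega))
```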

  20. FUNCTIONAL KERNEL LEARNING: PRECIPITATION DATA
  ‣ Ran on two climatologically similar locations

  21. FUNCTIONAL KERNEL LEARNING: PRECIPITATION DATA
  ‣ Used 108 locations across the Northeast USA
  ‣ Each station: n = 300; total: 300 × 108 = 32,400 data points
  [Figure: map of the locations used, plotted by longitude (−80 to −70) and latitude (40 to 46)]
  Here’s 48 of them…

  22. FUNCTIONAL KERNEL LEARNING: CONCLUSION
  ‣ FKL: a nonparametric, function-space view of kernel learning
  ‣ Can express any stationary kernel, with a representation of uncertainty
  ‣ GPyTorch code: https://github.com/wjmaddox/spectralgp

  23. FUNCTIONAL KERNEL LEARNING: CONCLUSION
  ‣ FKL: a nonparametric, function-space view of kernel learning
  ‣ Can express any stationary kernel, with a representation of uncertainty
  ‣ GPyTorch code: https://github.com/wjmaddox/spectralgp
  QUESTIONS?
  ‣ Poster 52

  24. FUNCTIONAL KERNEL LEARNING: REFERENCES
  ‣ Spectral Mixture Kernels: Wilson, Andrew, and Ryan Adams. “Gaussian process kernels for pattern discovery and extrapolation.” International Conference on Machine Learning, 2013.
  ‣ BNSE: Tobar, Felipe. “Bayesian nonparametric spectral estimation.” Advances in Neural Information Processing Systems, 2018.
  ‣ Elliptical Slice Sampling: Murray, Iain, Ryan Adams, and David MacKay. “Elliptical slice sampling.” Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
  ‣ GPyTorch: Gardner, Jacob, et al. “GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration.” Advances in Neural Information Processing Systems, 2018.

  25. FUNCTIONAL KERNEL LEARNING: SINC DATA
    sinc(x) = sin(πx) / (πx)
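For reproducing this dataset, note that NumPy’s np.sinc already uses the same normalized convention; the input grid below is an illustrative assumption.

```python
import numpy as np

x = np.linspace(-10, 10, 500)   # illustrative input grid
y = np.sinc(x)                  # np.sinc(x) = sin(pi * x) / (pi * x)
```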

  26. FUNCTIONAL KERNEL LEARNING: QUASI-PERIODIC DATA
  ‣ Generative kernel is the product of RBF and periodic kernels

  27. FUNCTIONAL KERNEL LEARNING: QUASI-PERIODIC DATA
  ‣ Generative kernel is the product of RBF and periodic kernels (sketched below)
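A sketch of this generative kernel: the product of an RBF kernel and a periodic (exp-sine-squared) kernel. The lengthscales and period are illustrative assumptions.

```python
import numpy as np

def quasi_periodic_kernel(tau, rbf_lengthscale=2.0, period=1.0, per_lengthscale=0.5):
    """k(tau) = RBF(tau) * periodic(tau)."""
    rbf = np.exp(-0.5 * tau**2 / rbf_lengthscale**2)
    periodic = np.exp(-2.0 * np.sin(np.pi * tau / period)**2 / per_lengthscale**2)
    return rbf * periodic

tau = np.linspace(0, 5, 200)
k = quasi_periodic_kernel(tau)
```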

  28. FUNCTIONAL KERNEL LEARNING: ELLIPTICAL SLICE SAMPLING (MURRAY, ADAMS & MACKAY, 2010)
  ‣ Sample zero-mean Gaussians
  ‣ Re-parameterize for non-zero means
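A minimal single-update implementation of elliptical slice sampling following Murray, Adams & MacKay (2010), with the non-zero-mean case handled by the re-parameterization the slide mentions; the function and argument names are our own.

```python
import numpy as np

def elliptical_slice(f, chol_Sigma, log_lik, mu=0.0):
    """One ESS update for f ~ N(mu, Sigma) with likelihood log_lik.
    chol_Sigma is a lower Cholesky factor of Sigma. Non-zero means are
    handled by working in the zero-mean coordinates f - mu."""
    f0 = f - mu
    nu = chol_Sigma @ np.random.randn(len(f))      # auxiliary draw from N(0, Sigma)
    log_y = log_lik(f) + np.log(np.random.rand())  # slice height under current state
    theta = np.random.uniform(0.0, 2.0 * np.pi)    # initial proposal angle
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_prop = f0 * np.cos(theta) + nu * np.sin(theta) + mu
        if log_lik(f_prop) > log_y:
            return f_prop                          # accept: proposal is on the slice
        if theta < 0.0:                            # otherwise shrink the bracket
            theta_min = theta
        else:
            theta_max = theta
        theta = np.random.uniform(theta_min, theta_max)
```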
