FUNCTION SPACE DISTRIBUTIONS OVER KERNELS
Greg Benton, Wesley Maddox, Jayson Salkey, Julio Albinati, Andrew Gordon Wilson
HIGH LEVEL IDEA
‣ Gaussian Process (GP): a stochastic process for which any finite collection of points is jointly normal
‣ A kernel function k(x, x′) describes the covariance:
$$y(x) \sim \mathcal{GP}\big(\mu(x),\, k(x, x')\big)$$
OUTLINE
‣ Introduction
  ‣ Mathematical Foundation
  ‣ Model Specification
  ‣ Inference Procedure
‣ Experimental Results
  ‣ Recovery of known kernels
  ‣ Interpolation and extrapolation of real data
  ‣ Extension to multi-task time-series
  ‣ Precipitation data
BOCHNER’S THEOREM
‣ If k(x, x′) = k(τ), then we can represent k(τ) via its spectral density:
$$k(\tau) = \int_{\mathbb{R}} e^{2\pi i \omega \tau}\, S(\omega)\, d\omega$$
‣ Learning the spectral representation of k(τ) is sufficient to learn the entire kernel
‣ Assuming k(τ) is symmetric and the data are finitely sampled, the reconstruction simplifies to (a quadrature sketch follows below):
$$k(\tau) = \int_{[0,\, \pi/\Delta)} \cos(2\pi\tau\omega)\, S(\omega)\, d\omega$$
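A minimal NumPy sketch of this reconstruction, assuming a fixed frequency grid and trapezoidal quadrature; the grid, density, and function names here are illustrative, not the repo's API:

```python
import numpy as np

def kernel_from_spectral_density(tau, omega, S):
    """Approximate k(tau) = integral over [0, pi/Delta) of cos(2*pi*tau*w) S(w) dw
    by trapezoidal quadrature on a fixed frequency grid."""
    integrand = np.cos(2 * np.pi * np.outer(tau, omega)) * S  # (n_tau, n_omega)
    return np.trapz(integrand, omega, axis=1)

# Illustrative check: an RBF-like spectral density yields a smooth,
# monotonically decaying kernel in tau.
omega = np.linspace(0.0, 5.0, 500)   # stands in for [0, pi/Delta)
S = np.exp(-omega ** 2 / 2.0)
tau = np.linspace(0.0, 3.0, 100)
k = kernel_from_spectral_density(tau, omega, S)
```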
FUNCTIONAL KERNEL LEARNING
The full hierarchy (drawn as a graphical model on the slide):
‣ Hyper-prior: $p(\phi) = p(\theta, \gamma)$
‣ Latent GP: $g(\omega) \mid \theta \sim \mathcal{GP}\big(\mu(\omega; \theta),\, k_g(\omega, \omega'; \theta)\big)$
‣ Spectral density: $S(\omega) = \exp\{g(\omega)\}$
‣ Data GP: $f(x) \mid S(\omega), \gamma \sim \mathcal{GP}\big(\gamma_0,\, k(\tau; S(\omega)) + \gamma_1 \delta_{\tau=0}\big)$
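To make the hierarchy concrete, here is a minimal NumPy sketch of one joint draw from the FKL prior, assuming the latent mean vector `mu_g` and covariance matrix `K_g` have been precomputed on a frequency grid (all names are illustrative, not the spectralgp API):

```python
import numpy as np

def sample_fkl_prior(x, omega, mu_g, K_g, gamma0=0.0, gamma1=1e-2, seed=0):
    """One joint draw from the FKL hierarchy:
    g ~ GP(mu_g, K_g)  ->  S = exp(g)  ->  f ~ GP(gamma0, k(.; S) + gamma1*I)."""
    rng = np.random.default_rng(seed)
    g = rng.multivariate_normal(mu_g, K_g + 1e-8 * np.eye(len(omega)))
    S = np.exp(g)                                    # spectral density
    tau = x[:, None] - x[None, :]                    # pairwise lags
    # cosine quadrature from the Bochner slide, applied to every lag
    K = np.trapz(np.cos(2 * np.pi * tau[..., None] * omega) * S, omega, axis=-1)
    K += gamma1 * np.eye(len(x))                     # gamma_1 * delta_{tau=0}
    return rng.multivariate_normal(gamma0 * np.ones(len(x)), K)
```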
LATENT MODEL
‣ Mean of the latent GP is the log of an RBF spectral density:
$$\mu(\omega; \theta) = \tilde\theta_0 - \frac{\omega^2}{2\tilde\theta_1^2}$$
‣ Covariance is Matérn with ν = 1.5, plus a nugget:
$$k_g(\omega, \omega'; \theta) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,|\omega - \omega'|}{\tilde\theta_2}\right)^{\!\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\,|\omega - \omega'|}{\tilde\theta_2}\right) + \tilde\theta_3\, \delta_{\tau=0}$$
‣ with transformed hyperparameters $\tilde\theta_i = \mathrm{softmax}(\theta_i)$
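These two functions can be written directly; a sketch assuming SciPy for the gamma function and the modified Bessel function $K_\nu$, with illustrative parameter names:

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv

def latent_mean(omega, t0, t1):
    """Log of an RBF spectral density: mu(w; theta) = t0 - w^2 / (2 t1^2)."""
    return t0 - omega ** 2 / (2.0 * t1 ** 2)

def latent_matern_cov(omega1, omega2, lengthscale, nu=1.5, nugget=0.0):
    """Matern covariance over frequencies (nu = 1.5 here), plus a diagonal nugget."""
    d = np.abs(omega1[:, None] - omega2[None, :])
    r = np.sqrt(2.0 * nu) * d / lengthscale
    r_safe = np.where(r == 0.0, 1e-12, r)            # avoid 0^nu * K_nu(0)
    K = (2.0 ** (1.0 - nu) / gamma_fn(nu)) * r_safe ** nu * kv(nu, r_safe)
    K[r == 0.0] = 1.0                                # Matern kernel -> 1 as d -> 0
    return K + nugget * (d == 0.0)
```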
INFERENCE
‣ Need to update the hyperparameters ϕ and the latent GP g(ω)
‣ Initialize g(ω) to the log-periodogram of the data
‣ Alternate (sketched below):
  ‣ Fix g(ω) and use Adam to update ϕ
  ‣ Fix ϕ and use elliptical slice sampling to draw samples of g(ω)
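A schematic of the alternating loop in PyTorch. The helpers `log_marginal` (data GP marginal likelihood given the current spectrum), `sample_latent_prior`, and `ess_step` (one elliptical slice move; see the appendix slide) are hypothetical stand-ins, not the spectralgp API:

```python
import torch

# Hypothetical helpers, assumed defined elsewhere:
#   log_marginal(phi, g): data GP log marginal likelihood given spectrum exp(g)
#   sample_latent_prior(phi): draw from the latent GP prior over g(omega)
#   ess_step(g, prior_sample_fn, loglik_fn): one elliptical slice transition

def fit_fkl(phi, g, n_outer=100, n_adam=10, n_ess=10, lr=1e-2):
    opt = torch.optim.Adam([phi], lr=lr)
    for _ in range(n_outer):
        # (1) fix g(omega); update hyperparameters phi with Adam
        for _ in range(n_adam):
            opt.zero_grad()
            loss = -log_marginal(phi, g.detach())
            loss.backward()
            opt.step()
        # (2) fix phi; draw samples of g(omega) with elliptical slice sampling
        with torch.no_grad():
            for _ in range(n_ess):
                g = ess_step(g, lambda: sample_latent_prior(phi),
                             lambda h: log_marginal(phi, h))
    return phi, g
```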
DATA FROM A SPECTRAL MIXTURE KERNEL
‣ The generative kernel has a mixture of Gaussians as its spectral density (see the sketch below)
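For reference, the spectral mixture kernel of Wilson & Adams (2013) has the closed form $k(\tau) = \sum_q w_q \exp(-2\pi^2\tau^2 v_q)\cos(2\pi\tau\mu_q)$; a small sketch for generating such data (the parameter values are illustrative):

```python
import numpy as np

def spectral_mixture_kernel(tau, weights, means, variances):
    """k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)
    (Wilson & Adams, 2013)."""
    tau = np.asarray(tau)[..., None]   # broadcast over mixture components
    return np.sum(np.asarray(weights)
                  * np.exp(-2.0 * np.pi ** 2 * tau ** 2 * np.asarray(variances))
                  * np.cos(2.0 * np.pi * tau * np.asarray(means)), axis=-1)

# Draw a dataset from a GP with a two-component SM kernel.
x = np.linspace(0.0, 10.0, 200)
K = spectral_mixture_kernel(x[:, None] - x[None, :],
                            weights=[1.0, 0.5], means=[0.25, 1.0],
                            variances=[0.01, 0.02])
y = np.random.default_rng(0).multivariate_normal(
    np.zeros(len(x)), K + 1e-6 * np.eye(len(x)))
```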
AIRLINE PASSENGER DATA
[Figure: interpolation and extrapolation on the airline passenger series]
MULTIPLE TIME SERIES
‣ Can ‘link’ multiple time series by sharing the latent GP across outputs
‣ Let g_t(ω) denote the t-th realization of the latent GP and f_t(x) the GP over the t-th time series:
  ‣ Hyper-prior: $p(\phi) = p(\theta, \gamma)$
  ‣ Latent GP: $g(\omega) \mid \theta \sim \mathcal{GP}\big(\mu(\omega; \theta),\, k_g(\omega, \omega'; \theta)\big)$
  ‣ t-th spectral density: $S_t(\omega) = \exp\{g_t(\omega)\}$
  ‣ GP for task t: $f_t(x) \mid S_t(\omega), \gamma \sim \mathcal{GP}\big(\gamma_0,\, k(\tau; S_t(\omega)) + \gamma_1 \delta_{\tau=0}\big)$
‣ Test this on USHCN data: daily precipitation values from the continental US (sampling sketch below)
‣ Inductive bias: yearly precipitation for climatologically similar regions should have similar covariance, and hence similar spectral densities
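A minimal extension of the earlier single-task sketch: the tasks share the latent GP's mean and covariance, but each task draws its own realization g_t, hence its own spectrum S_t and data kernel (all names are illustrative):

```python
import numpy as np

def sample_multitask_prior(x, omega, mu_g, K_g, n_tasks,
                           gamma0=0.0, gamma1=1e-2, seed=0):
    """Tasks share the latent GP (mu_g, K_g); each task t gets its own
    realization g_t, spectrum S_t, and hence its own data kernel."""
    rng = np.random.default_rng(seed)
    tau = x[:, None] - x[None, :]
    draws = []
    for _ in range(n_tasks):
        g_t = rng.multivariate_normal(mu_g, K_g + 1e-8 * np.eye(len(omega)))
        S_t = np.exp(g_t)
        K = np.trapz(np.cos(2 * np.pi * tau[..., None] * omega) * S_t,
                     omega, axis=-1) + gamma1 * np.eye(len(x))
        draws.append(rng.multivariate_normal(gamma0 * np.ones(len(x)), K))
    return np.stack(draws)   # shape (n_tasks, len(x))
```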
PRECIPITATION DATA
‣ Ran on two climatologically similar locations
PRECIPITATION DATA
‣ Used 108 locations across the Northeast USA
‣ Each station: n = 300
‣ Total: 300 × 108 = 32,400 data points
[Figure: map of the locations used (longitude vs. latitude); 48 of them shown here…]
CONCLUSION
‣ FKL: a nonparametric, function-space view of kernel learning
‣ Can express any stationary kernel, with a representation of uncertainty
‣ GPyTorch code: https://github.com/wjmaddox/spectralgp

QUESTIONS?
‣ Poster 52
REFERENCES
‣ Spectral Mixture Kernels: Wilson, Andrew, and Ryan Adams. “Gaussian Process Kernels for Pattern Discovery and Extrapolation.” International Conference on Machine Learning, 2013.
‣ BNSE: Tobar, Felipe. “Bayesian Nonparametric Spectral Estimation.” Advances in Neural Information Processing Systems, 2018.
‣ Elliptical Slice Sampling: Murray, Iain, Ryan Adams, and David MacKay. “Elliptical Slice Sampling.” Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
‣ GPyTorch: Gardner, Jacob, et al. “GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration.” Advances in Neural Information Processing Systems, 2018.
SINC DATA
‣ sinc(x) = sin(πx)/(πx)
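NumPy's `np.sinc` matches this normalized definition, so generating the dataset is a one-liner (the grid choice is illustrative):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 256)
y = np.sinc(x)   # NumPy's sinc is the normalized sin(pi x) / (pi x)
```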
QUASI-PERIODIC DATA
‣ The generative kernel is the product of RBF and periodic kernels (see the sketch below)
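A small sketch of such a kernel, combining an RBF term with an exp-sine-squared periodic term (parameter values are illustrative):

```python
import numpy as np

def quasi_periodic_kernel(tau, ell_rbf=2.0, ell_per=1.0, period=1.0):
    """Product of an RBF kernel and a periodic (exp-sine-squared) kernel."""
    rbf = np.exp(-tau ** 2 / (2.0 * ell_rbf ** 2))
    periodic = np.exp(-2.0 * np.sin(np.pi * np.abs(tau) / period) ** 2
                      / ell_per ** 2)
    return rbf * periodic
```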
ELLIPTICAL SLICE SAMPLING (Murray, Adams & MacKay, 2010)
‣ Sample from zero-mean Gaussians
‣ Re-parameterize for non-zero mean (implementation sketch below)
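For completeness, a compact NumPy implementation of one ESS transition under a zero-mean Gaussian prior, with the non-zero-mean re-parameterization noted at the end (function names are illustrative):

```python
import numpy as np

def elliptical_slice(f, log_lik, sample_prior, rng):
    """One ESS transition (Murray, Adams & MacKay, 2010), zero-mean prior."""
    nu = sample_prior()                                # auxiliary prior draw
    log_y = log_lik(f) + np.log(rng.random())          # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)              # initial angle
    lo, hi = theta - 2.0 * np.pi, theta                # bracket around 0
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta) # point on the ellipse
        if log_lik(f_new) > log_y:
            return f_new                               # accepted
        # shrink the bracket toward theta = 0 (the current state) and retry
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Non-zero mean mu: run ESS on the residual and add the mean back:
#   f = mu + elliptical_slice(f - mu, lambda h: log_lik(h + mu),
#                             sample_prior, rng)
```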