> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 | BASEL

Gaussian processes - Refresher and some more insights

Marcel Lüthi
Graphics and Vision Research Group
Department of Mathematics and Computer Science
University of Basel

Outline

• Gaussian process: refresher
• Vector-valued and scalar-valued Gaussian processes
• The space of samples
• Gaussian process regression

Gaussian process: Formal definition

$$p(u) = GP(\mu, k)$$

A Gaussian process is a probability distribution over functions $u : \mathcal{X} \to \mathbb{R}^d$ such that every finite restriction to function values $u_X = (u(x_1), \ldots, u(x_n))$ is a multivariate normal distribution

$$p(u_X) = N(\mu_X, k_{XX}).$$

Gaussian process: Illustration

Restriction to values at points $X = \{x\}$:

$$
u(x) = \begin{pmatrix} u_1(x) \\ u_2(x) \end{pmatrix}
\sim N(\mu_X, k_{XX})
= N\left(
\begin{pmatrix} \mu_1(x) \\ \mu_2(x) \end{pmatrix},
\begin{pmatrix} k_{11}(x, x) & k_{12}(x, x) \\ k_{21}(x, x) & k_{22}(x, x) \end{pmatrix}
\right)
$$

Gaussian process: Illustration

Restriction to values at points $X = \{x, x'\}$:

$$
\begin{pmatrix} u(x) \\ u(x') \end{pmatrix}
= \begin{pmatrix} u_1(x) \\ u_2(x) \\ u_1(x') \\ u_2(x') \end{pmatrix}
\sim N(\mu_X, k_{XX})
= N\left(
\begin{pmatrix} \mu_1(x) \\ \mu_2(x) \\ \mu_1(x') \\ \mu_2(x') \end{pmatrix},
\begin{pmatrix}
k_{11}(x, x) & k_{12}(x, x) & k_{11}(x, x') & k_{12}(x, x') \\
k_{21}(x, x) & k_{22}(x, x) & k_{21}(x, x') & k_{22}(x, x') \\
k_{11}(x', x) & k_{12}(x', x) & k_{11}(x', x') & k_{12}(x', x') \\
k_{21}(x', x) & k_{22}(x', x) & k_{21}(x', x') & k_{22}(x', x')
\end{pmatrix}
\right)
$$

Defining a Gaussian process

A Gaussian process $GP(\mu, k)$ is completely specified by a mean function $\mu$ and a covariance function (or kernel) $k$.

• $\mu : \mathcal{X} \to \mathbb{R}^d$ defines what the average deformation looks like
• $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$ defines how a sample can deviate from the mean
  • $k$ must be positive semi-definite

Marginalization property

Let $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$ with

$$
p(X, Y) = N\left(
\begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix},
\begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}
\right)
$$

The marginal distribution $p(X) = \int p(X, Y)\, dY$ is given by $p(X) = N(\mu_X, \Sigma_{XX})$.

• Evaluating the Gaussian process $GP(\mu, k)$ defined on the domain $\mathcal{X}$ at the points $X = (x_1, \ldots, x_n)$ amounts to marginalizing out (ignoring) all random variables in $\mathcal{X} \setminus X$.
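
A minimal NumPy sketch of the marginalization property. The Gaussian kernel, the zero mean, and the point sets are illustrative assumptions, not taken from the slides. It shows that the marginal over $X$ is obtained by keeping the corresponding block of the joint mean and covariance, which matches evaluating the kernel at $X$ alone.

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix for 1D point sets a and b (illustrative choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / sigma**2)

X = np.array([0.0, 0.5, 1.0])   # points we want to evaluate at
Y = np.array([2.0, 3.0])        # points to marginalize out
XY = np.concatenate([X, Y])

mu_joint = np.zeros(len(XY))          # zero mean (assumption)
Sigma_joint = gauss_kernel(XY, XY)    # joint covariance over X and Y

# Marginalizing out Y means keeping only the X block.
n = len(X)
mu_X = mu_joint[:n]
Sigma_XX = Sigma_joint[:n, :n]

# The block equals the kernel evaluated directly at X.
same_as_direct = np.allclose(Sigma_XX, gauss_kernel(X, X))
```
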

From continuous to discrete

• Conceptual formulation (continuous): $GP(\mu, k)$
• Practical implementation (discrete): $N(\mu, K)$

The Karhunen-Loève expansion

We can write $u \sim GP(\mu, k)$ as

$$u \sim \mu + \sum_{i=1}^{\infty} \alpha_i \sqrt{\lambda_i}\, \phi_i, \qquad \alpha_i \sim N(0, 1)$$

• $\phi_i$ is the eigenfunction with associated eigenvalue $\lambda_i$ of the linear operator

$$[T_k u](x) = \int k(x, s)\, u(s)\, ds$$

Eigenvalues and variance

• Eigenvalue $\lambda_i$
  • Interpretation: variance of $\alpha_i \sqrt{\lambda_i}\, \phi_i$
• The total variance of the process $u \sim GP(\mu, k)$ is given by $\sum_{i=1}^{\infty} \lambda_i$.
• Observation: most of the variance is explained by the first eigenfunctions.

Low-rank approximation

$$u = \mu + \sum_{i=1}^{r} \alpha_i \sqrt{\lambda_i}\, \phi_i, \qquad \alpha_i \sim N(0, 1)$$

Main idea: represent the process using only the first $r$ components.

• We have a finite, parametric representation of the process.
• Any deformation $u$ is determined by the coefficients $\alpha = (\alpha_1, \ldots, \alpha_r)$:

$$p(u) = p(\alpha) = \prod_{i=1}^{r} \frac{1}{\sqrt{2\pi}} \exp\left(-\alpha_i^2 / 2\right)$$
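
The low-rank idea can be sketched on a discretized domain, where the eigenvectors of the kernel matrix stand in for the eigenfunctions. The grid, the Gaussian kernel, its width, and the rank `r = 10` are assumptions chosen for illustration. The eigenvalues of the matrix only approximate those of the operator $T_k$ up to a scaling that depends on the grid.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0.0, 1.0, 200)                     # discretized domain (assumption)
K = np.exp(-(x[:, None] - x[None, :])**2 / 0.1)    # Gaussian kernel matrix (assumption)
mu = np.zeros_like(x)                              # zero mean (assumption)

# eigh returns eigenvalues in ascending order; reverse to get the largest first.
lam, phi = np.linalg.eigh(K)
lam, phi = lam[::-1], phi[:, ::-1]
lam = np.clip(lam, 0.0, None)   # clip tiny negative values caused by round-off

r = 10
explained = lam[:r].sum() / lam.sum()   # fraction of total variance kept by r components

# A sample is fully determined by the r coefficients alpha.
alpha = rng.standard_normal(r)
u = mu + phi[:, :r] @ (np.sqrt(lam[:r]) * alpha)
```

For a smooth kernel like this one the eigenvalues decay quickly, so `explained` is close to 1 even for small `r`.
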

Vector-valued and scalar-valued Gaussian processes

Scalar-valued Gaussian processes

Vector-valued (this course)
• Samples $u$ are deformation fields: $u : \mathbb{R}^n \to \mathbb{R}^d$

Scalar-valued (more common)
• Samples $f$ are real-valued functions: $f : \mathbb{R}^n \to \mathbb{R}$

Scalar-valued Gaussian processes

Vector-valued (this course)
• $u \sim GP(\vec{\mu}, \mathbf{k})$
• $\vec{\mu} : \mathcal{X} \to \mathbb{R}^d$
• $\mathbf{k} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$

Scalar-valued (more common)
• $f \sim GP(\mu, k)$
• $\mu : \mathcal{X} \to \mathbb{R}$
• $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$

A connection

Matrix-valued kernels can be reinterpreted as scalar-valued kernels:

• Matrix-valued kernel: $\mathbf{k} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$
• Scalar-valued kernel: $k : (\mathcal{X} \times \{1, \ldots, d\}) \times (\mathcal{X} \times \{1, \ldots, d\}) \to \mathbb{R}$

Bijection: define

$$k((x, i), (x', j)) = \mathbf{k}(x, x')_{i,j}$$

GP regression: vector-valued case

Kernel matrix built from the matrix-valued kernel (shown for $d = 2$):

$$
\mathbf{K} = \begin{pmatrix}
k_{11}(x_1, x_1) & k_{12}(x_1, x_1) & \cdots & k_{11}(x_1, x_n) & k_{12}(x_1, x_n) \\
k_{21}(x_1, x_1) & k_{22}(x_1, x_1) & \cdots & k_{21}(x_1, x_n) & k_{22}(x_1, x_n) \\
\vdots & \vdots & & \vdots & \vdots \\
k_{11}(x_n, x_1) & k_{12}(x_n, x_1) & \cdots & k_{11}(x_n, x_n) & k_{12}(x_n, x_n) \\
k_{21}(x_n, x_1) & k_{22}(x_n, x_1) & \cdots & k_{21}(x_n, x_n) & k_{22}(x_n, x_n)
\end{pmatrix}
$$

The same matrix written with the scalar-valued kernel:

$$
K = \begin{pmatrix}
k((x_1, 1), (x_1, 1)) & k((x_1, 1), (x_1, 2)) & \cdots & k((x_1, 1), (x_n, 1)) & k((x_1, 1), (x_n, 2)) \\
k((x_1, 2), (x_1, 1)) & k((x_1, 2), (x_1, 2)) & \cdots & k((x_1, 2), (x_n, 1)) & k((x_1, 2), (x_n, 2)) \\
\vdots & \vdots & & \vdots & \vdots \\
k((x_n, 1), (x_1, 1)) & k((x_n, 1), (x_1, 2)) & \cdots & k((x_n, 1), (x_n, 1)) & k((x_n, 1), (x_n, 2)) \\
k((x_n, 2), (x_1, 1)) & k((x_n, 2), (x_1, 2)) & \cdots & k((x_n, 2), (x_n, 1)) & k((x_n, 2), (x_n, 2))
\end{pmatrix}
$$

A connection

Because of this bijection, all the theory developed for scalar-valued GPs also holds for vector-valued GPs!
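
The bijection can be checked numerically. The sketch below assumes a hypothetical matrix-valued kernel (a scalar Gaussian kernel times a fixed positive semi-definite matrix `A`) and three arbitrary 2D points, none of which come from the slides. Component indices are 0-based in the code, whereas the slides count from 1.

```python
import numpy as np

d = 2
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])   # fixed PSD matrix (assumption)

def matrix_kernel(x, xp):
    """Matrix-valued kernel: returns a d x d matrix for a pair of points."""
    return np.exp(-np.sum((x - xp)**2)) * A

def scalar_kernel(a, b):
    """Scalar-valued kernel on (point, component) pairs: k((x,i),(x',j)) = k(x,x')[i,j]."""
    (x, i), (xp, j) = a, b
    return matrix_kernel(x, xp)[i, j]

points = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]

# Block form: one d x d block per pair of points.
K_block = np.block([[matrix_kernel(p, q) for q in points] for p in points])

# Scalar form: one entry per pair of (point, component) indices.
index = [(p, i) for p in points for i in range(d)]
K_scalar = np.array([[scalar_kernel(a, b) for b in index] for a in index])

identical = np.allclose(K_block, K_scalar)
```

Both constructions give the same $nd \times nd$ matrix, so any routine written for a scalar-valued kernel matrix can be applied unchanged.
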

Sampling revisited

Finite views on infinite objects

Gaussian process: infinite dimensional, continuous domain

Finite-dimensional views:
• Finite domain (marginalization)
• Finite rank (KL expansion)

The space of samples

Sampling from $GP(\mu, k)$ is done using the corresponding normal distribution $N(\vec{\mu}, K)$.

Algorithm for sampling (slightly inefficient):

1. Do an SVD: $K = U D^2 U^T$
2. Draw a normal vector $\alpha \sim N(0, I_{n \times n})$
3. Compute $\vec{\mu} + U D \alpha$
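
The three steps above can be sketched in NumPy as follows. The function name `sample_gp`, the grid, and the Gaussian kernel are assumptions made for illustration. `np.linalg.svd` returns the singular values of $K$, which correspond to the diagonal of $D^2$, so $D$ is their square root.

```python
import numpy as np

def sample_gp(mu, K, rng):
    """Draw one sample from N(mu, K) using the SVD K = U D^2 U^T."""
    # Step 1: for a symmetric PSD matrix, K = U diag(sing) U^T.
    U, sing, _ = np.linalg.svd(K)
    D = np.sqrt(sing)                      # diagonal entries of D
    # Step 2: standard normal coefficient vector.
    alpha = rng.standard_normal(len(mu))
    # Step 3: mu + U D alpha.
    return mu + U @ (D * alpha)

x = np.linspace(0.0, 1.0, 50)                      # evaluation points (assumption)
mu = np.zeros_like(x)                              # zero mean (assumption)
K = np.exp(-(x[:, None] - x[None, :])**2 / 0.1)    # Gaussian kernel matrix (assumption)

rng = np.random.default_rng(42)
sample = sample_gp(mu, K, rng)
```
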

The space of samples

• From $K = U D^2 U^T$ (using that $U^T U = I$) we have that $K U D^{-1} = U D$.
• Any sample

$$s = \vec{\mu} + U D \alpha = \vec{\mu} + K U D^{-1} \alpha = \vec{\mu} + K \beta$$

  is a linear combination of the columns of $K$, with $\beta = U D^{-1} \alpha$.

Two ways to represent a sample:

1. KL expansion: $s = \vec{\mu} + \sum_i d_i \alpha_i u_i$
2. Linear combination of kernels: $s = \vec{\mu} + \sum_j \beta_j k_j$

Here $u_i$ is the $i$-th column of $U$, $d_i$ the $i$-th diagonal entry of $D$, and $k_j$ the $j$-th column of $K$.
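
A small numerical check of the two representations, assuming a well-conditioned Gaussian kernel matrix on five points so that $D^{-1}$ exists. The points and the kernel are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.arange(5.0)                            # five points spaced 1 apart (assumption)
K = np.exp(-(x[:, None] - x[None, :])**2)     # Gaussian kernel matrix, invertible here
mu = np.zeros(5)

U, sing, _ = np.linalg.svd(K)
D = np.sqrt(sing)                             # diagonal entries d_i of D
alpha = rng.standard_normal(5)

# 1. KL expansion: s = mu + sum_i d_i alpha_i u_i
s_kl = mu + sum(D[i] * alpha[i] * U[:, i] for i in range(5))

# 2. Linear combination of kernel columns: s = mu + sum_j beta_j k_j
beta = U @ (alpha / D)                        # beta = U D^{-1} alpha
s_kernel = mu + sum(beta[j] * K[:, j] for j in range(5))

representations_agree = np.allclose(s_kl, s_kernel)
```
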

Four example covariance functions

• $k(x, x') = f(x)\, f(x')$, with $f(x) = (1 - s(x))\, 2x^2 + s(x) \sin(x^2)$
• $k(x, x') = \sum_{i=1}^{3} f_i(x)\, f_i(x')$, with $f_1(x) = \sin(x)$, $f_2(x) = x$, $f_3(x) = \cos(x^2)$

Four example covariance functions

• $k(x, x') = \delta(x, x')$
• $k(x, x') = \exp\left(-\dfrac{\|x - x'\|^2}{9}\right)$
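
Three of these kernels can be discretized to see how the kernel shapes the space of samples. The grid is an assumption, and the basis functions follow the reading $f_1(x) = \sin(x)$, $f_2(x) = x$, $f_3(x) = \cos(x^2)$ of the slide. The sum-of-products kernel yields a rank-3 matrix, so every sample lies in the span of the three basis functions, whereas the delta kernel yields full rank.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 50)     # evaluation grid (assumption)

# k(x, x') = sum_i f_i(x) f_i(x'): columns of `basis` are f_1, f_2, f_3 on the grid.
basis = np.stack([np.sin(x), x, np.cos(x**2)], axis=1)
K_sum = basis @ basis.T

# k(x, x') = delta(x, x'): independent values at every grid point.
K_delta = np.eye(len(x))

# k(x, x') = exp(-|x - x'|^2 / 9): smooth samples.
K_gauss = np.exp(-(x[:, None] - x[None, :])**2 / 9.0)

rank_sum = np.linalg.matrix_rank(K_sum)
rank_delta = np.linalg.matrix_rank(K_delta)

# A zero-mean sample for K_sum: a random combination of the three basis functions,
# whose covariance is basis @ basis.T = K_sum.
sample_sum = basis @ rng.standard_normal(3)
```
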

Example 1

$$k(x, x') = f(x)\, f(x')$$