  1. Gaussian Processes Seung-Hoon Na Chonbuk National University

  2. Gaussian Process Regression • Predictions using noisy observations – The case of a single test input:
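
For reference, the standard result being quoted here (e.g. Rasmussen and Williams, ch. 2): with noisy observations $\mathbf{y}$, noise variance $\sigma_y^2$, $\mathbf{K}_y = \mathbf{K} + \sigma_y^2\mathbf{I}_N$, $\mathbf{k}_* = [k(\mathbf{x}_*,\mathbf{x}_1),\dots,k(\mathbf{x}_*,\mathbf{x}_N)]^\top$, and $k_{**} = k(\mathbf{x}_*,\mathbf{x}_*)$, the predictive distribution at a single test input $\mathbf{x}_*$ is

```latex
\[
p(f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y})
  = \mathcal{N}\big(f_* \mid
      \mathbf{k}_*^\top \mathbf{K}_y^{-1}\mathbf{y},\;
      k_{**} - \mathbf{k}_*^\top \mathbf{K}_y^{-1}\mathbf{k}_* \big)
\]
```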

  3. Gaussian Process Regression • Computational and numerical issues – It is unwise to directly invert $\mathbf{K}_y$ – Instead, we use a Cholesky decomposition – The marginal probability (log marginal likelihood) can be computed from the same factorization

  4. Gaussian Process Regression • Predictive variance via the Cholesky factor: $\mathbb{V}[f_*] = k_{**} - \mathbf{k}_*^\top \mathbf{K}_y^{-1}\mathbf{k}_*$ • With $\mathbf{K}_y = \mathbf{L}\mathbf{L}^\top$, $\mathbf{L} = \mathrm{cholesky}(\mathbf{K}_y)$: $\mathbf{k}_*^\top \mathbf{K}_y^{-1}\mathbf{k}_* = (\mathbf{L}^{-1}\mathbf{k}_*)^\top(\mathbf{L}^{-1}\mathbf{k}_*)$ • Setting $\mathbf{v} = \mathbf{L}^{-1}\mathbf{k}_* = \mathbf{L}\backslash\mathbf{k}_*$ gives $\mathbb{V}[f_*] = k_{**} - \mathbf{v}^\top\mathbf{v}$
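
A minimal Python sketch of slides 2-4: predictive mean, predictive variance, and log marginal likelihood from a single Cholesky factorization, with no explicit matrix inverse. The function and argument names are illustrative, not taken from the slides.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gp_regression_predict(K, y, k_star, k_star_star, noise_var):
    """GP regression at one test point using K_y = K + noise_var*I = L L^T."""
    N = len(y)
    L = cholesky(K + noise_var * np.eye(N), lower=True)
    # alpha = K_y^{-1} y via two triangular solves (never form the inverse)
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
    mean = k_star @ alpha                        # k_*^T K_y^{-1} y
    v = solve_triangular(L, k_star, lower=True)  # v = L \ k_*
    var = k_star_star - v @ v                    # k_** - k_*^T K_y^{-1} k_*
    # log marginal likelihood, reusing the same factorization
    log_marglik = (-0.5 * y @ alpha
                   - np.sum(np.log(np.diag(L)))
                   - 0.5 * N * np.log(2 * np.pi))
    return mean, var, log_marglik
```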

  5. Cholesky Decomposition • The Cholesky decomposition (CD) – The CD of a symmetric, positive definite matrix $\mathbf{A}$ decomposes $\mathbf{A}$ into a product of a lower triangular matrix $\mathbf{L}$ and its transpose: $\mathbf{A} = \mathbf{L}\mathbf{L}^\top$ • Solving a linear system using the CD: – To solve $\mathbf{A}\mathbf{x} = \mathbf{b}$, we have two steps: solve $\mathbf{L}\mathbf{z} = \mathbf{b}$ by forward substitution, then $\mathbf{L}^\top\mathbf{x} = \mathbf{z}$ by back substitution • Computing the determinant of a matrix: $|\mathbf{A}| = \prod_i L_{ii}^2$, hence $\log|\mathbf{A}| = 2\sum_i \log L_{ii}$
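
A short sketch of both uses in Python (SciPy routines; the example matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite
b = np.array([1.0, 2.0])

c, low = cho_factor(A)              # Cholesky factor of A
x = cho_solve((c, low), b)          # solves A x = b by two triangular solves

# determinant from the factor: |A| = (product of diagonal entries)^2
log_det_A = 2.0 * np.sum(np.log(np.diag(c)))
```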

  6. Gaussian Process Classification • The main difficulty is that the Gaussian prior is not conjugate to the Bernoulli/multinoulli likelihood, so several approximations are available: – Gaussian approximation – Expectation propagation (Kuss and Rasmussen 2005; Nickisch and Rasmussen 2008) – Variational inference (Girolami and Rogers 2006; Opper and Archambeau 2009) – MCMC (Neal 1997; Christensen et al. 2006)

  7. Gaussian Process Classification • Binary classification – Logistic regression: $p(y_i = 1 \mid \mathbf{x}_i) = \mathrm{sigm}(f(\mathbf{x}_i))$ – Probit regression: $p(y_i = 1 \mid \mathbf{x}_i) = \Phi(f(\mathbf{x}_i))$ – $f$: given a GP prior, as in GP regression

  8. Gaussian Process Classification • Define the log of the unnormalized posterior
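
In the standard Laplace-approximation treatment (Rasmussen and Williams, ch. 3), this quantity is

```latex
\[
\ell(\mathbf{f}) \triangleq \log p(\mathbf{y}\mid\mathbf{f}) + \log p(\mathbf{f}\mid\mathbf{X})
  = \log p(\mathbf{y}\mid\mathbf{f})
    - \tfrac{1}{2}\mathbf{f}^\top\mathbf{K}^{-1}\mathbf{f}
    - \tfrac{1}{2}\log|\mathbf{K}| - \tfrac{N}{2}\log 2\pi
\]
```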

  9. Gaussian Process Classification • Formula for the gradient and Hessian of the log posterior (see below)
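
Assuming the slide follows the standard derivation, the missing expressions are the gradient and Hessian of the log posterior, where $\mathbf{W} \triangleq -\nabla\nabla\log p(\mathbf{y}\mid\mathbf{f})$ is diagonal because the likelihood factorizes across data points:

```latex
\[
\nabla\ell(\mathbf{f}) = \nabla\log p(\mathbf{y}\mid\mathbf{f}) - \mathbf{K}^{-1}\mathbf{f},
\qquad
\nabla\nabla\ell(\mathbf{f}) = \nabla\nabla\log p(\mathbf{y}\mid\mathbf{f}) - \mathbf{K}^{-1}
  = -\mathbf{W} - \mathbf{K}^{-1}
\]
```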

  10. Gaussian Process Classification • Use IRLS to find the MAP estimate • At convergence, the Gaussian approximation of the posterior:
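
The Newton/IRLS update and the Gaussian approximation at the mode $\hat{\mathbf{f}}$ take the standard form (Rasmussen and Williams, eqs. 3.18 and 3.20):

```latex
\[
\mathbf{f}^{\mathrm{new}} = (\mathbf{K}^{-1} + \mathbf{W})^{-1}
    \big(\mathbf{W}\mathbf{f} + \nabla\log p(\mathbf{y}\mid\mathbf{f})\big),
\qquad
p(\mathbf{f}\mid\mathbf{X},\mathbf{y}) \approx
  \mathcal{N}\big(\mathbf{f} \mid \hat{\mathbf{f}},\, (\mathbf{K}^{-1} + \hat{\mathbf{W}})^{-1}\big)
\]
```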

  11. Gaussian Process Classification • Computing the posterior predictive • The predictive mean:
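
Under the Gaussian approximation, and using the mode condition $\hat{\mathbf{f}} = \mathbf{K}\,\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$, the latent predictive mean is

```latex
\[
\mathbb{E}_q[f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}]
  = \mathbf{k}_*^\top \mathbf{K}^{-1}\hat{\mathbf{f}}
  = \mathbf{k}_*^\top \nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})
\]
```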

  12. Gaussian Process Classification • The predictive variance: – Use the law of total variance https://www.macroeconomics.tu-berlin.de/fileadmin/fg124/financial_crises/exercise/Variances.pdf

  13. Gaussian Process Classification • The predictive variance: – Matrix inversion lemma
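
Putting slides 12 and 13 together: the law of total variance gives the two-term expression, and the matrix inversion lemma collapses it into the usual compact form:

```latex
\[
\mathbb{V}_q[f_*]
  = \underbrace{k_{**} - \mathbf{k}_*^\top\mathbf{K}^{-1}\mathbf{k}_*}_{\mathbb{E}[\mathbb{V}[f_*\mid\mathbf{f}]]}
  + \underbrace{\mathbf{k}_*^\top\mathbf{K}^{-1}(\mathbf{K}^{-1}+\mathbf{W})^{-1}\mathbf{K}^{-1}\mathbf{k}_*}_{\mathbb{V}[\mathbb{E}[f_*\mid\mathbf{f}]]}
  = k_{**} - \mathbf{k}_*^\top(\mathbf{K} + \mathbf{W}^{-1})^{-1}\mathbf{k}_*
\]
```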

  14. Matrix inversion lemma • Consider a general partitioned matrix $\mathbf{M} = \begin{pmatrix}\mathbf{E} & \mathbf{F}\\ \mathbf{G} & \mathbf{H}\end{pmatrix}$, where we assume $\mathbf{E}$ and $\mathbf{H}$ are invertible
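
In this partitioned-matrix notation, the matrix inversion lemma (Sherman-Morrison-Woodbury) states

```latex
\[
(\mathbf{E} - \mathbf{F}\mathbf{H}^{-1}\mathbf{G})^{-1}
  = \mathbf{E}^{-1}
  + \mathbf{E}^{-1}\mathbf{F}\,(\mathbf{H} - \mathbf{G}\mathbf{E}^{-1}\mathbf{F})^{-1}\,\mathbf{G}\mathbf{E}^{-1}
\]
```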

  15. Gaussian Process Classification • Convert to a predictive distribution for binary responses • This can be approximated using – Monte Carlo approximation – Probit approximation – …
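
A small Python sketch of the two approximations named above, mapping the Gaussian latent predictive $\mathcal{N}(\mu_*, \sigma_*^2)$ to $p(y_* = 1 \mid \mathbf{x}_*)$ for a sigmoid likelihood; the probit (MacKay) approximation uses $\kappa(v) = (1 + \pi v / 8)^{-1/2}$. Function names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict_proba_mc(mu_star, var_star, n_samples=10000, rng=None):
    """Monte Carlo: average sigmoid(f_*) over samples f_* ~ N(mu_star, var_star)."""
    rng = np.random.default_rng() if rng is None else rng
    f = rng.normal(mu_star, np.sqrt(var_star), size=n_samples)
    return sigmoid(f).mean()

def predict_proba_probit(mu_star, var_star):
    """Probit (MacKay) approximation: sigmoid(kappa(var) * mu)."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var_star / 8.0)
    return sigmoid(kappa * mu_star)
```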

  16. Gaussian Process Classification • Marginal likelihood – Used to optimize the kernel parameters – Applying the Laplace approximation, we have (see below): • Computing the derivatives – Now, $\hat{\mathbf{f}}$ and $\mathbf{W}$, as well as $\mathbf{K}$, depend on the kernel parameters $\boldsymbol{\theta}$ • More complex than in the regression case
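
The Laplace approximation to the log marginal likelihood referred to here (and on slide 17) has the standard form (Rasmussen and Williams, eq. 3.32):

```latex
\[
\log p(\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta})
  \approx \log p(\mathbf{y}\mid\hat{\mathbf{f}})
  - \tfrac{1}{2}\hat{\mathbf{f}}^\top\mathbf{K}^{-1}\hat{\mathbf{f}}
  - \tfrac{1}{2}\log\big|\mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}\big|
\]
```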

  17. Gaussian Process Classification • Laplace approximation to the marginal likelihood.

  18. Gaussian Process Classification • Numerically stable computation – To avoid inverting $\mathbf{K}$ or $\mathbf{W}$, introduce the matrix $\mathbf{B} = \mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}$ • $\mathbf{B}$ has eigenvalues bounded below by 1 and can be safely inverted – Applying the matrix inversion lemma, we have $(\mathbf{K} + \mathbf{W}^{-1})^{-1} = \mathbf{W}^{1/2}\mathbf{B}^{-1}\mathbf{W}^{1/2}$ – The IRLS update now becomes:

  19. Gaussian Process Classification • Numerically stable computation – At convergence, we have: – The log marginal likelihood is: where we exploited:
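
Assuming the slide follows the usual stable formulation: at convergence $\hat{\mathbf{f}} = \mathbf{K}\mathbf{a}$ with $\mathbf{a} = \nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$, and the exploited identity is $\tfrac{1}{2}\log|\mathbf{B}| = \sum_i \log L_{ii}$ for $\mathbf{L} = \mathrm{cholesky}(\mathbf{B})$, giving

```latex
\[
\log p(\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta})
  \approx \log p(\mathbf{y}\mid\hat{\mathbf{f}})
  - \tfrac{1}{2}\mathbf{a}^\top\hat{\mathbf{f}}
  - \sum_i \log L_{ii}
\]
```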

  20. Gaussian Process Classification • Numerically stable computation – Compute the predictive distribution – Here, at the mode, – Thus, the predictive mean: – Also, we use: – Thus, the predictive variance:
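
A compact Python sketch covering slides 18-20: mode finding via the well-conditioned matrix B, then the latent predictive mean and variance at one test point. It assumes a logistic likelihood with labels in {0, 1}; the function name, arguments, and the fixed iteration count are illustrative choices, not from the slides.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_gpc_predict(K, y, k_star, k_star_star, n_iter=30):
    """Numerically stable Laplace approximation for binary GP classification."""
    N = len(y)
    f = np.zeros(N)
    for _ in range(n_iter):                       # IRLS / Newton iterations
        pi = sigmoid(f)
        grad = y - pi                             # gradient of log p(y|f)
        W = pi * (1.0 - pi)                       # W = -Hessian of log p(y|f), diagonal
        sW = np.sqrt(W)
        B = np.eye(N) + sW[:, None] * K * sW[None, :]   # B = I + W^{1/2} K W^{1/2}
        L = cholesky(B, lower=True)               # eigenvalues of B are >= 1
        b = W * f + grad
        # a = b - W^{1/2} B^{-1} W^{1/2} K b   (matrix inversion lemma)
        v = solve_triangular(L, sW * (K @ b), lower=True)
        a = b - sW * solve_triangular(L.T, v, lower=False)
        f = K @ a                                 # Newton update, f_new = K a
    # predictive distribution for the latent f_* at the test point
    pi = sigmoid(f)
    grad, W = y - pi, pi * (1.0 - pi)
    sW = np.sqrt(W)
    L = cholesky(np.eye(N) + sW[:, None] * K * sW[None, :], lower=True)
    mean_star = k_star @ grad                     # k_*^T grad log p(y|f_hat)
    v = solve_triangular(L, sW * k_star, lower=True)
    var_star = k_star_star - v @ v                # k_** - k_*^T (K + W^{-1})^{-1} k_*
    return mean_star, var_star
```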

  21. Gaussian Process Classification

  22. Gaussian Process Classification • The posterior predictive probability for the red circle class, generated by a GP with an SE kernel. The thick black line is the decision boundary obtained by thresholding at a probability of 0.5. – Manual parameters, short length scale

  23. Gaussian Process Classification • Learned parameters, long length scale

  24. Gaussian Process Classification: Multi-class classification – Again, we will use a Gaussian approximation to the posterior • 1) Use IRLS to compute the mode • 2) Apply the Gaussian approximation at the mode

  25. Gaussian Process Classification: Multi-class classification – The unnormalized log posterior: – $\mathbf{y}$: a dummy (one-of-$C$) encoding of the $y_i$'s with the same layout as $\mathbf{f}$ – $\mathbf{K}$: a block-diagonal matrix containing the per-class kernel matrices $\mathbf{K}_c$
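
Assuming the standard softmax formulation (Rasmussen and Williams, sec. 3.5), the unnormalized log posterior referred to here is

```latex
\[
\Psi(\mathbf{f}) \triangleq
  -\tfrac{1}{2}\mathbf{f}^\top\mathbf{K}^{-1}\mathbf{f}
  + \mathbf{y}^\top\mathbf{f}
  - \sum_{i=1}^{N}\log\!\Big(\sum_{c=1}^{C}\exp f_i^c\Big)
  + \mathrm{const}
\]
```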

  26. Gaussian Process Classification: Multi-class classification – Use IRLS to compute the mode

  27. Gaussian Process Classification: Multi-class classification • The posterior predictive:

  28. Gaussian Process Classification: Multi-class classification – The covariance of the latent response • Compute the posterior predictive for the visible response:

  29. Gaussian Process Classification: Multi-class classification • Computing the marginal likelihood – similar to the binary case
