Gaussian Processes Seung-Hoon Na Chonbuk National University
Gaussian Process Regression • Predictions using noisy observations – The case of a single test input:
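In the standard formulation (with observation noise $\sigma_y^2$ and $\mathbf{K}_y = \mathbf{K} + \sigma_y^2\mathbf{I}_N$), the posterior predictive for a single test input $\mathbf{x}_*$ is:
$$p(f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}) = \mathcal{N}\!\left(f_* \mid \mathbf{k}_*^\top\mathbf{K}_y^{-1}\mathbf{y},\; k_{**} - \mathbf{k}_*^\top\mathbf{K}_y^{-1}\mathbf{k}_*\right)$$
where $\mathbf{k}_* = [\kappa(\mathbf{x}_*,\mathbf{x}_1),\dots,\kappa(\mathbf{x}_*,\mathbf{x}_N)]^\top$ and $k_{**} = \kappa(\mathbf{x}_*,\mathbf{x}_*)$.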
Gaussian Process Regression • Computational and numerical issues – It is unwise to directly invert $\mathbf{K}_y = \mathbf{K} + \sigma_y^2\mathbf{I}_N$ – Instead, we use a Cholesky decomposition of $\mathbf{K}_y$, which also yields the marginal likelihood $p(\mathbf{y}\mid\mathbf{X})$
Gaussian Process Regression • With the Cholesky factorization $\mathbf{K}_y = \mathbf{L}\mathbf{L}^\top$: – $\mathbb{V}[f_* \mid \mathbf{x}_*] = k_{**} - \mathbf{k}_*^\top\mathbf{K}_y^{-1}\mathbf{k}_*$ – $\mathbf{k}_*^\top\mathbf{K}_y^{-1}\mathbf{k}_* = (\mathbf{L}^{-1}\mathbf{k}_*)^\top(\mathbf{L}^{-1}\mathbf{k}_*) = \mathbf{v}^\top\mathbf{v}$ – $\mathbf{v} = \mathbf{L}^{-1}\mathbf{k}_* = \mathbf{L}\backslash\mathbf{k}_*$ (a triangular solve)
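A minimal NumPy/SciPy sketch of this Cholesky-based prediction (the kernel function `kernel`, the noise variance `sigma_y2`, and the data arrays are illustrative placeholders, not from the slides):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gp_predict(X, y, X_star, kernel, sigma_y2):
    """Cholesky-based GP regression prediction (in the spirit of GPML Alg. 2.1)."""
    N = len(X)
    K_y = kernel(X, X) + sigma_y2 * np.eye(N)         # K_y = K + sigma_y^2 I
    L = cholesky(K_y, lower=True)                     # K_y = L L^T
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
    K_s = kernel(X, X_star)                           # N x M cross-covariances k_*
    mean = K_s.T @ alpha                              # predictive mean k_*^T K_y^{-1} y
    V = solve_triangular(L, K_s, lower=True)          # v = L \ k_*
    var = np.diag(kernel(X_star, X_star)) - np.sum(V**2, axis=0)   # k_** - v^T v
    # log marginal likelihood: -1/2 y^T alpha - sum_i log L_ii - N/2 log(2 pi)
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * N * np.log(2 * np.pi)
    return mean, var, lml
```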
Cholesky Decomposition • The Cholesky decomposition (CD) – The CD of a symmetric, positive definite matrix A decomposes A into the product $\mathbf{A} = \mathbf{L}\mathbf{L}^\top$ of a lower triangular matrix L and its transpose • Solving a linear system using the CD: – To solve $\mathbf{A}\mathbf{x} = \mathbf{b}$ – We have two steps: first solve $\mathbf{L}\mathbf{z} = \mathbf{b}$ by forward substitution, then solve $\mathbf{L}^\top\mathbf{x} = \mathbf{z}$ by back substitution • Computing the determinant of a matrix: $|\mathbf{A}| = \prod_i L_{ii}^2$, so $\log|\mathbf{A}| = 2\sum_i \log L_{ii}$
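A small SciPy illustration of these two triangular solves and of the log-determinant identity (the matrix and right-hand side are arbitrary examples):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric positive definite
b = np.array([1.0, 2.0])

c, low = cho_factor(A)                      # Cholesky factorization A = L L^T
x = cho_solve((c, low), b)                  # forward then back substitution
logdet = 2.0 * np.sum(np.log(np.diag(c)))   # log|A| = 2 * sum_i log L_ii

assert np.allclose(A @ x, b)
```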
Gaussian Process Classification • The main difficulty is that the Gaussian prior is not conjugate to the Bernoulli/multinoulli likelihood, so exact inference is intractable; several approximations are available – Gaussian approximation – Expectation propagation (Kuss and Rasmussen 2005; Nickisch and Rasmussen 2008) – Variational inference (Girolami and Rogers 2006; Opper and Archambeau 2009) – MCMC (Neal 1997; Christensen et al. 2006)
Gaussian Process Classification • Binary classification – Logistic regression: – Probit regression: – $f$ : given a GP prior, as in GP regression
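In the usual $y_i \in \{-1,+1\}$ formulation, these likelihoods read:
$$p(y_i \mid \mathbf{x}_i) = \mathrm{sigm}(y_i f_i) \;\; \text{(logistic)}, \qquad p(y_i \mid \mathbf{x}_i) = \Phi(y_i f_i) \;\; \text{(probit)}, \qquad f \sim \mathrm{GP}(0, \kappa)$$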
Gaussian Process Classification • Define the log of the unnormalized posterior
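With $\mathbf{f} = (f(\mathbf{x}_1),\dots,f(\mathbf{x}_N))$ and prior $\mathbf{f} \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$, the standard expression is:
$$\ell(\mathbf{f}) \triangleq \log p(\mathbf{y}\mid\mathbf{f}) + \log p(\mathbf{f}\mid\mathbf{X}) = \log p(\mathbf{y}\mid\mathbf{f}) - \tfrac{1}{2}\mathbf{f}^\top\mathbf{K}^{-1}\mathbf{f} - \tfrac{1}{2}\log|\mathbf{K}| - \tfrac{N}{2}\log 2\pi$$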
Gaussian Process Classification • Formulas for the gradient and Hessian of the log posterior
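For a factorizing likelihood these take the standard form:
$$\mathbf{g} = \nabla\log p(\mathbf{y}\mid\mathbf{f}) - \mathbf{K}^{-1}\mathbf{f}, \qquad \mathbf{H} = -\mathbf{W} - \mathbf{K}^{-1}, \qquad \mathbf{W} \triangleq -\nabla\nabla\log p(\mathbf{y}\mid\mathbf{f}) \;\; \text{(diagonal)}$$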
Gaussian Process Classification • Use IRLS to find the MAP estimate • At convergence, the Gaussian approximation of the posterior:
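The Newton (IRLS) step and the Gaussian approximation at the mode $\hat{\mathbf{f}}$ take the standard form:
$$\mathbf{f}^{\text{new}} = (\mathbf{K}^{-1} + \mathbf{W})^{-1}\big(\mathbf{W}\mathbf{f} + \nabla\log p(\mathbf{y}\mid\mathbf{f})\big), \qquad p(\mathbf{f}\mid\mathbf{X},\mathbf{y}) \approx \mathcal{N}\big(\mathbf{f}\mid\hat{\mathbf{f}},\,(\mathbf{K}^{-1}+\mathbf{W})^{-1}\big)$$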
Gaussian Process Classification • Computing the posterior predictive • The predictive mean :
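Since $\nabla\ell(\hat{\mathbf{f}}) = \mathbf{0}$ implies $\hat{\mathbf{f}} = \mathbf{K}\,\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$, the latent predictive mean is:
$$\mathbb{E}[f_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}] = \mathbf{k}_*^\top\mathbf{K}^{-1}\hat{\mathbf{f}} = \mathbf{k}_*^\top\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}})$$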
Gaussian Process Classification • The predictive variance: – Use the law of total variance (see https://www.macroeconomics.tu-berlin.de/fileadmin/fg124/financial_crises/exercise/Variances.pdf)
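For reference, the law of total variance states:
$$\mathbb{V}[Y] = \mathbb{E}\big[\mathbb{V}[Y\mid X]\big] + \mathbb{V}\big[\mathbb{E}[Y\mid X]\big]$$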
Gaussian Process Classification • The predictive variance : Matrix inversion lemma
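Combining the two terms of the law of total variance and applying the matrix inversion lemma gives the standard result:
$$\mathbb{V}[f_*\mid\mathbf{x}_*,\mathbf{X},\mathbf{y}] = k_{**} - \mathbf{k}_*^\top(\mathbf{K} + \mathbf{W}^{-1})^{-1}\mathbf{k}_*$$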
Matrix inversion lemma • Consider a general partitioned matrix $\mathbf{M} = \begin{pmatrix}\mathbf{E} & \mathbf{F}\\ \mathbf{G} & \mathbf{H}\end{pmatrix}$, where we assume $\mathbf{E}$ and $\mathbf{H}$ are invertible
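With this partitioning, the matrix inversion lemma (Sherman-Morrison-Woodbury) states:
$$(\mathbf{E} - \mathbf{F}\mathbf{H}^{-1}\mathbf{G})^{-1} = \mathbf{E}^{-1} + \mathbf{E}^{-1}\mathbf{F}\,(\mathbf{H} - \mathbf{G}\mathbf{E}^{-1}\mathbf{F})^{-1}\mathbf{G}\mathbf{E}^{-1}$$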
Gaussian Process Classification • Convert to a predictive distribution for binary responses • This can be approximated using – Monte Carlo approximation – Probit approximation – …
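For example, the averaged prediction and MacKay's probit (moderated output) approximation are:
$$\bar{\pi}_* = \int \mathrm{sigm}(f_*)\, q(f_*\mid\mathbf{x}_*,\mathbf{X},\mathbf{y})\, df_* \approx \mathrm{sigm}\big(\kappa(v_*)\,\mu_*\big), \qquad \kappa(v) = (1 + \pi v/8)^{-1/2}$$
where $\mu_*$ and $v_*$ are the predictive mean and variance of $f_*$.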
Gaussian Process Classification • Marginal likelihood – Used to optimize the kernel parameters – Applying the Laplace approximation, we have: • Computing the derivatives – Now, since $\hat{\mathbf{f}}$ and $\mathbf{W}$, as well as $\mathbf{K}$, depend on $\boldsymbol{\theta}$ • More complex than in the regression case
Gaussian Process Classification • Laplace approximation to the marginal likelihood.
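Under the Laplace approximation, the standard expression is:
$$\log p(\mathbf{y}\mid\mathbf{X},\boldsymbol{\theta}) \approx \log p(\mathbf{y}\mid\hat{\mathbf{f}}) - \tfrac{1}{2}\hat{\mathbf{f}}^\top\mathbf{K}^{-1}\hat{\mathbf{f}} - \tfrac{1}{2}\log|\mathbf{I}_N + \mathbf{K}\mathbf{W}|$$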
Gaussian Process Classification • Numerically stable computation – To avoid inverting K or W, introduce the matrix $\mathbf{B} = \mathbf{I}_N + \mathbf{W}^{1/2}\mathbf{K}\mathbf{W}^{1/2}$ • B has eigenvalues bounded below by 1 and can be safely inverted – Applying the matrix inversion lemma, we have: – The IRLS update now becomes:
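Following GPML (Rasmussen and Williams, Alg. 3.1), with the Cholesky factorization $\mathbf{B} = \mathbf{L}\mathbf{L}^\top$ these two expressions can be written as:
$$(\mathbf{K}^{-1}+\mathbf{W})^{-1} = \mathbf{K} - \mathbf{K}\mathbf{W}^{1/2}\mathbf{B}^{-1}\mathbf{W}^{1/2}\mathbf{K}$$
$$\mathbf{f}^{\text{new}} = \mathbf{K}\mathbf{a}, \qquad \mathbf{a} = \mathbf{b} - \mathbf{W}^{1/2}\,\mathbf{L}^\top\backslash\big(\mathbf{L}\backslash(\mathbf{W}^{1/2}\mathbf{K}\mathbf{b})\big), \qquad \mathbf{b} = \mathbf{W}\mathbf{f} + \nabla\log p(\mathbf{y}\mid\mathbf{f})$$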
Gaussian Process Classification • Numerically stable computation – At convergence, we have: – The log-marginal likelihood is: where we exploited:
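A NumPy sketch of this stable mode-finding loop, loosely following GPML Algorithm 3.1 for the logistic likelihood with labels in {-1,+1} (the kernel matrix `K` is assumed precomputed; names and defaults are illustrative):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def laplace_mode(K, y, max_iter=100, tol=1e-8):
    """Stable Laplace mode finding for binary GPC (sketch of GPML Alg. 3.1)."""
    N = len(y)
    f = np.zeros(N)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-f))        # sigm(f)
        t = (y + 1) / 2                       # map {-1,+1} -> {0,1}
        grad = t - pi                         # d log p(y|f) / df
        W = pi * (1.0 - pi)                   # -d^2 log p(y|f) / df^2 (diagonal)
        sqrtW = np.sqrt(W)
        B = np.eye(N) + sqrtW[:, None] * K * sqrtW[None, :]  # B = I + W^1/2 K W^1/2
        L = cholesky(B, lower=True)
        b = W * f + grad
        # a = b - W^1/2 B^{-1} W^1/2 K b, via two triangular solves
        c = solve_triangular(L, sqrtW * (K @ b), lower=True)
        a = b - sqrtW * solve_triangular(L.T, c, lower=False)
        f_new = K @ a                         # f_new = K a
        if np.max(np.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    # log marginal likelihood, exploiting f_hat = K a (so f^T K^{-1} f = a^T f)
    # and 0.5 * log|B| = sum_i log L_ii
    log_lik = np.sum(t * f - np.logaddexp(0.0, f))   # sum_i log p(y_i | f_i)
    lml = -0.5 * a @ f + log_lik - np.sum(np.log(np.diag(L)))
    return f, lml
```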
Gaussian Process Classification • Numerically stable computation – Compute the predictive distribution – Here, at the mode, – Thus, the predictive mean: – Also, we use: – Thus, the predictive variance:
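In the stable form (GPML Alg. 3.2) these quantities are computed as:
$$\mathbb{E}[f_*] = \mathbf{k}_*^\top\nabla\log p(\mathbf{y}\mid\hat{\mathbf{f}}), \qquad \mathbf{v} = \mathbf{L}\backslash(\mathbf{W}^{1/2}\mathbf{k}_*), \qquad \mathbb{V}[f_*] = k_{**} - \mathbf{v}^\top\mathbf{v}$$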
Gaussian Process Classification
Gaussian Process Classification • The posterior predictive probability for the red circle class, generated by a GP with an SE kernel. The thick black line is the decision boundary obtained by thresholding at a probability of 0.5. – Manual parameters, short length scale
Gaussian Process Classification • Learned parameters, long length scale
Gaussian Process Classification: Multi-class classification – Again, we will use a Gaussian approximation to the posterior • 1) Use IRLS to compute the mode • 2) Apply the Gaussian approximation at the mode
Gaussian Process Classification: Multi-class classification – The unnormalized log posterior: – $\mathbf{y}$ : a dummy (one-of-$C$) encoding of the $y_i$'s, with the same layout as $\mathbf{f}$ – $\mathbf{K}$ : a block-diagonal matrix containing the per-class kernel matrices $\mathbf{K}_c$
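Following GPML (Sec. 3.5), with $\mathbf{f}\in\mathbb{R}^{CN}$ stacking one latent function per class, this is:
$$\Psi(\mathbf{f}) = -\tfrac{1}{2}\mathbf{f}^\top\mathbf{K}^{-1}\mathbf{f} + \mathbf{y}^\top\mathbf{f} - \sum_{i=1}^{N}\log\Big(\sum_{c=1}^{C}\exp f_i^c\Big) + \text{const}$$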
Gaussian Process Classification: Multi-class classification – Use IRLS to compute the mode
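The Newton step keeps the same generic form as in the binary case; in GPML's notation ($\boldsymbol{\pi}$ stacks the softmax probabilities and $\boldsymbol{\Pi}$ stacks the matrices $\mathrm{diag}(\boldsymbol{\pi}^c)$ vertically):
$$\nabla\log p(\mathbf{y}\mid\mathbf{f}) = \mathbf{y} - \boldsymbol{\pi}, \qquad \mathbf{W} = \mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\Pi}\boldsymbol{\Pi}^\top, \qquad \mathbf{f}^{\text{new}} = (\mathbf{K}^{-1}+\mathbf{W})^{-1}\big(\mathbf{W}\mathbf{f} + \mathbf{y} - \boldsymbol{\pi}\big)$$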
Gaussian Process Classification: Multi-class classification • The posterior predictive:
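Under the Gaussian approximation, and using the block-diagonal structure of K, the latent predictive mean for class $c$ takes the form (with $\hat{\boldsymbol{\pi}}^c$ the fitted class-$c$ probabilities at the training points):
$$\mathbb{E}_q[f_*^c \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}] = \mathbf{k}_c(\mathbf{x}_*)^\top\big(\mathbf{y}^c - \hat{\boldsymbol{\pi}}^c\big)$$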
Gaussian Process Classification: Multi-class classification – The covariance of the latent response • Compute the posterior predictive for the visible response:
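Given the Gaussian over the latent $\mathbf{f}_* = (f_*^1,\dots,f_*^C)$ with mean $\boldsymbol{\mu}_*$ and covariance $\boldsymbol{\Sigma}_*$, the class probabilities can be estimated by Monte Carlo, averaging the softmax over samples:
$$p(y_* = c \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}) \approx \frac{1}{S}\sum_{s=1}^{S}\frac{\exp f_*^{c,(s)}}{\sum_{c'}\exp f_*^{c',(s)}}, \qquad \mathbf{f}_*^{(s)} \sim \mathcal{N}(\boldsymbol{\mu}_*, \boldsymbol{\Sigma}_*)$$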
Gaussian Process Classification: Multi-class classification • Computing the marginal likelihood – similar to the binary case