CS480/680 Lecture 12: June 17, 2019
Gaussian Processes
Readings: [B] Section 6.4, [M] Chapter 15, [HTF] Section 8.3
University of Waterloo CS480/680 Spring 2019 Pascal Poupart
Gaussian Process Regression
• Idea: a distribution over functions
Bayesian Linear Regression
• Setting: f(x) = wᵀφ(x) and y = f(x) + ε, where ε ~ N(0, σ²) and w is unknown
• Weight space view:
  – Prior: Pr(w) [Gaussian]
  – Posterior: Pr(w|X, y) = k Pr(w) Pr(y|X, w) [Gaussian ∝ Gaussian × Gaussian]
Bayesian Linear Regression
• Setting: f(x) = wᵀφ(x) and y = f(x) + ε, where ε ~ N(0, σ²) and w is unknown
• Function space view:
  – Prior: Pr(f(x*)) = ∫_w Pr(f|x*, w) Pr(w) dw [Gaussian = deterministic × Gaussian]
  – Posterior: Pr(f(x*)|X, y) = ∫_w Pr(f|x*, w) Pr(w|X, y) dw [Gaussian = deterministic × Gaussian]
Gaussian Process
• According to the function view, there is a Gaussian at f(x*) for every x*. Those Gaussians are correlated through w.
• What is the general form of Pr(f) (i.e., a distribution over functions)?
• Answer: a Gaussian process (an infinite-dimensional Gaussian distribution)
Gaussian Process
• Distribution over functions: f(x) ~ GP(m(x), k(x, x')) ∀ x, x'
• where m(x) = E[f(x)] is the mean function and k(x, x') = E[(f(x) − m(x))(f(x') − m(x'))] is the kernel covariance function
Mean function m(x)
• Compute the mean function m(x) as follows:
• Let f(x) = φ(x)ᵀw with w ~ N(0, β⁻¹I)
• Then m(x) = E[f(x)] = φ(x)ᵀ E[w] = 0
Kernel covariance function k(x, x')
• Compute the kernel covariance k(x, x') as follows:
• k(x, x') = E[f(x) f(x')] = φ(x)ᵀ E[w wᵀ] φ(x') = φ(x)ᵀ (1/β) I φ(x') = φ(x)ᵀ φ(x') / β
• In some cases we can use domain knowledge to specify k directly.
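As a sanity check (not part of the slides), the identity k(x, x') = φ(x)ᵀφ(x')/β can be verified by Monte Carlo: sample many weight vectors w ~ N(0, β⁻¹I) and average f(x)f(x'). The basis functions and inputs below are illustrative choices.

```python
import numpy as np

# Monte Carlo check of k(x, x') = phi(x)^T phi(x') / beta for
# f(x) = phi(x)^T w with w ~ N(0, beta^{-1} I).
rng = np.random.default_rng(0)
beta = 2.0
phi = lambda x: np.array([1.0, x, x ** 2])   # example basis phi(x)

x1, x2 = 0.5, -1.0
W = rng.normal(0.0, np.sqrt(1.0 / beta), size=(200_000, 3))

f1 = W @ phi(x1)                  # samples of f(x1) = w^T phi(x1)
f2 = W @ phi(x2)
mc_cov = np.mean(f1 * f2)         # estimates E[f(x1) f(x2)]
exact = phi(x1) @ phi(x2) / beta  # closed form from the slide
```

The two values agree up to Monte Carlo error, confirming that the covariance of the induced distribution over functions is exactly the feature-space kernel.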
Examples
• Sampled functions from a Gaussian process:
  – Gaussian kernel: k(x, x') = e^{−(x−x')²/(2σ²)}
  – Exponential kernel (Brownian motion): k(x, x') = e^{−θ|x−x'|}
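A short sketch of how such sample functions can be drawn: evaluate the kernel on a grid of inputs to get a covariance matrix, then sample from the corresponding multivariate Gaussian. The grid, length-scale σ, and rate θ below are illustrative choices.

```python
import numpy as np

# Draw sample functions from a zero-mean GP prior, comparing the
# Gaussian (squared-exponential) kernel to the exponential kernel.
rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 50)

def gaussian_kernel(a, b, sigma=0.2):
    return np.exp(-(a - b) ** 2 / (2 * sigma ** 2))

def exponential_kernel(a, b, theta=5.0):   # gives rough, Brownian-motion-like paths
    return np.exp(-theta * np.abs(a - b))

for kernel in (gaussian_kernel, exponential_kernel):
    K = kernel(xs[:, None], xs[None, :])
    K = K + 1e-8 * np.eye(len(xs))         # jitter for numerical stability
    samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
    # each row of `samples` is one function drawn from the GP prior
```

The Gaussian kernel produces smooth samples; the exponential kernel produces continuous but nowhere-smooth samples, matching the Brownian-motion remark on the slide.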
Gaussian Process Regression
• Gaussian process regression corresponds to kernelized Bayesian linear regression
• Bayesian linear regression:
  – Weight space view
  – Goal: Pr(w|X, y) (posterior over w)
  – Complexity: cubic in the # of basis functions
• Gaussian process regression:
  – Function space view
  – Goal: Pr(f|X, y) (posterior over f)
  – Complexity: cubic in the # of training points
Recap: Bayesian Linear Regression
• Prior: Pr(w) = N(0, S)
• Likelihood: Pr(y|X, w) = N(Φᵀw, σ²I)
• Posterior: Pr(w|X, y) = N(w̄, A⁻¹), where w̄ = σ⁻² A⁻¹ Φ y and A = σ⁻² Φ Φᵀ + S⁻¹
• Prediction: Pr(y*|x*, X, y) = N(σ⁻² φ(x*)ᵀ A⁻¹ Φ y, φ(x*)ᵀ A⁻¹ φ(x*) + σ²)
• Complexity: inversion of A is cubic in the # of basis functions
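The formulas above can be sketched directly in NumPy; the 1-D dataset, the quadratic basis, the prior S, and the noise level below are illustrative assumptions, not from the slides.

```python
import numpy as np

# Sketch of the slide's formulas: posterior N(w_bar, A^{-1}) and the
# predictive distribution at a test input x*.
rng = np.random.default_rng(2)
sigma2 = 0.1 ** 2                        # noise variance sigma^2
S = np.eye(3)                            # prior covariance (prior mean 0)

phi = lambda x: np.vstack([np.ones_like(x), x, x ** 2])
X = np.linspace(-1, 1, 20)
y = 0.5 * X ** 2 - X + rng.normal(0, np.sqrt(sigma2), X.shape)

Phi = phi(X)                                     # d x n matrix of basis vectors
A = Phi @ Phi.T / sigma2 + np.linalg.inv(S)      # A = sigma^{-2} Phi Phi^T + S^{-1}
w_bar = np.linalg.solve(A, Phi @ y) / sigma2     # posterior mean sigma^{-2} A^{-1} Phi y

p = phi(np.array([0.3]))                         # phi(x*) for a test input
pred_mean = (p.T @ w_bar)[0]                     # sigma^{-2} phi(x*)^T A^{-1} Phi y
pred_var = (p.T @ np.linalg.solve(A, p))[0, 0] + sigma2
```

Note the cubic cost sits in the solve against the d×d matrix A, where d is the number of basis functions, matching the complexity remark on the slide.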
Gaussian Process Regression
• Prior: Pr(f(·)) = N(0(·), k(·,·))
• Likelihood: Pr(y|X, f) = N(f(X), σ²I)
• Posterior: Pr(f(·)|X, y) = N(f̄(·), k'(·,·)), where
  f̄(·) = k(·, X) (K + σ²I)⁻¹ y and
  k'(·,·) = k(·,·) + σ²I − k(·, X) (K + σ²I)⁻¹ k(X, ·)
• Prediction: Pr(y*|x*, X, y) = N(f̄(x*), k'(x*, x*))
• Complexity: inversion of K + σ²I is cubic in the # of training points
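A minimal sketch of these prediction equations: one solve against (K + σ²I) gives the posterior mean and variance at a test input. A Cholesky factorization replaces the explicit inverse here (a standard, numerically stabler choice); the data and kernel parameters are illustrative.

```python
import numpy as np

# GP regression prediction at x* as on this slide.
rng = np.random.default_rng(3)
sigma2 = 0.05 ** 2

def k(a, b, ell=0.3):                    # Gaussian kernel
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

X = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * X) + rng.normal(0, np.sqrt(sigma2), X.shape)

L = np.linalg.cholesky(k(X, X) + sigma2 * np.eye(len(X)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # (K + sigma^2 I)^{-1} y

x_star = np.array([0.25])
k_star = k(x_star, X)                                   # k(x*, X), shape (1, n)
pred_mean = (k_star @ alpha)[0]                         # f_bar(x*)
v = np.linalg.solve(L, k_star.T)
pred_var = k(x_star, x_star)[0, 0] + sigma2 - (v.T @ v)[0, 0]
```

The Cholesky factorization is the cubic-cost step, and it is cubic in the number of training points n, not the number of basis functions, which is why the function space view pays off when the feature space is large or infinite.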
Infinite Neural Networks
• Recall: neural networks with a single hidden layer (containing sufficiently many hidden units) can approximate any function arbitrarily closely
• Neal (1994): the limit of an infinite single-hidden-layer neural network is a Gaussian process
Bayesian Neural Networks
• Consider a neural network with K hidden units and a single identity output unit:
  f(x; w) = Σ_{j=1}^K w_{1j} h(Σ_i w_{ji} x_i + w_{j0}) + w_{10}
• Bayesian learning: express a prior over the weights
  – Weight space view: Pr(w_{1j}) with E[w_{1j}] = 0, Var(w_{1j}) = β/K, and Pr(w_{ji}) i.i.d. with E[w_{ji}] = 0
  – Function space view: when K → ∞, by the central limit theorem, the infinite sum of i.i.d. (independent and identically distributed) terms yields a Gaussian:
    Pr(f(x)) = N(f(x) | 0, β E[h(x) h(x')] + σ²)
Mean Derivation
• Calculation of the mean function:
  E[f(x)] = Σ_{j=1}^K E[w_{1j} h_j(x)] + E[w_{10}]
          = Σ_{j=1}^K E[w_{1j}] E[h_j(x)] + E[w_{10}]
          = Σ_{j=1}^K 0 · E[h_j(x)] + 0 = 0
Covariance Derivation
• Cov(f(x), f(x')) = E[f(x) f(x')] − E[f(x)] E[f(x')]
  = E[(Σ_j w_{1j} h_j(x) + w_{10})(Σ_j w_{1j} h_j(x') + w_{10})]
  = Σ_{j=1}^K E[w_{1j}²] E[h_j(x) h_j(x')] + E[w_{10}²]   (cross terms vanish: independent, zero-mean weights)
  = Σ_{j=1}^K Var(w_{1j}) E[h(x) h(x')] + Var(w_{10})
  = Σ_{j=1}^K (β/K) E[h(x) h(x')] + σ²
  = β E[h(x) h(x')] + σ²
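The result can be checked numerically (a sketch, not from the slides): simulate many one-hidden-layer networks with output weights of variance β/K and an output bias of variance σ², then compare the empirical covariance of f(x), f(x') with β E[h(x)h(x')] + σ². The tanh activation, the standard-normal prior on hidden weights, and all constants are illustrative.

```python
import numpy as np

# Monte Carlo check of Cov(f(x), f(x')) = beta * E[h(x) h(x')] + sigma^2.
rng = np.random.default_rng(4)
K_hidden, beta, sigma2 = 100, 1.5, 0.3
x, x_prime = 0.4, -0.2
n_nets = 20_000

w_in = rng.normal(size=(n_nets, K_hidden))    # hidden weights w_ji (zero mean)
b_in = rng.normal(size=(n_nets, K_hidden))    # hidden biases w_j0
h_x = np.tanh(w_in * x + b_in)                # h_j(x) for each sampled network
h_xp = np.tanh(w_in * x_prime + b_in)

w_out = rng.normal(0, np.sqrt(beta / K_hidden), size=(n_nets, K_hidden))
b_out = rng.normal(0, np.sqrt(sigma2), size=n_nets)   # output bias w_10
f_x = np.sum(w_out * h_x, axis=1) + b_out
f_xp = np.sum(w_out * h_xp, axis=1) + b_out

mc_cov = np.mean(f_x * f_xp)                  # empirical E[f(x) f(x')] (means are 0)
exact = beta * np.mean(h_x * h_xp) + sigma2   # beta * E[h(x) h(x')] + sigma^2
```

The 1/K scaling of the output-weight variance is what keeps the covariance finite as K grows, which is exactly the regime the next slide takes to the limit.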
Bayesian Neural Networks
• When the # of hidden units K → ∞, a Bayesian neural net is equivalent to a Gaussian process:
  Pr(f(·)) = GP(f(·) | 0, β E[h(·) h(·)] + σ²)
• Note: this works for
  – any activation function h
  – any i.i.d. prior over the weights with mean 0
Case Study: AIBO Gait Optimization
Gait Optimization
• Problem: find the best parameter setting of the gait controller to maximize walking speed
  – Why? Fast robots have a better chance of winning in robotic soccer
• Solutions:
  – Stochastic hill climbing
  – Gaussian processes
• Lizotte, Wang, Bowling, Schuurmans (2007). Automatic Gait Optimization with Gaussian Processes. International Joint Conference on Artificial Intelligence (IJCAI).
Search Problem
• Let x ∈ ℝ¹⁵ be a vector of 15 parameters that defines a controller for the gait
• Let f: X → ℝ be a mapping from controller parameters to gait speed
• Problem: find the parameters x* that yield the highest speed: x* = argmax_x f(x)
• But f is unknown…
Approach
[Figure: illustration of the approach]
Approach
• Initialize f ~ GP(0(·), k(·,·))
• Repeat:
  – Select a new x (upper confidence bound): x_new ← argmax_x f̄(x) + λ k(x, x)^{1/2}
  – Evaluate f(x_new) by observing the speed of the robot with its parameters set to x_new
  – Update the Gaussian process:
    X ← X ∪ {x_new} and y ← y ∪ {f(x_new)}
    f̄(·) ← k(·, X) (K + σ²I)⁻¹ y
    k(·,·) ← k(·,·) + σ²I − k(·, X) (K + σ²I)⁻¹ k(X, ·)
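A toy sketch of this loop on a 1-D objective (the real problem is 15-dimensional gait parameters): selection uses the upper-confidence acquisition f̄(x) + λ√k'(x, x) over a discretized grid. The objective function, λ, kernel length-scale, and noise level are all illustrative assumptions.

```python
import numpy as np

# GP-based optimization loop: fit a GP to observed (x, speed) pairs,
# pick the next x by upper confidence bound, observe, repeat.
rng = np.random.default_rng(5)
sigma2, lam = 0.05 ** 2, 2.0
objective = lambda x: -(x - 0.7) ** 2        # stand-in for the unknown speed f

def k(a, b, ell=0.15):                       # Gaussian kernel
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

grid = np.linspace(0, 1, 101)
X, y = [0.1], [objective(0.1) + rng.normal(0, np.sqrt(sigma2))]

def posterior(grid, Xa, ya):
    Kinv = np.linalg.inv(k(Xa, Xa) + sigma2 * np.eye(len(Xa)))
    Ks = k(grid, Xa)
    mean = Ks @ Kinv @ ya                                # posterior mean on grid
    var = 1.0 + sigma2 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mean, np.maximum(var, 0.0)

for _ in range(15):
    mean, var = posterior(grid, np.array(X), np.array(y))
    x_new = grid[np.argmax(mean + lam * np.sqrt(var))]   # select new x (UCB)
    X.append(x_new)                                      # "evaluate on the robot"
    y.append(objective(x_new) + rng.normal(0, np.sqrt(sigma2)))

mean, _ = posterior(grid, np.array(X), np.array(y))
best = grid[np.argmax(mean)]                             # should land near 0.7
```

The acquisition trades off exploitation (high posterior mean) against exploration (high posterior variance), so the robot spends most of its trials near promising parameter settings rather than sampling the space uniformly.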
Results
• Gaussian kernel: k(x, x') = e^{−½ (x−x')ᵀ Σ⁻¹ (x−x')}