CS480/680 Lecture 12: June 17, 2019 Gaussian Processes


  1. CS480/680 Lecture 12: June 17, 2019 Gaussian Processes
     Readings: [B] Section 6.4, [M] Chap. 15, [HTF] Sec. 8.3
     University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Gaussian Process Regression
     • Idea: a distribution over functions

  3. Bayesian Linear Regression
     • Setting: f(x) = w^T φ(x) and y = f(x) + ε with ε ~ N(0, σ²), w unknown
     • Weight space view:
        – Prior: Pr(w) (Gaussian)
        – Posterior: Pr(w|X, y) = c Pr(w) Pr(y|w, X) (all three distributions Gaussian; c is a normalization constant)

  4. Bayesian Linear Regression
     • Setting: f(x) = w^T φ(x) and y = f(x) + ε with ε ~ N(0, σ²), w unknown
     • Function space view:
        – Prior: Pr(f(x*)) = ∫_w Pr(f|w, x*) Pr(w) dw (deterministic × Gaussian → Gaussian)
        – Posterior: Pr(f(x*)|X, y) = ∫_w Pr(f|w, x*) Pr(w|X, y) dw (deterministic × Gaussian → Gaussian)

  5. Gaussian Process
     • According to the function space view, there is a Gaussian over f(x*) for every x*. Those Gaussians are correlated through w.
     • What is the general form of Pr(f) (i.e., a distribution over functions)?
     • Answer: a Gaussian Process (an infinite-dimensional Gaussian distribution)

  6. Gaussian Process
     • Distribution over functions: f(x) ~ GP(μ(x), k(x, x')) ∀ x, x'
     • where μ(x) = E[f(x)] is the mean function and k(x, x') = E[(f(x) − μ(x))(f(x') − μ(x'))] is the kernel covariance function

  7. Mean function μ(x)
     • Compute the mean function μ(x) as follows:
     • Let f(x) = φ(x)^T w with w ~ N(0, β⁻¹ I)
     • Then μ(x) = E[f(x)] = E[w]^T φ(x) = 0

  8. Kernel covariance function k(x, x')
     • Compute the kernel covariance k(x, x') as follows:
     • k(x, x') = E[f(x) f(x')] = φ(x)^T E[w w^T] φ(x') = (1/β) φ(x)^T φ(x')
     • In some cases we can use domain knowledge to specify k directly.
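The identity k(x, x') = (1/β) φ(x)^T φ(x') is easy to check numerically. The sketch below uses a hypothetical polynomial feature map and β = 4 (both illustrative assumptions), and compares the analytic kernel value against the Monte Carlo covariance of sampled functions f(x) = φ(x)^T w:

```python
import numpy as np

# Hypothetical feature map: simple polynomial features [1, x, x^2].
def phi(x):
    return np.array([1.0, x, x ** 2])

beta = 4.0                               # weight-prior precision: w ~ N(0, (1/beta) I)
rng = np.random.default_rng(0)

x1, x2 = 0.5, -1.2
analytic = phi(x1) @ phi(x2) / beta      # k(x, x') = (1/beta) phi(x)^T phi(x')

# Monte Carlo check: sample many weight vectors and measure the
# covariance of f(x) = phi(x)^T w across samples (the mean is ~0).
W = rng.normal(0.0, 1.0 / np.sqrt(beta), size=(200_000, 3))
f1, f2 = W @ phi(x1), W @ phi(x2)
empirical = np.mean(f1 * f2)             # estimates E[f(x) f(x')]

print(analytic, empirical)
```

The two numbers agree up to Monte Carlo error, which is the function-space restatement of the weight-space prior.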

  9. Examples
     • Sampled functions from a Gaussian Process:
        – Gaussian kernel: k(x, x') = e^(−(x − x')² / (2σ²))
        – Exponential kernel (Brownian motion): k(x, x') = e^(−θ|x − x'|)
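Sample functions like those pictured on this slide can be drawn by evaluating each kernel on a grid and sampling from the resulting multivariate Gaussian; the lengthscale σ = 1 and rate θ = 1 below are assumed values, and plotting is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 5.0, 100)          # evaluation grid

# Gaussian (squared-exponential) kernel with assumed lengthscale sigma = 1.
K_gauss = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / 2.0)

# Exponential kernel (Brownian-motion-style roughness) with assumed theta = 1.
K_exp = np.exp(-np.abs(xs[:, None] - xs[None, :]))

# Draw three sample functions from each zero-mean GP prior;
# a small jitter keeps the covariance numerically positive definite.
jitter = 1e-9 * np.eye(len(xs))
samples_gauss = rng.multivariate_normal(np.zeros(len(xs)), K_gauss + jitter, 3)
samples_exp = rng.multivariate_normal(np.zeros(len(xs)), K_exp + jitter, 3)

print(samples_gauss.shape, samples_exp.shape)
```

Plotting each row of `samples_gauss` gives smooth curves, while rows of `samples_exp` are much rougher, matching the slide's two panels.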

  10. Gaussian Process Regression
      • Gaussian Process Regression corresponds to kernelized Bayesian Linear Regression
      • Bayesian Linear Regression:
         – Weight space view
         – Goal: Pr(w|X, y) (posterior over w)
         – Complexity: cubic in the # of basis functions
      • Gaussian Process Regression:
         – Function space view
         – Goal: Pr(f|X, y) (posterior over f)
         – Complexity: cubic in the # of training points

  11. Recap: Bayesian Linear Regression
      • Prior: Pr(w) = N(0, Σ)
      • Likelihood: Pr(y|X, w) = N(w^T Φ, σ² I)
      • Posterior: Pr(w|X, y) = N(w̄, A⁻¹) where w̄ = σ⁻² A⁻¹ Φ y and A = σ⁻² Φ Φ^T + Σ⁻¹
      • Prediction: Pr(y*|x*, X, y) = N(σ⁻² φ(x*)^T A⁻¹ Φ y, σ² + φ(x*)^T A⁻¹ φ(x*))
      • Complexity: inversion of A is cubic in the # of basis functions
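These formulas can be sketched in a few lines of numpy; the dataset, basis, and prior covariance below are illustrative assumptions, and Φ is the d × n design matrix whose columns are φ(xᵢ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data from a linear function, fit with a polynomial basis [1, x, x^2].
def phi(x):
    return np.stack([np.ones_like(x), x, x ** 2])

sigma2 = 0.05                                   # noise variance sigma^2
X = rng.uniform(-1, 1, 20)
y = 1.5 * X - 0.5 + rng.normal(0, np.sqrt(sigma2), 20)

Phi = phi(X)                                    # d x n design matrix
Sigma = np.eye(3)                               # prior covariance (assumed identity)

# Posterior: A = sigma^-2 Phi Phi^T + Sigma^-1, w_bar = sigma^-2 A^-1 Phi y
A = Phi @ Phi.T / sigma2 + np.linalg.inv(Sigma)
w_bar = np.linalg.solve(A, Phi @ y) / sigma2

# Prediction at x*: mean = phi(x*)^T w_bar, var = sigma^2 + phi(x*)^T A^-1 phi(x*)
x_star = np.array([0.3])
phi_star = phi(x_star)[:, 0]
mean_star = phi_star @ w_bar
var_star = sigma2 + phi_star @ np.linalg.solve(A, phi_star)
```

Note that w̄ equals the regularized least-squares solution (ΦΦ^T + σ² Σ⁻¹)⁻¹ Φ y, and the predictive variance never drops below the noise floor σ².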

  12. Gaussian Process Regression
      • Prior: Pr(f(·)) = N(μ(·), k(·,·))
      • Likelihood: Pr(y|X, f) = N(f(X), σ² I)
      • Posterior: Pr(f(·)|X, y) = N(f̄(·), k'(·,·)) where f̄(·) = k(·, X)(K + σ² I)⁻¹ y and k'(·,·) = k(·,·) + σ² I − k(·, X)(K + σ² I)⁻¹ k(X, ·)
      • Prediction: Pr(y*|x*, X, y) = N(f̄(x*), k'(x*, x*))
      • Complexity: inversion of K + σ² I is cubic in the # of training points
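A minimal sketch of GP regression with these posterior formulas, assuming a Gaussian kernel with lengthscale 0.5, a small noise variance, and a toy sine dataset (all illustrative choices):

```python
import numpy as np

def k(A, B, ell=0.5):
    # Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)); ell is assumed.
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

sigma2 = 1e-4                           # observation-noise variance
X = np.linspace(0, 3, 15)               # training inputs
y = np.sin(2 * X)                       # training targets (noise-free toy data)

K = k(X, X)
alpha = np.linalg.solve(K + sigma2 * np.eye(len(X)), y)

def predict(Xs):
    Ks = k(Xs, X)                       # k(., X)
    mean = Ks @ alpha                   # f_bar(.) = k(., X)(K + sigma^2 I)^-1 y
    # k'(.,.) = k(.,.) + sigma^2 I - k(., X)(K + sigma^2 I)^-1 k(X, .)
    cov = (k(Xs, Xs) + sigma2 * np.eye(len(Xs))
           - Ks @ np.linalg.solve(K + sigma2 * np.eye(len(X)), Ks.T))
    return mean, np.diag(cov)

mean_tr, var_tr = predict(X)
```

At the training inputs the posterior mean nearly interpolates the targets and the predictive variance collapses to roughly the noise level, which is the behavior the formulas above predict.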

  13. Infinite Neural Networks
      • Recall: neural networks with a single hidden layer (containing sufficiently many hidden units) can approximate any function arbitrarily closely
      • Neal (1994): the limit of an infinite single-hidden-layer neural network is a Gaussian Process

  14. Bayesian Neural Networks
      • Consider a neural network with K hidden units and a single identity output unit y₁:
        y₁ = f(x; w) = Σ_{j=1}^K w₁ⱼ h(Σᵢ wⱼᵢ xᵢ + wⱼ₀) + w₁₀
      • Bayesian learning: express a prior over the weights
         – Weight space view: Pr(w₁ⱼ) with E[w₁ⱼ] = 0, Var(w₁ⱼ) = β/K ∀j, and Pr(wⱼᵢ) with E[wⱼᵢ] = 0, Var(wⱼᵢ) = σ² ∀j, i (similarly for the biases)
         – Function space view: as K → ∞, by the central limit theorem, an infinite sum of i.i.d. (independent and identically distributed) variables yields a Gaussian: Pr(f(x)) is Gaussian with mean 0 and covariance β E[h(x) h(x')] + σ²

  15. Mean Derivation
      • Calculation of the mean function:
        E[f(x)] = Σ_{j=1}^K E[w₁ⱼ h(x)] + E[w₁₀]
                = Σ_{j=1}^K E[w₁ⱼ] E[h(x)] + E[w₁₀]   (output weights are independent of the hidden-layer weights)
                = Σ_{j=1}^K 0 · E[h(x)] + 0
                = 0

  16. Covariance Derivation
      • Cov(f(x), f(x')) = E[f(x) f(x')] − E[f(x)] E[f(x')]
        = E[f(x) f(x')]
        = E[(Σⱼ w₁ⱼ hⱼ(x) + w₁₀)(Σⱼ w₁ⱼ hⱼ(x') + w₁₀)]
        = Σ_{j=1}^K E[w₁ⱼ²] E[hⱼ(x) hⱼ(x')] + E[w₁₀²]   (cross terms vanish: the weights are independent with mean 0)
        = Σ_{j=1}^K Var(w₁ⱼ) E[h(x) h(x')] + Var(w₁₀)
        = Σ_{j=1}^K (β/K) E[h(x) h(x')] + σ²
        = β E[h(x) h(x')] + σ²

  17. Bayesian Neural Networks
      • When the # of hidden units K → ∞, the Bayesian neural net is equivalent to a Gaussian Process:
        Pr(f(·)) = GP(f(·) | 0, β E[h(·) h(·')] + σ²)
      • Note: this works for
         – any activation function h
         – any i.i.d. prior over the weights with mean 0
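The limiting covariance β E[h(x) h(x')] + σ² can be checked by Monte Carlo: sample many finite networks from the prior and compare the empirical covariance of their outputs at two inputs against the formula. K = 200 hidden units, tanh activations, and unit-variance hidden weights are assumed choices here, not anything fixed by the slides:

```python
import numpy as np

rng = np.random.default_rng(4)

K = 200                       # hidden units (finite stand-in for K -> infinity)
M = 20_000                    # number of networks sampled from the prior
beta, sig2 = 1.0, 0.25        # output-weight scale beta and bias variance sigma^2 (assumed)
x1, x2 = 0.5, -0.3            # two scalar inputs

# Sample M single-hidden-layer networks: hidden weights/biases ~ N(0, 1),
# output weights ~ N(0, beta/K), output bias ~ N(0, sigma^2), h = tanh.
V = rng.normal(0, 1, (M, K))                     # input-to-hidden weights
b = rng.normal(0, 1, (M, K))                     # hidden biases
W1 = rng.normal(0, np.sqrt(beta / K), (M, K))    # hidden-to-output weights
w10 = rng.normal(0, np.sqrt(sig2), M)            # output bias

f1 = np.sum(W1 * np.tanh(V * x1 + b), axis=1) + w10
f2 = np.sum(W1 * np.tanh(V * x2 + b), axis=1) + w10
empirical = np.mean(f1 * f2)          # Cov(f(x), f(x')); the means are ~0

# Theory: beta * E[h(x) h(x')] + sigma^2, estimated over the hidden-weight prior.
v, bb = rng.normal(0, 1, 10**6), rng.normal(0, 1, 10**6)
theory = beta * np.mean(np.tanh(v * x1 + bb) * np.tanh(v * x2 + bb)) + sig2
```

With these sample sizes the two estimates agree to within Monte Carlo noise, illustrating why even a moderately wide network already behaves like its GP limit.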

  18. Case Study: AIBO Gait Optimization

  19. Gait Optimization
      • Problem: find the parameter setting of the gait controller that maximizes walking speed
         – Why? Fast robots have a better chance of winning in robotic soccer
      • Solutions:
         – Stochastic hill climbing
         – Gaussian Processes
      • Lizotte, Wang, Bowling, Schuurmans (2007). Automatic Gait Optimization with Gaussian Process Regression. International Joint Conference on Artificial Intelligence (IJCAI).

  20. Search Problem
      • Let x ∈ ℝ¹⁵ be a vector of 15 parameters that defines a gait controller
      • Let f: x → ℝ be the mapping from controller parameters to gait speed
      • Problem: find the parameters x* that yield the highest speed:
        x* ← argmax_x f(x)
        But f is unknown…

  21. Approach
      • Picture (figure illustrating the approach)

  22. Approach
      • Initialize f(·) ~ GP(μ(·), k(·,·))
      • Repeat:
         – Select a new x: x_new ← argmax_x Pr(f(x) ≥ max_{x'∈X} f(x'))
         – Evaluate f(x_new) by observing the speed of the robot with its parameters set to x_new
         – Update the Gaussian process:
            • X ← X ∪ {x_new} and y ← y ∪ {f(x_new)}
            • μ(·) ← k(·, X)(K + σ² I)⁻¹ y
            • k(·,·) ← k(·,·) + σ² I − k(·, X)(K + σ² I)⁻¹ k(X, ·)
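This loop can be sketched on a hypothetical 1-D stand-in for the robot: a synthetic "speed" function replaces the physical gait evaluation, a probability-of-improvement rule selects the next candidate, and the kernel, lengthscale, and candidate grid are all illustrative assumptions rather than the paper's settings:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stand-in for the robot: a synthetic 1-D "gait speed" function.
def speed(x):
    return math.exp(-(x - 0.85) ** 2 / 0.05)

def kern(A, B, ell=0.2):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell ** 2))

sigma2 = 1e-4                            # observation-noise variance
grid = np.linspace(0.0, 1.0, 201)        # candidate controller settings

X = list(rng.uniform(0, 1, 3))           # initial random evaluations
Y = [speed(x) for x in X]

for _ in range(15):
    Xa, Ya = np.array(X), np.array(Y)
    K = kern(Xa, Xa) + sigma2 * np.eye(len(Xa))
    Ks = kern(grid, Xa)
    mu = Ks @ np.linalg.solve(K, Ya)                             # posterior mean
    var = 1.0 + sigma2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.maximum(var, 1e-12))
    # Probability-of-improvement selection: Pr(f(x) >= best observed so far).
    z = (mu - max(Y) - 0.01) / sd
    pi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    x_new = float(grid[int(np.argmax(pi))])
    X.append(x_new)                      # X <- X ∪ {x_new}
    Y.append(speed(x_new))               # y <- y ∪ {f(x_new)}

best_speed = max(Y)
```

Each iteration refits the GP to all evaluations so far, so the loop trades off exploring uncertain regions against exploiting the current best controller, and the best observed speed only improves over the initial random samples.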

  23. Results
      • Gaussian kernel: k(x, x') = σ² e^(−(x − x')^T (x − x') / 2)
