


CS/CNS/EE 253: Advanced Topics in Machine Learning
Topic: Nonparametric learning #4: Gaussian Process Regression
Lecturer: Andreas Krause    Scribe: Tim Black    Date: March 1, 2010

15.1 Last Lecture

We want to solve a regression problem, together with a confidence band around the prediction:

    $f^* = \operatorname{argmin}_{f \in \mathcal{H}_k} \|f\|^2 + \sum_i (y_i - f(x_i))^2$    (15.1.1)

This trades off complexity against goodness of fit, but it does not capture the probability of being right.

Idea: place a prior $P(f)$ over functions. Get data $(x_i, y_i)$, where $y_i = f(x_i) + \epsilon_i$ is the function value plus some error. Compute the posterior $P(f \mid D)$.

15.2 Gaussian Processes (GPs)

"$\infty$-dimensional Gaussians"

Definition 15.2.1 Let $X$ be an input set. Let $k : X \times X \to \mathbb{R}$ be a positive definite kernel function. Let $\mu : X \to \mathbb{R}$ be the mean function (with no restrictions; later we will take it to be the zero function). A random function $f : X \to \mathbb{R}$ is a GP, $f \sim GP(\mu, k)$, if for any finite $A \subset X$, $A = \{x_1, x_2, \ldots, x_n\}$,

    $f_A = (f(x_1), \ldots, f(x_n)) \sim N(\mu_A, \Sigma_{AA})$,

where $\mu_A = (\mu(x_1), \ldots, \mu(x_n))$ and the covariance matrix $\Sigma_{AA}$ is the $n \times n$ matrix whose $(i, j)$ entry is $k(x_i, x_j)$.
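To make Definition 15.2.1 concrete, here is a minimal sketch (not part of the lecture; all names are illustrative) that draws sample functions $f_A$ over a finite set $A$, using the squared-exponential kernel introduced below in Section 15.2.3 and mean zero:

    import numpy as np

    def k_se(x1, x2, c=1.0, h=0.5):
        """Squared-exponential kernel k(x, x') = c^2 exp(-(x - x')^2 / h^2)."""
        return c**2 * np.exp(-((x1 - x2) ** 2) / h**2)

    # Finite set A = {x_1, ..., x_n} of inputs.
    A = np.linspace(0.0, 1.0, 100)

    # mu_A = 0 and Sigma_AA with (i, j) entry k(x_i, x_j).
    mu_A = np.zeros(len(A))
    Sigma_AA = k_se(A[:, None], A[None, :])

    # By Definition 15.2.1, f_A ~ N(mu_A, Sigma_AA); draw three sample functions.
    # (A small jitter keeps the covariance numerically positive semidefinite.)
    rng = np.random.default_rng(0)
    samples = rng.multivariate_normal(mu_A, Sigma_AA + 1e-8 * np.eye(len(A)), size=3)

Plotting the rows of samples against A gives curves like those in the figure below: smooth random functions whose pointwise spread is governed by the covariance.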

[Figure: sample functions $f(x)$ drawn from a GP over inputs $x$, illustrating the mean function, the covariance, and the probability of $f(x)$ at each point.]

15.2.1 Prediction in GPs

Suppose we get to see $f_A = f'$ (that is, we get to see the values of $f$ at a set of points, without noise). What is $P(f(x) \mid f_A = f')$? Conditionals of a Gaussian are again Gaussian:

    $P(f(x) \mid f_A = f') = N(f(x); \mu_{x|A}, \sigma^2_{x|A})$,

where

    $\mu_{x|A} = \mu_x + \Sigma_{xA} \Sigma_{AA}^{-1} (f' - \mu_A)$,
    $\sigma^2_{x|A} = \sigma^2_x - \Sigma_{xA} \Sigma_{AA}^{-1} \Sigma_{Ax}$,

and $\Sigma_{xA} = (k(x, x_1), k(x, x_2), \ldots, k(x, x_n)) = \Sigma_{Ax}^T$.

What if we have noise? Let $y_i = f(x_i) + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$, so $y_A = f_A + \epsilon_A$ with $\epsilon_A \sim N(0, \sigma^2 I)$. Then

    $P(y_A) = N(y_A; \mu_A, \Sigma_{AA} + \sigma^2 I)$
    $P(f(x) \mid y_A) = N(f(x); \mu_{x|A}, \sigma^2_{x|A})$
    $\mu_{x|A} = \mu_x + \Sigma_{xA} (\Sigma_{AA} + \sigma^2 I)^{-1} (y_A - \mu_A)$
    $\sigma^2_{x|A} = \sigma^2_x - \Sigma_{xA} (\Sigma_{AA} + \sigma^2 I)^{-1} \Sigma_{Ax}$

(the quadratic term is subtracted: observing data can only reduce the predictive variance). NEVER EVER CALCULATE $(\Sigma_{AA} + \sigma^2 I)^{-1}$ AS inv($\Sigma_{AA} + \sigma^2 I$). Instead, solve the linear system $\alpha = \tilde{\Sigma}_{AA} \backslash (y_A - \mu_A)$ (Matlab notation), where $\tilde{\Sigma}_{AA} = \Sigma_{AA} + \sigma^2 I$, so that $\alpha = \tilde{\Sigma}_{AA}^{-1} (y_A - \mu_A)$.
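The prediction equations translate almost line for line into code. The following is a minimal sketch (not from the lecture; names and the toy data are illustrative, and the mean is taken to be zero) that uses a linear solve rather than an explicit inverse, exactly as the warning above demands:

    import numpy as np

    def gp_predict(A, y_A, X_test, k, noise_var):
        """Posterior mean and variance of f(x) at X_test, given noisy y_A at A."""
        Sigma_AA = k(A[:, None], A[None, :])
        Sigma_tilde = Sigma_AA + noise_var * np.eye(len(A))

        # alpha = (Sigma_AA + sigma^2 I)^{-1} y_A, via a solve -- never inv().
        alpha = np.linalg.solve(Sigma_tilde, y_A)

        Sigma_xA = k(X_test[:, None], A[None, :])   # one row per test point
        mean = Sigma_xA @ alpha                     # mu_{x|A}

        # sigma^2_{x|A} = k(x, x) - Sigma_xA (Sigma_AA + sigma^2 I)^{-1} Sigma_Ax
        V = np.linalg.solve(Sigma_tilde, Sigma_xA.T)
        var = k(X_test, X_test) - np.sum(Sigma_xA * V.T, axis=1)
        return mean, var

    # Example usage with an SE kernel and made-up data:
    k_se = lambda x1, x2: np.exp(-((x1 - x2) ** 2) / 0.5**2)
    A = np.linspace(0.0, 1.0, 20)
    y_A = np.sin(6 * A) + 0.1 * np.random.default_rng(3).standard_normal(20)
    mean, var = gp_predict(A, y_A, np.linspace(0.0, 1.0, 50), k_se, noise_var=0.01)

In practice one would factor $\tilde{\Sigma}_{AA}$ once with a Cholesky decomposition (np.linalg.cholesky) and reuse it for both solves; the plain solve above already avoids the explicit inverse, which is the point of the warning.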

15.2.2 Connection to RKHS

Assume $\mu = 0$. Then

    $\mu_{x|A} = \Sigma_{xA} (\Sigma_{AA} + \sigma^2 I)^{-1} y_A = \sum_{i=1}^n \alpha_i k(x_i, x)$,

so the posterior mean is a kernel expansion in the training points. Moreover,

    $\operatorname{argmax}_{f \in \mathcal{H}_k} P(f \mid y_A) = \operatorname{argmin}_{f \in \mathcal{H}_k} \|f\|^2 + \frac{1}{\sigma^2} \sum_i (y_i - f(x_i))^2$,

i.e., the MAP estimate solves the regularized least-squares problem (15.1.1).

15.2.3 Parameter Estimation

Most kernel functions have parameters. For the squared-exponential kernel

    $k_{SE}(x, x') = c^2 \exp\left(-\frac{(x - x')^2}{h^2}\right)$,

the parameters are $\Theta = (c, h)$: $c$ is the "magnitude" of the functions, and $h$ is the "length scale" of $f$ (a small $h$ gives a narrow kernel and hence wiggly functions).

[Figure: $k(x, x')$ as a function of $x'$, peaked at $x$, for a small length scale $h$.]
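The effect of the length scale is easy to see empirically. Here is a small sketch (illustrative, not from the lecture) that samples functions for a small and a large $h$ and counts how often each sample crosses zero from below -- the "upcrossings" made precise next:

    import numpy as np

    k_se = lambda x1, x2, h: np.exp(-((x1 - x2) ** 2) / h**2)  # c = 1
    A = np.linspace(0.0, 1.0, 200)
    rng = np.random.default_rng(1)

    for h in (0.05, 0.5):
        Sigma = k_se(A[:, None], A[None, :], h)
        f = rng.multivariate_normal(np.zeros(len(A)), Sigma + 1e-8 * np.eye(len(A)))
        # Upcrossings of level u = 0: consecutive samples going from below to above.
        upcrossings = np.sum((f[:-1] < 0) & (f[1:] >= 0))
        print(f"h = {h}: {upcrossings} upcrossings of 0 on [0, 1]")

By Theorem 15.2.2 below, the counts should come out on the order of $1/(\sqrt{2}\,\pi h)$: roughly 4.5 for $h = 0.05$ and 0.45 for $h = 0.5$.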

Let $N_u$ = number of upcrossings of level $u$ in $[0, 1]$ ("how wiggly is this function?"). Assume $k$ is isotropic, $k(x, x') = k(\|x - x'\|)$, and assume $\mu = 0$.

Theorem 15.2.2 (Adler)

    $E[N_u] = \frac{1}{2\pi} \sqrt{\frac{-k''(0)}{k(0)}} \exp\left(-\frac{u^2}{2k(0)}\right)$    (15.2.2)

For the SE kernel, $k(0) = c^2$ and $k''(0) = -2c^2/h^2$, so

    $E[N_u] = \frac{1}{2\pi} \sqrt{\frac{2}{h^2}} \exp\left(-\frac{u^2}{2c^2}\right) = \frac{1}{\sqrt{2}\,\pi h} \exp\left(-\frac{u^2}{2c^2}\right)$.

How do we choose the parameters? Learn them from the data! Pick parameters to maximize the (marginal) likelihood:

    $\Theta^* = \operatorname{argmax}_\Theta P(y_A \mid \Theta) = \operatorname{argmax}_\Theta \int P(y_A, f \mid \Theta) \, df = \operatorname{argmax}_\Theta \int P(y_A \mid f, \Theta) P(f \mid \Theta) \, df = \operatorname{argmax}_\Theta \int P(y_A \mid f) P(f \mid \Theta) \, df$

[Figure: candidate functions scored by data fit $P(y_A \mid f)$ and prior $P(f \mid \Theta)$; the marginal likelihood trades off fit against complexity.]

Writing $\tilde{\Sigma}_{AA}(\Theta) = \Sigma_{AA}(\Theta) + \sigma^2 I$, the marginal likelihood is

    $P(y_A \mid \Theta) = N(y_A; 0, \tilde{\Sigma}_{AA}(\Theta)) = (2\pi)^{-n/2} \, |\tilde{\Sigma}_{AA}(\Theta)|^{-1/2} \exp\left(-\frac{1}{2} y_A^T \tilde{\Sigma}_{AA}(\Theta)^{-1} y_A\right)$.

How do we solve the optimization problem $\Theta^* = \operatorname{argmax}_\Theta P(y_A \mid \Theta)$? Calculate the gradient of $\log P(y_A \mid \Theta)$ and run conjugate gradient (descent on the negative log-likelihood).

15.2.4 Incorporating prior knowledge

Take $f = g + h \sim GP(0, k_{lin} + k_{SE})$, where $g$ is parametric and $h$ is nonparametric: $g(x) = \sum_i w_i \phi_i(x)$ with $w \sim N(0, I)$, so $g \sim GP(0, k_{lin})$ where $k_{lin}(x, x') = \sum_i \phi_i(x) \phi_i(x') = \phi(x)^T \phi(x')$, and $h \sim GP(0, k_{SE})$.
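Putting the last two subsections together, here is a minimal sketch (not from the lecture; the data, grid, and names are all illustrative) that forms the composite kernel $k_{lin} + k_{SE}$, evaluates $\log P(y_A \mid \Theta)$ stably via slogdet and a linear solve, and picks the SE parameters by grid search instead of conjugate gradient:

    import numpy as np

    def k_composite(x1, x2, c, h):
        """k_lin(x, x') + k_SE(x, x'), with phi(x) = x for the linear part."""
        return x1 * x2 + c**2 * np.exp(-((x1 - x2) ** 2) / h**2)

    def log_marginal_likelihood(A, y_A, c, h, noise_var):
        """log P(y_A | Theta) = -(n/2) log(2 pi) - (1/2) log|Sigma~| - (1/2) y_A^T Sigma~^{-1} y_A."""
        n = len(A)
        Sigma = k_composite(A[:, None], A[None, :], c, h) + noise_var * np.eye(n)
        _, logdet = np.linalg.slogdet(Sigma)
        alpha = np.linalg.solve(Sigma, y_A)  # solve, never inv()
        return -0.5 * n * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * (y_A @ alpha)

    # Toy data: a linear trend plus a smooth wiggle (illustrative only).
    rng = np.random.default_rng(2)
    A = np.sort(rng.uniform(0.0, 1.0, 30))
    y_A = 2.0 * A + np.sin(8 * A) + 0.1 * rng.standard_normal(30)

    # Grid search over Theta = (c, h); the lecture uses conjugate gradient instead.
    grid = [(c, h) for c in (0.5, 1.0, 2.0) for h in (0.05, 0.1, 0.3, 1.0)]
    best = max(grid, key=lambda th: log_marginal_likelihood(A, y_A, th[0], th[1], 0.01))
    print("Theta* =", best)

For gradient-based optimization as in the lecture, one would differentiate this log-likelihood with respect to $\Theta$ and hand the gradient to a conjugate-gradient routine.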
