An introduction to Gaussian processes
Oliver Stegle and Karsten Borgwardt
Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen
Why Gaussian processes?
◮ So far: linear models with a finite number of basis functions, e.g. φ(x) = (1, x, x², ..., x^K).
◮ Open questions:
  ◮ How do we design a suitable basis?
  ◮ How many basis functions should we pick?
◮ Gaussian processes: an accurate and flexible regression method that yields predictions along with error bars.
[Figure: example data set, outputs Y plotted against inputs X]
Further reading
◮ A comprehensive and very good introduction to Gaussian processes: C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. Free download: http://www.gaussianprocess.org/gpml/
◮ A very good introductory lecture video: http://videolectures.net/gpip06_mackay_gpb/ Many ideas used in this course are borrowed from this lecture.
Outline
◮ Motivation
◮ Intuitive approach
◮ Function space view
◮ GP classification & other extensions
◮ Summary
The Gaussian distribution
◮ Gaussian processes are based on nothing more than the good old Gaussian distribution:
  N(x | µ, K) = 1/√|2πK| · exp( −½ (x − µ)ᵀ K⁻¹ (x − µ) )
◮ K is the covariance matrix, also called the kernel matrix.
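A minimal sketch of evaluating this density with NumPy (not from the slides; the helper name and example numbers are illustrative):

```python
import numpy as np

def gaussian_density(x, mu, K):
    """Density N(x | mu, K) of a multivariate Gaussian with covariance K."""
    d = x - mu
    _, logdet = np.linalg.slogdet(2 * np.pi * K)   # log |2*pi*K|, numerically stable
    quad = d @ np.linalg.solve(K, d)               # (x - mu)^T K^{-1} (x - mu)
    return np.exp(-0.5 * (logdet + quad))

mu = np.zeros(2)
K = np.array([[1.0, 0.6],
              [0.6, 1.0]])                         # covariance (kernel) matrix
print(gaussian_density(np.array([0.5, -0.2]), mu, K))
```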
A 2D Gaussian
[Figure: probability contour and samples of a 2D Gaussian over (y1, y2)]
◮ Probability contour
◮ Samples
K = | 1    0.6 |
    | 0.6  1   |
A 2D Gaussian: Varying the covariance matrix
[Figure: samples from three 2D Gaussians over (y1, y2) with different covariance matrices]
K = | 1     0.14 |     K = | 1    0.6 |     K = | 1     −0.9 |
    | 0.14  1    |         | 0.6  1   |         | −0.9  1    |
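A short sketch (assuming NumPy) of drawing samples from these three Gaussians; the off-diagonal entry of K directly controls how strongly y1 and y2 are correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
for c in (0.14, 0.6, -0.9):
    K = np.array([[1.0, c],
                  [c, 1.0]])
    samples = rng.multivariate_normal(mean=np.zeros(2), cov=K, size=1000)
    # The empirical correlation should be close to the chosen off-diagonal entry c
    print(c, np.corrcoef(samples.T)[0, 1])
```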
A 2D Gaussian: Inference
[Figure: inference in the 2D Gaussian over (y1, y2)]
Inference
◮ Joint probability:
  p(y1, y2 | K) = N([y1, y2] | 0, K)
◮ Conditional probability:
  p(y2 | y1, K) = p(y1, y2 | K) / p(y1 | K) ∝ exp( −½ [y1, y2] K⁻¹ [y1, y2]ᵀ )
◮ Completing the square yields a Gaussian with non-zero mean as the posterior for y2.
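A minimal sketch of this conditioning step in NumPy, using the standard Gaussian conditioning formulas; the observed value of y1 is made up for illustration:

```python
import numpy as np

K = np.array([[1.0, 0.6],
              [0.6, 1.0]])
y1 = 1.5                                  # observed value of the first output

# p(y2 | y1, K) = N(y2 | K21 K11^{-1} y1, K22 - K21 K11^{-1} K12)
mean_y2 = K[1, 0] / K[0, 0] * y1
var_y2 = K[1, 1] - K[1, 0] / K[0, 0] * K[0, 1]
print(mean_y2, var_y2)                    # posterior mean 0.9, variance 0.64
```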
Extending the idea to higher dimensions
◮ Let us interpret y1 and y2 as outputs in a regression setting.
◮ We can introduce an additional third point.
[Figure: three output values Y plotted against inputs X]
◮ Now p([y1, y2, y3] | K3) = N([y1, y2, y3] | 0, K3), where K3 is now a 3 × 3 covariance matrix!
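A sketch of predicting the third output from the first two under this joint Gaussian (assuming NumPy; the particular K3 and observed values are illustrative):

```python
import numpy as np

K3 = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.6],
               [0.0, 0.6, 1.0]])          # illustrative 3 x 3 covariance matrix
y_obs = np.array([1.0, 2.0])              # observed outputs y1, y2

K_oo = K3[:2, :2]                         # covariance among the observed outputs
K_po = K3[2, :2]                          # covariance between y3 and (y1, y2)
mean_y3 = K_po @ np.linalg.solve(K_oo, y_obs)
var_y3 = K3[2, 2] - K_po @ np.linalg.solve(K_oo, K_po)
print(mean_y3, var_y3)                    # posterior mean and variance for y3
```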
Constructing Covariance Matrices
◮ Analogously, we can look at the joint probability of arbitrarily many points and obtain predictions.
◮ Issue: how do we construct a good covariance matrix?
◮ A simple heuristic:
  K2 = | 1    0.6 |
       | 0.6  1   |
  K3 = | 1    0.6  0   |
       | 0.6  1    0.6 |
       | 0    0.6  1   |
◮ Note:
  ◮ The ordering of the points y1, y2, y3 matters.
  ◮ It is important to ensure that covariance matrices remain positive definite (they need to be inverted).
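A small sketch (assuming NumPy) of checking the positive-definiteness requirement for this banded recipe; a Cholesky factorisation succeeds only for positive definite matrices:

```python
import numpy as np

def is_positive_definite(K):
    """Return True if K admits a Cholesky factorisation, i.e. is positive definite."""
    try:
        np.linalg.cholesky(K)
        return True
    except np.linalg.LinAlgError:
        return False

K3 = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.6],
               [0.0, 0.6, 1.0]])
print(is_positive_definite(K3))           # True for this choice

# The same banded recipe with a larger off-diagonal value already fails:
K3_bad = np.array([[1.0, 0.9, 0.0],
                   [0.9, 1.0, 0.9],
                   [0.0, 0.9, 1.0]])
print(is_positive_definite(K3_bad))       # False: smallest eigenvalue 1 - 0.9*sqrt(2) < 0
```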
Constructing Covariance Matrices: A general recipe
◮ Use a covariance function (kernel function) to construct K:
  K_ij = k(x_i, x_j; θ_K)
◮ For example, the squared exponential covariance function embodies the belief that points further apart are less correlated:
  k_SE(x_i, x_j; A, L) = A² exp( −0.5 (x_i − x_j)² / L² )
  ◮ A: overall correlation, amplitude
  ◮ L: scaling parameter, smoothness
◮ θ_K = {A, L} are called hyperparameters.
◮ We denote the covariance matrix for a set of inputs X = {x_1, ..., x_N} as K_X,X(θ_K).
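A minimal sketch of this recipe in NumPy; the function names and hyperparameter values are illustrative:

```python
import numpy as np

def k_se(xi, xj, A=1.0, L=1.0):
    """Squared exponential covariance: points further apart are less correlated."""
    return A**2 * np.exp(-0.5 * (xi - xj)**2 / L**2)

def cov_matrix(X, A=1.0, L=1.0):
    """Build K_{X,X}(theta_K) with entries K[i, j] = k_se(x_i, x_j; A, L)."""
    X = np.asarray(X, dtype=float)
    return k_se(X[:, None], X[None, :], A, L)   # broadcast over all pairs of inputs

X = np.linspace(0, 10, 5)
print(cov_matrix(X, A=1.0, L=2.0))
```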