Gaussian Processes for Big Data


  1. Gaussian Processes for Big Data. James Hensman, joint work with Nicolò Fusi, Neil D. Lawrence

  2. Overview: Motivation; Sparse Gaussian Processes; Stochastic Variational Inference; Examples

  3. Overview: Motivation; Sparse Gaussian Processes; Stochastic Variational Inference; Examples

  4. Motivation. Inference in a GP has the following demands: Complexity O(n³), Storage O(n²). Inference in a sparse GP has the following demands: Complexity O(nm²), Storage O(nm), where we get to pick m!
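For concreteness, a minimal NumPy sketch (kernel choice and function names assumed, not from the slides) showing where the O(n³) and O(n²) costs come from in exact GP regression:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice, used only for illustration)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def exact_gp_predict(X, y, Xstar, noise=0.1):
    """Exact GP regression: K_nn costs O(n^2) to store and O(n^3) to factorise."""
    Knn = rbf(X, X)                                        # n x n  (O(n^2) storage)
    L = np.linalg.cholesky(Knn + noise * np.eye(len(X)))   # O(n^3) time
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xstar, X) @ alpha                           # predictive mean
```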

  5. Still not good enough! Big Data: in parametric models, stochastic optimisation is used, which allows application to Big Data. This work: we show how to use Stochastic Variational Inference in GPs, a stochastic optimisation scheme in which each step requires O(m³).

  6. Overview: Motivation; Sparse Gaussian Processes; Stochastic Variational Inference; Examples

  7. Computational savings: K_nn ≈ Q_nn = K_nm K_mm⁻¹ K_mn. Instead of inverting K_nn, we make a low-rank (or Nyström) approximation, and invert K_mm instead.
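A short NumPy sketch of this Nyström / low-rank construction (names assumed; the kernel argument is any covariance function, e.g. the rbf above): only the m × m matrix K_mm is factorised, never K_nn.

```python
import numpy as np

def nystrom(X, Z, kernel, jitter=1e-6):
    """Q_nn = K_nm K_mm^{-1} K_mn: a rank-m approximation that only inverts K_mm."""
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))   # m x m, cheap to factorise
    Knm = kernel(X, Z)                             # n x m
    Lmm = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(Lmm, Knm.T)                # m x n, so A.T @ A = Q_nn
    return A.T @ A                                 # K_nn itself is never inverted
```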

  8. Information capture. Everything we want to do with a GP involves marginalising f: predictions, the marginal likelihood, estimating covariance parameters. The posterior of f is the central object. This means inverting K_nn.

  9. [Figure: function values plotted over the input space (X).]

  10. [Figure: function values over the input space (X), with f(x) ∼ GP.]

  11. [Figure: function values over the input space (X); f(x) ∼ GP with prior p(f) = N(0, K_nn).]

  12. [Figure: function values over the input space (X); prior p(f) = N(0, K_nn) and posterior p(f | y, X).]

  13. Introducing u. Take an extra M points on the function, u = f(Z). p(y, f, u) = p(y | f) p(f | u) p(u)

  14. Introducing u

  15. Introducing u. Take an extra M points on the function, u = f(Z). p(y, f, u) = p(y | f) p(f | u) p(u), with p(y | f) = N(y | f, σ²I), p(f | u) = N(f | K_nm K_mm⁻¹ u, K̃) where K̃ = K_nn − K_nm K_mm⁻¹ K_mn, and p(u) = N(u | 0, K_mm).
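A sketch of the conditional used in this augmented model, p(f | u) = N(K_nm K_mm⁻¹ u, K̃) (NumPy; helper and argument names are mine, not from the slides):

```python
import numpy as np

def conditional_f_given_u(Xnew, Z, u, kernel, jitter=1e-6):
    """Mean and covariance of p(f | u) = N(K_nm K_mm^{-1} u, K_nn - K_nm K_mm^{-1} K_mn)."""
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))
    Knm = kernel(Xnew, Z)
    Knn = kernel(Xnew, Xnew)
    Lmm = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(Lmm, Knm.T)               # m x n
    mean = A.T @ np.linalg.solve(Lmm, u)          # K_nm K_mm^{-1} u
    Ktilde = Knn - A.T @ A                        # K_nn - Q_nn
    return mean, Ktilde
```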

  16. [Figure: function values over the input space (X); f(x) ∼ GP, prior p(f) = N(0, K_nn), posterior p(f | y, X), plus inducing inputs Z with values u and prior p(u) = N(0, K_mm).]

  17. [Figure: as above, now also showing the posterior over the inducing values, p(u | y, X), alongside the prior p(u) = N(0, K_mm).]

  18. The alternative posterior. Instead of doing p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df, we'll do p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du.

  19. The alternative posterior. Instead of doing p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df, we'll do p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du. But p(y | u) involves inverting K_nn.

  20. Variational marginalisation of f: ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df

  21. Variational marginalisation of f: ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df = ln E_{p(f | u, X)}[p(y | f)]

  22. Variational marginalisation of f: ln p(y | u) = ln E_{p(f | u, X)}[p(y | f)] ≥ E_{p(f | u, X)}[ln p(y | f)] ≜ ln p̃(y | u), by Jensen's inequality.

  23. Variational marginalisation of f: ln p(y | u) = ln E_{p(f | u, X)}[p(y | f)] ≥ E_{p(f | u, X)}[ln p(y | f)] ≜ ln p̃(y | u). No inversion of K_nn required.
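Working the bound through for the Gaussian likelihood gives the form on the next slide; a short derivation (a step the slides skip, written in the notation above with K̃ = K_nn − Q_nn):

```latex
\begin{aligned}
\ln \tilde p(\mathbf{y}\mid\mathbf{u})
  &= \mathbb{E}_{p(\mathbf{f}\mid\mathbf{u},X)}\!\left[\ln \mathcal{N}(\mathbf{y}\mid\mathbf{f},\sigma^2 I)\right] \\
  &= \ln \mathcal{N}\!\left(\mathbf{y}\mid K_{nm}K_{mm}^{-1}\mathbf{u},\,\sigma^2 I\right)
     \;-\; \frac{1}{2\sigma^2}\,\mathrm{tr}\!\big(\widetilde{K}\big),
\qquad \widetilde{K} = K_{nn} - K_{nm}K_{mm}^{-1}K_{mn}.
\end{aligned}
```

Because both terms decompose over data points, the bound factorises into the product shown on slide 24.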

  24. An approximate likelihood: p̃(y | u) = ∏_{i=1}^n N(y_i | k_mn⊤ K_mm⁻¹ u, σ²) exp(−(1/(2σ²)) (k_nn − k_mn⊤ K_mm⁻¹ k_mn)), where k_mn is the i-th column of K_mn and k_nn the i-th diagonal element of K_nn. A straightforward likelihood approximation, and a penalty term.
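A small NumPy sketch (helper names and the kernel argument are mine, not from the slides) evaluating this approximate likelihood: a Gaussian fit term per point plus the conditional-variance penalty.

```python
import numpy as np
from scipy.stats import norm

def log_p_tilde(y, X, Z, u, kernel, sigma2, jitter=1e-6):
    """log p~(y | u): Gaussian fit term per point plus a conditional-variance penalty."""
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))
    Knm = kernel(X, Z)
    Lmm = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(Lmm, Knm.T)                   # m x n
    mu = A.T @ np.linalg.solve(Lmm, u)                # k_mn' K_mm^{-1} u for each i
    # full K_nn is formed here only for brevity; just its diagonal is needed
    knn_diag = np.diag(kernel(X, X))
    ktilde = knn_diag - np.sum(A**2, axis=0)          # k_nn - k_mn' K_mm^{-1} k_mn
    fit = norm.logpdf(y, loc=mu, scale=np.sqrt(sigma2)).sum()
    penalty = -0.5 * ktilde.sum() / sigma2
    return fit + penalty
```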

  25. Overview: Motivation; Sparse Gaussian Processes; Stochastic Variational Inference; Examples

  26. log p(y | X) ≥ ⟨L₁ + log p(u) − log q(u)⟩_{q(u)} ≜ L₃ (1)
      L₃ = Σ_{i=1}^n { log N(y_i | k_mn⊤ K_mm⁻¹ m, β⁻¹) − ½ β k̃ᵢᵢ − ½ tr(S Λᵢ) } − KL(q(u) ‖ p(u)) (2)
      where q(u) = N(m, S), β is the noise precision, k̃ᵢᵢ is the i-th diagonal element of K_nn − K_nm K_mm⁻¹ K_mn, and Λᵢ = β K_mm⁻¹ k_mn k_mn⊤ K_mm⁻¹.
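A sketch of L₃ under the same assumed helpers, with q(u) = N(m, S); the per-point data terms and the Gaussian KL are written out explicitly. This is a plain restatement of equation (2), not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def svi_bound_L3(y, X, Z, m, S, kernel, beta, jitter=1e-6):
    """L_3 for q(u) = N(m, S), with beta the Gaussian noise precision."""
    M = len(Z)
    Kmm = kernel(Z, Z) + jitter * np.eye(M)
    Knm = kernel(X, Z)
    Lmm = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(Lmm, Knm.T)                       # m x n
    B = np.linalg.solve(Lmm.T, A)                         # K_mm^{-1} K_mn
    mu = B.T @ m                                          # k_mn' K_mm^{-1} m per point
    knn_diag = np.diag(kernel(X, X))                      # only the diagonal is needed
    ktilde = knn_diag - np.sum(A**2, axis=0)              # diag(K_nn - Q_nn)
    tr_SLam = beta * np.sum(B * (S @ B), axis=0)          # tr(S Lambda_i) per point
    data_terms = (norm.logpdf(y, loc=mu, scale=np.sqrt(1.0 / beta))
                  - 0.5 * beta * ktilde - 0.5 * tr_SLam)
    # KL( N(m, S) || N(0, K_mm) ) between the variational and prior distributions
    Kinv_S = np.linalg.solve(Lmm.T, np.linalg.solve(Lmm, S))
    mahal = m @ np.linalg.solve(Lmm.T, np.linalg.solve(Lmm, m))
    logdet_K = 2.0 * np.log(np.diag(Lmm)).sum()
    logdet_S = np.linalg.slogdet(S)[1]
    kl = 0.5 * (np.trace(Kinv_S) + mahal - M + logdet_K - logdet_S)
    return data_terms.sum() - kl
```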

  27. Optimisation. The variational objective L₃ is a function of: the parameters of the covariance function; the parameters of q(u); and the inducing inputs Z. Strategy: fix Z, take the data in small minibatches, take stochastic gradient steps in the covariance function parameters, and stochastic natural gradient steps in the parameters of q(u).
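A structural sketch of that strategy (all function and parameter names are assumptions): Z stays fixed, minibatches are drawn, the minibatch data terms are rescaled by n/|batch|, and a step is taken. Plain gradient ascent stands in here for the natural-gradient steps on q(u) used in the paper.

```python
import numpy as np

def svi_optimise(y, X, Z, params_init, grad_minibatch, n_steps=1000,
                 batch=256, lr=1e-2, seed=0):
    """Fix Z; sweep over minibatches; take stochastic gradient steps.

    `grad_minibatch(params, y_b, X_b, scale)` is a stand-in returning the gradient
    of the rescaled minibatch bound: scale * (per-point data terms) - KL(q(u)||p(u)).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    params = {k: v.copy() for k, v in params_init.items()}
    for _ in range(n_steps):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        scale = n / len(idx)                   # unbiased rescaling of the data terms
        grads = grad_minibatch(params, y[idx], X[idx], scale)
        for k in params:                       # covariance params and q(u) params
            params[k] = params[k] + lr * grads[k]
    return params
```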

  28. Overview: Motivation; Sparse Gaussian Processes; Stochastic Variational Inference; Examples

  29. UK apartment prices. Monthly price-paid data for February to October 2012 (England and Wales), from http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/; 75,000 entries; cross-referenced against a postcode database to get latitude and longitude; regressed the normalised logarithm of the apartment prices.
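A guess at the target preprocessing described above (the normalisation choice is an assumption, not stated on the slide):

```python
import numpy as np

def normalised_log_price(prices):
    """Take the logarithm of the prices, then standardise to zero mean, unit variance."""
    logp = np.log(np.asarray(prices, dtype=float))
    return (logp - logp.mean()) / logp.std()
```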

  30. Airline data. Flight delays for every commercial flight in the USA from January to April 2008. Average delay was 30 minutes. We randomly selected 800,000 datapoints (we have limited memory!): 700,000 train, 100,000 test. [Figure: learned inverse lengthscale per input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge.]

  31. [Figure: RMSE against iteration, comparing GPs trained on subsets (N=800, N=1000, N=1200) with the SVI GP; RMSE roughly 32 to 37, iterations 0 to 1200.]

  32. Download the code! github.com/SheffieldML/GPy. Cite our paper! Hensman, Fusi and Lawrence, "Gaussian Processes for Big Data", Proceedings of UAI 2013.
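A hedged usage sketch against the GPy library linked above. The class and argument names (GPy.core.SVGP, batchsize) are assumptions about the library's interface and version, not stated on the slide, and the minibatch scheme normally wants a stochastic optimiser rather than the default one.

```python
import numpy as np
import GPy  # github.com/SheffieldML/GPy

# Toy data; in the talk the inputs would be e.g. latitude/longitude or flight features.
N, D, M = 10000, 2, 50
X = np.random.rand(N, D)
Y = np.sin(6.0 * X[:, :1]) + 0.1 * np.random.randn(N, 1)
Z = X[np.random.choice(N, M, replace=False)]      # inducing inputs, held fixed

kernel = GPy.kern.RBF(input_dim=D)
likelihood = GPy.likelihoods.Gaussian()
# Assumed interface: a stochastic variational GP with a minibatch size.
model = GPy.core.SVGP(X, Y, Z, kernel, likelihood, batchsize=256)
model.optimize(messages=True)   # a stochastic optimiser (e.g. Adadelta) may be required
```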
