Gaussian Processes for Big Data
James Hensman, joint work with Nicolò Fusi and Neil D. Lawrence
Overview Motivation Sparse Gaussian Processes Stochastic Variational Inference Examples
Motivation

Inference in a GP has the following demands:
◮ Complexity: O(n³)
◮ Storage: O(n²)

Inference in a sparse GP has the following demands:
◮ Complexity: O(nm²)
◮ Storage: O(nm)

where we get to pick m!
Still not good enough!

Big Data
◮ In parametric models, stochastic optimisation is used.
◮ This allows for application to Big Data.

This work
◮ Show how to use Stochastic Variational Inference in GPs.
◮ Stochastic optimisation scheme: each step requires O(m³).
Sparse Gaussian Processes
Computational savings

K_nn ≈ Q_nn = K_{nm} K_{mm}^{-1} K_{mn}

Instead of inverting K_nn, we make a low-rank (or Nyström) approximation, and invert K_mm instead.
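As a concrete illustration, here is a minimal numpy sketch of the low-rank construction; the rbf kernel, the inputs X and the inducing inputs Z are illustrative choices, not part of the talk.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

n, m = 1000, 20
X = np.random.uniform(0, 10, (n, 1))   # n training inputs
Z = np.linspace(0, 10, m)[:, None]     # m inducing inputs -- we get to pick m

K_nm = rbf(X, Z)
K_mm = rbf(Z, Z) + 1e-8 * np.eye(m)    # jitter for numerical stability

# Nystrom approximation: K_nn ~ Q_nn = K_nm K_mm^{-1} K_mn.
# Only the m x m matrix K_mm is ever factorised: O(n m^2) work, O(n m) storage.
Q_nn = K_nm @ np.linalg.solve(K_mm, K_nm.T)
```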
Information capture

Everything we want to do with a GP involves marginalising f:
◮ Predictions
◮ Marginal likelihood
◮ Estimating covariance parameters

The posterior of f is the central object. This means inverting K_nn.
[Figure: a function modelled with a GP, built up over four slides. Axes: input space (X) vs. function values. Shown in turn: the inputs, draws f(x) ∼ GP, the prior p(f) = N(0, K_nn), and the posterior p(f | y, X).]
Introducing u

Take an extra M points on the function, u = f(Z).

p(y, f, u) = p(y | f) p(f | u) p(u)

p(y | f) = N(y | f, σ²I)
p(f | u) = N(f | K_{nm} K_{mm}^{-1} u, K̃),  where K̃ = K_{nn} − K_{nm} K_{mm}^{-1} K_{mn}
p(u) = N(u | 0, K_{mm})
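To make the conditional concrete, here is a short continuation of the numpy sketch above (reusing rbf, n, m, X, Z, K_nm and K_mm); the draw of u is purely illustrative.

```python
import numpy as np

# Reuses rbf, n, m, X, Z, K_nm, K_mm from the earlier sketch.
u = np.random.multivariate_normal(np.zeros(m), K_mm)  # one draw from p(u)

A = np.linalg.solve(K_mm, K_nm.T).T    # the n x m matrix K_nm K_mm^{-1}
mean_f = A @ u                         # E[f | u] = K_nm K_mm^{-1} u

# Diagonal of K-tilde = K_nn - K_nm K_mm^{-1} K_mn; with an rbf kernel of
# variance 1, the diagonal of K_nn is all ones.
ktilde_diag = 1.0 - np.sum(A * K_nm, axis=1)
```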
[Figure: the same GP with inducing points u at inputs Z. Shown: the prior p(u) = N(0, K_mm) over the inducing points and the approximate posterior over u given y, alongside p(f | y, X). Axes: input space (X) vs. function values.]
The alternative posterior

Instead of doing

p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df

we'll do

p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du

but p(y | u) involves inverting K_nn.
Variational marginalisation of f

ln p(y | u) = ln ∫ p(y | f) p(f | u, X) df
            = ln E_{p(f | u, X)}[ p(y | f) ]
            ≥ E_{p(f | u, X)}[ ln p(y | f) ] ≜ ln p̃(y | u)    (by Jensen's inequality)

No inversion of K_nn required.
An approximate likelihood

p̃(y | u) = ∏_{i=1}^{n} N(y_i | k_i^⊤ K_{mm}^{-1} u, σ²) exp( −(1/(2σ²)) (k_{ii} − k_i^⊤ K_{mm}^{-1} k_i) )

where k_i is the i-th column of K_mn and k_{ii} = k(x_i, x_i).

A straightforward likelihood approximation, and a penalty term.
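Since the approximate likelihood has this closed form, the bound can be sanity-checked numerically. Below is a sketch continuing the snippets above (the targets y, the noise sigma2 and the sample count are made up for illustration) that compares the closed-form ln p̃(y | u) against a Monte Carlo estimate of E_{p(f|u,X)}[ln p(y | f)].

```python
import numpy as np

# Reuses n, X, mean_f, ktilde_diag from the earlier sketches.
sigma2 = 0.1
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * np.random.randn(n)  # toy targets

def log_gauss(y, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (y - mu)**2 / var

# Closed form: per-point Gaussian fit at the conditional mean, plus the penalty.
log_p_tilde = np.sum(log_gauss(y, mean_f, sigma2) - 0.5 * ktilde_diag / sigma2)

# Monte Carlo estimate of E[ln p(y|f)] under p(f|u, X). Because the Gaussian
# likelihood factorises over datapoints, only the marginal variances
# ktilde_diag of p(f|u, X) matter, so each f_i can be sampled independently.
num_samples = 2000
f = mean_f + np.sqrt(ktilde_diag) * np.random.randn(num_samples, n)
mc_estimate = np.mean(np.sum(log_gauss(y, f, sigma2), axis=1))

print(log_p_tilde, mc_estimate)  # should agree up to Monte Carlo error
```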
Stochastic Variational Inference
log p(y | X) ≥ E_{q(u)}[ L_1 + log p(u) − log q(u) ] ≜ L_3   (1)

L_3 = Σ_{i=1}^{n} { log N(y_i | k_i^⊤ K_{mm}^{-1} m, β^{-1}) − (β/2) k̃_{i,i} − (1/2) tr(S Λ_i) } − KL( q(u) ‖ p(u) )   (2)

where L_1 = ln p̃(y | u) is the bound from the previous slide, q(u) = N(u | m, S), β = σ^{-2}, k̃_{i,i} is the i-th diagonal element of K̃, and Λ_i = β K_{mm}^{-1} k_i k_i^⊤ K_{mm}^{-1}.
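Every term of L_3 is cheap to evaluate. The sketch below continues the snippets above with an arbitrary initialisation of the variational distribution, writing m_u and S_u for its mean and covariance to avoid clashing with the integer m.

```python
import numpy as np

# Reuses n, m, y, K_nm, K_mm, ktilde_diag, log_gauss, sigma2 from above.
beta = 1.0 / sigma2
m_u = np.zeros(m)                    # variational mean of q(u)
S_u = 0.01 * np.eye(m)               # variational covariance of q(u)

Kmm_inv = np.linalg.inv(K_mm)        # fine for a sketch; prefer Cholesky solves
A = K_nm @ Kmm_inv                   # row i is k_i^T K_mm^{-1}

fit = np.sum(log_gauss(y, A @ m_u, 1.0 / beta))   # data-fit terms
penalty = -0.5 * beta * np.sum(ktilde_diag)       # the k-tilde penalty
# sum_i tr(S Lambda_i) with Lambda_i = beta K_mm^{-1} k_i k_i^T K_mm^{-1}
trace = -0.5 * beta * np.sum((A @ S_u) * A)

# KL(q(u) || p(u)) between the Gaussians N(m_u, S_u) and N(0, K_mm).
kl = 0.5 * (np.trace(Kmm_inv @ S_u) + m_u @ Kmm_inv @ m_u - m
            + np.linalg.slogdet(K_mm)[1] - np.linalg.slogdet(S_u)[1])

L3 = fit + penalty + trace - kl
```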
Optimisation

The variational objective L_3 is a function of:
◮ the parameters of the covariance function
◮ the parameters of q(u)
◮ the inducing inputs, Z

Strategy: fix Z. Take the data in small minibatches; at each step, take stochastic gradient steps in the covariance function parameters and stochastic natural gradient steps in the parameters of q(u) (see the sketch below).
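A condensed sketch of the resulting loop, continuing the snippets above; the step size, batch size, initialisation, and the decision to hold the covariance parameters and Z fixed are all simplifications for illustration. The key facts are that each data term of L_3 touches one point, so a minibatch rescaled by n/|B| gives an unbiased estimate, and that in natural parameters (S^{-1}m, −S^{-1}/2) the natural gradient step simply interpolates toward a minibatch estimate of the optimum.

```python
import numpy as np

# Reuses n, m, y, beta, A, Kmm_inv, m_u, S_u from the sketch above.
batchsize, steps, lr = 100, 500, 0.05

theta1 = np.linalg.solve(S_u, m_u)      # natural parameters of q(u)
theta2 = -0.5 * np.linalg.inv(S_u)

for step in range(steps):
    idx = np.random.choice(n, batchsize, replace=False)
    A_b = A[idx]                        # k_i^T K_mm^{-1} for the minibatch
    scale = n / batchsize               # rescale for an unbiased estimate

    # Minibatch estimates of the batch-optimal natural parameters.
    theta1_hat = scale * beta * (A_b.T @ y[idx])
    theta2_hat = -0.5 * (Kmm_inv + scale * beta * (A_b.T @ A_b))

    # Natural gradient step: move a fraction lr toward the estimates.
    theta1 = (1 - lr) * theta1 + lr * theta1_hat
    theta2 = (1 - lr) * theta2 + lr * theta2_hat
    # (A real run would also take a gradient step in the covariance
    #  parameters here, and use a decaying step size.)

S_u = np.linalg.inv(-2.0 * theta2)      # recover q(u) = N(m_u, S_u)
m_u = S_u @ theta1
```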
Examples
UK apartment prices
◮ Monthly price-paid data for February to October 2012 (England and Wales)
◮ from http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/
◮ 75,000 entries
◮ Cross-referenced against a postcode database to get latitude and longitude
◮ Regressed the normalised logarithm of the apartment prices
Airline data
◮ Flight delays for every commercial flight in the USA from January to April 2008.
◮ Average delay was 30 minutes.
◮ We randomly selected 800,000 datapoints (we have limited memory!)
◮ 700,000 train, 100,000 test

[Figure: fitted inverse lengthscales per input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge; values range roughly 0.0 to 0.9.]
[Figure: RMSE on the airline data, roughly 32 to 37 in both panels. Left: full GPs trained on random subsets (N=800, N=1000, N=1200). Right: the SVI GP, RMSE against iteration (0 to 1200).]
Download the code! github.com/SheffieldML/GPy

Cite our paper! Hensman, Fusi and Lawrence, "Gaussian Processes for Big Data", Proceedings of UAI 2013.
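For orientation, a minimal usage sketch with GPy follows; the class and kernel names (GPy.models.SparseGPRegression, GPy.kern.RBF) reflect my understanding of the library and may differ between releases, so treat this as a starting point and check the repository's examples.

```python
import numpy as np
import GPy

# Toy 1-D regression problem.
X = np.random.uniform(0, 10, (500, 1))
y = np.sin(X) + 0.1 * np.random.randn(500, 1)

# A sparse GP with 20 inducing points, optimised by maximum likelihood.
kernel = GPy.kern.RBF(input_dim=1)
model = GPy.models.SparseGPRegression(X, y, kernel=kernel, num_inducing=20)
model.optimize()
print(model)
```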