Prior Knowledge and Sparse Methods for Convolved Multiple Outputs Gaussian Processes Mauricio A. Álvarez Joint work with Neil D. Lawrence, David Luengo and Michalis K. Titsias School of Computer Science University of Manchester (University of Manchester) Prior Knowledge and Sparse Methods 12/12/2009 1 / 51
Contents
❑ Latent force models.
❑ Sparse approximations for latent force models.
Data driven paradigm
❑ Traditionally, the main focus in machine learning has been model generation through a data driven paradigm.
❑ Combine a data set with a flexible class of models and, through regularization, make predictions on unseen data.
❑ Problems
  – Data is scarce relative to the complexity of the system.
  – Model is forced to extrapolate.
Mechanistic models
❑ Models inspired by the underlying knowledge of a physical system are common in many areas.
❑ Description of a well characterized physical process that underpins the system, typically represented with a set of differential equations.
❑ Identifying and specifying all the interactions might not be feasible.
❑ A mechanistic model can enable accurate prediction in regions where there may be no available training data.
Hybrid systems
❑ We suggest a hybrid approach involving a mechanistic model of the system augmented through machine learning techniques.
❑ Dynamical systems (e.g. incorporating first order and second order differential equations).
❑ Partial differential equations for systems with multiple inputs.
Latent variable model: definition
❑ Our approach can be seen as a type of latent variable model,
$$Y = UW + E,$$
where $Y \in \mathbb{R}^{N \times D}$, $U \in \mathbb{R}^{N \times Q}$, $W \in \mathbb{R}^{Q \times D}$ ($Q < D$) and $E$ is a matrix variate white Gaussian noise with columns $\mathbf{e}_{:,d} \sim \mathcal{N}(\mathbf{0}, \Sigma)$.
❑ In PCA and FA the common approach to deal with the unknowns is to integrate out $U$ under a Gaussian prior and optimize with respect to $W$.
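As a rough illustration of the generative model on this slide, the NumPy sketch below draws $Y = UW + E$ with a standard Gaussian prior on the rows of $U$. The sizes `N`, `D`, `Q`, the random `W` and the isotropic noise level `sigma` are illustrative assumptions, not values from the talk. Under these assumptions, integrating out $U$ gives each row of $Y$ the covariance $W^\top W + \sigma^2 I$, which the sketch compares with the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, Q = 20000, 5, 2          # illustrative sizes (assumed), with Q < D
sigma = 0.1                    # assumed isotropic noise standard deviation

W = rng.standard_normal((Q, D))          # mapping from latent to observed space
U = rng.standard_normal((N, Q))          # latent variables, rows u_n ~ N(0, I_Q)
E = sigma * rng.standard_normal((N, D))  # white Gaussian noise
Y = U @ W + E                            # observations, Y in R^{N x D}

# With U integrated out, each row of Y is Gaussian with this covariance.
marginal_cov = W.T @ W + sigma**2 * np.eye(D)
empirical_cov = np.cov(Y, rowvar=False)
rel_err = np.max(np.abs(empirical_cov - marginal_cov)) / np.max(np.abs(marginal_cov))
print(f"relative covariance error: {rel_err:.3f}")
```

In PCA and FA one would go on to maximize this marginal likelihood with respect to `W`; the sketch only covers the sampling side.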
Latent variable model: alternative view
❑ Data with temporal nature and Gaussian (Markov) prior for rows of $U$ leads to the Kalman filter/smoother.
❑ Consider a joint distribution for $p(U \mid \mathbf{t})$, $\mathbf{t} = [t_1 \ldots t_N]^\top$, with the form of a Gaussian process (GP),
$$p(U \mid \mathbf{t}) = \prod_{q=1}^{Q} \mathcal{N}\left(\mathbf{u}_{:,q} \mid \mathbf{0}, K_{\mathbf{u}_{:,q},\mathbf{u}_{:,q}}\right).$$
The latent variables are random functions, $\{u_q(t)\}_{q=1}^{Q}$, with associated covariance $K_{\mathbf{u}_{:,q},\mathbf{u}_{:,q}}$.
❑ The GP for $Y$ can be readily implemented. In [TSJ05] this is known as a semi-parametric latent factor model (SLFM).
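A minimal sketch of this construction, assuming a squared exponential covariance for each latent function (the slide does not fix a kernel): each column of $U$ is drawn independently from its own GP and the columns are then mixed through $W$. The helper `rbf_kernel`, the lengthscales, the sizes and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(t, lengthscale):
    """Squared exponential covariance between all pairs of time points in t."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

rng = np.random.default_rng(1)

N, D, Q = 100, 4, 2                       # illustrative sizes (assumed)
t = np.linspace(0.0, 10.0, N)             # time inputs t_1, ..., t_N
lengthscales = [1.0, 3.0]                 # one assumed lengthscale per latent function

# Draw each column u_{:,q} independently from N(0, K_q).
U = np.empty((N, Q))
for q in range(Q):
    K = rbf_kernel(t, lengthscales[q]) + 1e-6 * np.eye(N)  # jitter for stability
    U[:, q] = np.linalg.cholesky(K) @ rng.standard_normal(N)

W = rng.standard_normal((Q, D))                  # mixing weights (assumed random here)
Y = U @ W + 0.05 * rng.standard_normal((N, D))   # noisy outputs sharing the latent functions
```

Because the latent functions are independent, the covariance of output column $d$ in this sketch is $\sum_q W_{qd}^2 K_q$ plus the noise variance on the diagonal, which is why the GP for $Y$ is straightforward to write down.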
Latent force model: mechanistic interpretation (1)
❑ We include a further dynamical system with a mechanistic inspiration.
❑ Reinterpret the equation $Y = UW + E$ as a force balance equation
$$YB = US + \tilde{E},$$
where $S \in \mathbb{R}^{Q \times D}$ is a matrix of sensitivities, $B \in \mathbb{R}^{D \times D}$ is a diagonal matrix of spring constants, $W = SB^{-1}$ and $\tilde{\mathbf{e}}_{:,d} \sim \mathcal{N}\left(\mathbf{0}, B^\top \Sigma B\right)$.
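The step from the latent variable model to the force balance form can be spelled out in one line. This is a reading of the definitions on the slide, assuming $B$ is invertible (all spring constants nonzero) and identifying $\tilde{E}$ with $EB$; the talk itself does not show the intermediate step.

```latex
\begin{align}
Y  &= UW + E, \qquad W = SB^{-1}, \\
YB &= UWB + EB = U S B^{-1} B + EB = US + \tilde{E}, \qquad \tilde{E} = EB .
\end{align}
```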
Latent force model: mechanistic interpretation (2)
[Figure: diagram with labels $y_d(t)$, $B_d$, $U(t)$ and $S_{d1} u_1(t)$, $S_{d2} u_2(t)$, ..., $S_{dQ} u_Q(t)$.]
$$YB = US + \tilde{E}$$
Latent force model: extension (1)
❑ The model can be extended to include dampers and masses.
❑ We can write
$$YB + \dot{Y}C + \ddot{Y}M = US + \tilde{E},$$
where
$\dot{Y}$ is the first derivative of $Y$ w.r.t. time,
$\ddot{Y}$ is the second derivative of $Y$ w.r.t. time,
$C$ is a diagonal matrix of damping coefficients,
$M$ is a diagonal matrix of masses,
$\tilde{E}$ is a matrix variate white Gaussian noise.
Latent force model: extension (2)
[Figure: diagram with labels $y_d(t)$, $B_d$, $m_d$, $C_d$, $U(t)$ and $S_{d1} u_1(t)$, $S_{d2} u_2(t)$, ..., $S_{dQ} u_Q(t)$.]
$$YB + \dot{Y}C + \ddot{Y}M = US + \tilde{E}$$
Latent force model: properties
❑ This model allows us to include behaviors such as inertia and resonance.
❑ We refer to these systems as latent force models (LFMs).
❑ One way of thinking of our model is to consider puppetry.
Second Order Dynamical System
Using the system of second order differential equations
$$m_d \frac{\mathrm{d}^2 y_d(t)}{\mathrm{d}t^2} + C_d \frac{\mathrm{d} y_d(t)}{\mathrm{d}t} + B_d y_d(t) = \sum_{q=1}^{Q} S_{dq} u_q(t),$$
where
$u_q(t)$: latent forces
$y_d(t)$: displacements over time
$C_d$: damper constant for the $d$-th output
$B_d$: spring constant for the $d$-th output
$m_d$: mass constant for the $d$-th output
$S_{dq}$: sensitivity of the $d$-th output to the $q$-th input.
Second Order Dynamical System: solution
Solving for $y_d(t)$, we obtain
$$y_d(t) = \sum_{q=1}^{Q} L_{dq}[u_q](t),$$
where the linear operator is given by a convolution:
$$L_{dq}[u_q](t) = \frac{S_{dq}}{\omega_d} \int_0^t \exp(-\alpha_d (t - \tau)) \sin(\omega_d (t - \tau))\, u_q(\tau)\, \mathrm{d}\tau,$$
with $\omega_d = \sqrt{4 B_d - C_d^2}/2$ and $\alpha_d = C_d/2$ (these expressions correspond to unit masses, $m_d = 1$).
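As a sanity check on the convolution form, the sketch below evaluates the integral numerically for a single output and a single force and compares it with a direct Runge-Kutta integration of the differential equation started from rest. The constants `C`, `B`, `S`, the example force `u` and the unit mass are assumptions chosen for illustration; the damper is picked so that $4B > C^2$ and $\omega_d$ is real.

```python
import numpy as np

# Assumed illustrative constants for one output d and one force q (unit mass).
C, B, S = 0.5, 4.0, 1.0
alpha = C / 2.0
omega = np.sqrt(4.0 * B - C**2) / 2.0      # real here because 4B > C^2 (underdamped)

def u(t):
    """Assumed example latent force."""
    return np.sin(1.5 * t)

def y_convolution(t_end, n=5001):
    """Evaluate L_dq[u_q](t_end) with the trapezoidal rule on [0, t_end]."""
    tau = np.linspace(0.0, t_end, n)
    lag = t_end - tau
    integrand = np.exp(-alpha * lag) * np.sin(omega * lag) * u(tau)
    h = tau[1] - tau[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return S / omega * integral

def y_rk4(t_end, n=5000):
    """Integrate y'' + C y' + B y = S u(t) from rest with classical RK4."""
    def f(t, state):
        y, v = state
        return np.array([v, S * u(t) - C * v - B * y])
    h = t_end / n
    state = np.zeros(2)                    # y(0) = 0, y'(0) = 0
    for k in range(n):
        t = k * h
        k1 = f(t, state)
        k2 = f(t + h / 2, state + h / 2 * k1)
        k3 = f(t + h / 2, state + h / 2 * k2)
        k4 = f(t + h, state + h * k3)
        state = state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[0]

t_end = 5.0
print(y_convolution(t_end), y_rk4(t_end))
```

The two numbers should agree up to discretization error, which reflects that the convolution kernel is the impulse response of the damped oscillator with zero initial conditions.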
Second Order Dynamical System: covariance matrix
Behaviour of the system summarized by the damping ratio:
$$\zeta_d = \frac{1}{2} C_d / \sqrt{B_d}$$
$\zeta_d > 1$: overdamped system
$\zeta_d = 1$: critically damped system
$\zeta_d < 1$: underdamped system
$\zeta_d = 0$: undamped system (no friction)
Example covariance matrix:
$\zeta_1 = 0.125$: underdamped
$\zeta_2 = 2$: overdamped
$\zeta_3 = 1$: critically damped
[Figure: covariance matrix over $f(t)$, $y_1(t)$, $y_2(t)$ and $y_3(t)$.]
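The classification by damping ratio is easy to express in code. The sketch below assumes unit mass, so that the ratio is $C_d / (2\sqrt{B_d})$; the function names and the $(C, B)$ pairs are hypothetical, chosen only so that the three ratios from the example (0.125, 2 and 1) come out.

```python
import math

def damping_ratio(C, B):
    """Damping ratio zeta = C / (2 sqrt(B)) for a unit-mass system."""
    return 0.5 * C / math.sqrt(B)

def damping_regime(C, B, tol=1e-12):
    """Name the regime implied by the damping ratio."""
    zeta = damping_ratio(C, B)
    if math.isclose(zeta, 0.0, abs_tol=tol):
        return "undamped"
    if math.isclose(zeta, 1.0, abs_tol=tol):
        return "critically damped"
    return "overdamped" if zeta > 1.0 else "underdamped"

# Assumed (C, B) pairs chosen to reproduce the three damping ratios in the example.
for C, B in [(0.25, 1.0), (4.0, 1.0), (2.0, 1.0)]:
    print(damping_ratio(C, B), damping_regime(C, B))
```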
Second Order Dynamical System: samples from GP
[Figure: joint samples plotted over time from 0 to 20.]
Joint samples from the ODE covariance, cyan: $u(t)$, red: $y_1(t)$ (underdamped), green: $y_2(t)$ (overdamped) and blue: $y_3(t)$ (critically damped).
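One approximate way to produce curves of this kind, without the closed-form ODE covariance, is to draw $u(t)$ from a GP and push it through a discretized convolution for each output. The sketch below does this under several assumptions: a squared exponential covariance for $u(t)$, unit mass, $B_d = 1$, $S_d = 1$, and dampers chosen to give the damping ratios 0.125, 2 and 1. For the overdamped and critically damped outputs $\omega_d$ is no longer real, so the helper `green` (a hypothetical name) uses the standard impulse responses for those regimes.

```python
import numpy as np

def green(lag, C, B):
    """Impulse response of y'' + C y' + B y = delta(t) for unit mass, lag >= 0."""
    alpha = C / 2.0
    disc = 4.0 * B - C**2
    if disc > 0:                                   # underdamped
        omega = np.sqrt(disc) / 2.0
        return np.exp(-alpha * lag) * np.sin(omega * lag) / omega
    if disc < 0:                                   # overdamped
        gamma = np.sqrt(-disc) / 2.0
        return np.exp(-alpha * lag) * np.sinh(gamma * lag) / gamma
    return lag * np.exp(-alpha * lag)              # critically damped

rng = np.random.default_rng(2)
n = 200
t = np.linspace(0.0, 20.0, n)
h = t[1] - t[0]

# Latent force u(t): GP sample with an assumed squared exponential covariance.
diff = t[:, None] - t[None, :]
K = np.exp(-0.5 * diff**2) + 1e-6 * np.eye(n)      # jitter for numerical stability
u = np.linalg.cholesky(K) @ rng.standard_normal(n)

# Assumed constants: B = 1, S = 1 and C chosen so that zeta = 0.125, 2, 1.
B, S = 1.0, 1.0
dampers = [0.25, 4.0, 2.0]

lag = np.clip(diff, 0.0, None)                     # t_i - t_j, zero where t_j > t_i
causal = diff >= 0.0
Y = np.empty((n, len(dampers)))
for d, C in enumerate(dampers):
    G = np.where(causal, green(lag, C, B), 0.0)    # lower-triangular kernel matrix
    Y[:, d] = S * h * (G @ u)                      # rectangle-rule convolution
```

Columns of `Y` then play the role of $y_1(t)$, $y_2(t)$ and $y_3(t)$ driven by the same force; the result is only as accurate as the grid spacing allows.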
Motion Capture Data (1)
❑ CMU motion capture data, motions 18, 19 and 20 from subject 49.
❑ Motions 18 and 19 for training and 20 for testing.
Motion Capture Data (2)
❑ The data was down-sampled by a factor of 32 (from 120 frames per second to 3.75).
❑ We focused on the subject's left arm.
❑ For testing, we condition only on the observations of the shoulder's orientation (motion 20) to make predictions for the rest of the arm's angles.
Motion Capture Results
Root mean squared (RMS) angle error for prediction of the left arm's configuration in the motion capture data. Prediction with the latent force model outperforms prediction with regression for all angles apart from the radius.

Angle               Latent Force Error   Regression Error
Radius              4.11                 4.02
Wrist               6.55                 6.65
Hand X rotation     1.82                 3.21
Hand Z rotation     2.76                 6.14
Thumb X rotation    1.77                 3.10
Thumb Z rotation    2.73                 6.09
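For reference, an RMS error of this kind can be computed as below. This is a generic sketch: the slide does not say how errors are aggregated over frames or whether angles are wrapped, so the function `rms_error` simply averages squared differences over all entries, and the arrays are hypothetical toy values rather than motion capture data.

```python
import numpy as np

def rms_error(predicted, observed):
    """Root mean squared error between two equally shaped arrays of angles."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Hypothetical toy angles (not motion capture data).
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 13.0, 11.0]
print(rms_error(predicted, observed))
```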