Multiple-output Gaussian process

$$f_1(\mathbf{x}) \sim \mathcal{GP}(0, k_1(\mathbf{x}, \mathbf{x}')) \qquad f_2(\mathbf{x}) \sim \mathcal{GP}(0, k_2(\mathbf{x}, \mathbf{x}'))$$

In the noise-free case we observe the function values directly,

$$\mathcal{D}_1 = \{(\mathbf{x}_{i,1}, f_1(\mathbf{x}_{i,1})) \mid i = 1, \ldots, N_1\} \qquad \mathcal{D}_2 = \{(\mathbf{x}_{i,2}, f_2(\mathbf{x}_{i,2})) \mid i = 1, \ldots, N_2\}$$

$$\mathbf{f}_1 \sim \mathcal{N}(\mathbf{0}, K_1), \qquad \mathbf{f}_2 \sim \mathcal{N}(\mathbf{0}, K_2), \qquad
\begin{bmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \underbrace{\begin{bmatrix} K_1 & \mathbf{0} \\ \mathbf{0} & K_2 \end{bmatrix}}_{K_{\mathbf{f},\mathbf{f}}} \right)$$

If instead we observe noisy versions $y_1$ and $y_2$ of the outputs, with noise variances $\sigma_1^2$ and $\sigma_2^2$,

$$\mathcal{D}_1 = \{(\mathbf{x}_{i,1}, y_1(\mathbf{x}_{i,1})) \mid i = 1, \ldots, N_1\} \qquad \mathcal{D}_2 = \{(\mathbf{x}_{i,2}, y_2(\mathbf{x}_{i,2})) \mid i = 1, \ldots, N_2\}$$

$$\mathbf{y}_1 \sim \mathcal{N}(\mathbf{0}, K_1 + \sigma_1^2 I), \qquad \mathbf{y}_2 \sim \mathcal{N}(\mathbf{0}, K_2 + \sigma_2^2 I)$$

$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \underbrace{\begin{bmatrix} K_1 & \mathbf{0} \\ \mathbf{0} & K_2 \end{bmatrix}}_{K_{\mathbf{f},\mathbf{f}}} + \underbrace{\begin{bmatrix} \sigma_1^2 I & \mathbf{0} \\ \mathbf{0} & \sigma_2^2 I \end{bmatrix}}_{\Sigma} \right)$$
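This independent model is easy to put in code. A minimal NumPy sketch (not from the original slides; the RBF kernel, noise levels, and input locations are assumed for illustration) builds the block-diagonal covariance $K_{\mathbf{f},\mathbf{f}} + \Sigma$ and draws one joint sample of $\mathbf{y}_1$ and $\mathbf{y}_2$:

```python
import numpy as np

def rbf(Xa, Xb, ell=0.2):
    """Assumed base kernel: squared exponential."""
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X1 = rng.uniform(0, 1, 15)                    # inputs for output 1
X2 = rng.uniform(0, 1, 12)                    # inputs for output 2
K1, K2 = rbf(X1, X1), rbf(X2, X2)

sigma1, sigma2 = 0.1, 0.2                     # assumed noise standard deviations
Sigma = np.diag(np.r_[sigma1**2 * np.ones(15), sigma2**2 * np.ones(12)])

# Block-diagonal K_ff: the two outputs are modelled as independent
Kff = np.block([[K1, np.zeros((15, 12))],
                [np.zeros((12, 15)), K2]])

# One joint draw of y = [y1; y2] ~ N(0, K_ff + Sigma)
y = np.linalg.cholesky(Kff + Sigma) @ rng.standard_normal(27)
y1, y2 = y[:15], y[15:]
```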
Kernels for multiple outputs

$$f_1(\mathbf{x}) \sim \mathcal{GP}(0, k_1(\mathbf{x}, \mathbf{x}')) \qquad f_2(\mathbf{x}) \sim \mathcal{GP}(0, k_2(\mathbf{x}, \mathbf{x}'))$$

$$\mathcal{D}_1 = \{(\mathbf{x}_{i,1}, f_1(\mathbf{x}_{i,1})) \mid i = 1, \ldots, N_1\} \qquad \mathcal{D}_2 = \{(\mathbf{x}_{i,2}, f_2(\mathbf{x}_{i,2})) \mid i = 1, \ldots, N_2\}$$

To correlate the outputs, we need to fill in the off-diagonal blocks of

$$K_{\mathbf{f},\mathbf{f}} = \begin{bmatrix} K_1 & ? \\ ? & K_2 \end{bmatrix}$$

❑ Build a cross-covariance function $\operatorname{cov}[f_1(\mathbf{x}), f_2(\mathbf{x}')]$ such that $K_{\mathbf{f},\mathbf{f}}$ is positive semi-definite.
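Positive semi-definiteness is not automatic: an arbitrary cross block can break it. A small sketch (illustrative; the kernel, the scale 2.0, and the inputs are assumptions, not from the slides) fills the off-diagonal blocks with an over-scaled copy of the kernel and checks the smallest eigenvalue:

```python
import numpy as np

def rbf(Xa, Xb, ell=0.2):
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 10)              # shared inputs for both outputs
K = rbf(X, X)

K12 = 2.0 * K                          # naive cross block: scale is too large

Kff = np.block([[K, K12], [K12.T, K]])
print(np.linalg.eigvalsh(Kff).min())   # negative: not a valid covariance
```

The coregionalization models in the following slides construct the cross block so that validity holds by design.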
Different input configurations of the data

[Figure: input locations for f_1(x) and f_2(x) under each configuration]

❑ Isotopic data: sample sites are shared across outputs.

$$\mathcal{D}_1 = \{(\mathbf{x}_i, f_1(\mathbf{x}_i))\}_{i=1}^{N} \qquad \mathcal{D}_2 = \{(\mathbf{x}_i, f_2(\mathbf{x}_i))\}_{i=1}^{N}$$

❑ Heterotopic data: sample sites may be different for each output.

$$\mathcal{D}_1 = \{(\mathbf{x}_{i,1}, f_1(\mathbf{x}_{i,1}))\}_{i=1}^{N_1} \qquad \mathcal{D}_2 = \{(\mathbf{x}_{i,2}, f_2(\mathbf{x}_{i,2}))\}_{i=1}^{N_2}$$
Contents

❑ Dependencies between processes
❑ Intrinsic Coregionalization Model
❑ Semiparametric Latent Factor Model
❑ Linear Model of Coregionalization
❑ Process convolutions
❑ Covariance fitting and Prediction
❑ Cokriging
❑ Extensions
❑ Computational complexity
❑ Variations of LMC
❑ Variations of PC
❑ Summary
Intrinsic coregionalization model (ICM): two outputs

❑ Consider two outputs $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^p$.
❑ We assume the following generative model for the outputs:

1. Sample from a GP $u(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))$ to obtain $u^1(\mathbf{x})$.
2. Obtain $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ by linearly transforming $u^1(\mathbf{x})$:

$$f_1(\mathbf{x}) = a_1^1 u^1(\mathbf{x}) \qquad f_2(\mathbf{x}) = a_2^1 u^1(\mathbf{x})$$
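The two steps translate directly into code. A minimal sketch (illustrative; the RBF kernel, coefficient values, and input grid are assumptions): both outputs are deterministic rescalings of one shared draw, which is what makes them perfectly correlated.

```python
import numpy as np

def rbf(Xa, Xb, ell=0.2):
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 100)
K = rbf(X, X) + 1e-6 * np.eye(100)   # jitter for a stable Cholesky

# Step 1: one draw u1 from the shared GP prior
u1 = np.linalg.cholesky(K) @ rng.standard_normal(100)

# Step 2: both outputs are scalings of the same draw
a1_1, a2_1 = 1.5, -0.8               # assumed values of a_1^1 and a_2^1
f1, f2 = a1_1 * u1, a2_1 * u1
```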
ICM: samples

[Figures: two independent realizations; each shows a draw u^1(x) on [0, 1] together with the outputs f_1(x) and f_2(x), which are exact rescalings of the same curve.]
ICM: covariance (I)

❑ For a fixed value of $\mathbf{x}$, we can group $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ in a vector

$$\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \end{bmatrix}$$

❑ We refer to this vector as a vector-valued function.
❑ The covariance for $\mathbf{f}(\mathbf{x})$ is computed as

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = \mathbb{E}\!\left[\mathbf{f}(\mathbf{x})\, \mathbf{f}(\mathbf{x}')^{\top}\right] - \mathbb{E}[\mathbf{f}(\mathbf{x})]\, \mathbb{E}[\mathbf{f}(\mathbf{x}')]^{\top}$$

❑ We compute first the term $\mathbb{E}\!\left[\mathbf{f}(\mathbf{x})\, \mathbf{f}(\mathbf{x}')^{\top}\right]$:

$$\mathbb{E}\!\left[\begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \end{bmatrix} \begin{bmatrix} f_1(\mathbf{x}') & f_2(\mathbf{x}') \end{bmatrix}\right] = \begin{bmatrix} \mathbb{E}[f_1(\mathbf{x}) f_1(\mathbf{x}')] & \mathbb{E}[f_1(\mathbf{x}) f_2(\mathbf{x}')] \\ \mathbb{E}[f_2(\mathbf{x}) f_1(\mathbf{x}')] & \mathbb{E}[f_2(\mathbf{x}) f_2(\mathbf{x}')] \end{bmatrix}$$
ICM: covariance (II)

❑ We compute the expected values as

$$\mathbb{E}[f_1(\mathbf{x}) f_1(\mathbf{x}')] = \mathbb{E}\!\left[a_1^1 u^1(\mathbf{x})\, a_1^1 u^1(\mathbf{x}')\right] = (a_1^1)^2\, \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')]$$
$$\mathbb{E}[f_1(\mathbf{x}) f_2(\mathbf{x}')] = \mathbb{E}\!\left[a_1^1 u^1(\mathbf{x})\, a_2^1 u^1(\mathbf{x}')\right] = a_1^1 a_2^1\, \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')]$$
$$\mathbb{E}[f_2(\mathbf{x}) f_2(\mathbf{x}')] = \mathbb{E}\!\left[a_2^1 u^1(\mathbf{x})\, a_2^1 u^1(\mathbf{x}')\right] = (a_2^1)^2\, \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')]$$

❑ The term $\mathbb{E}\!\left[\mathbf{f}(\mathbf{x})\, \mathbf{f}(\mathbf{x}')^{\top}\right]$ follows as

$$\mathbb{E}\!\left[\mathbf{f}(\mathbf{x})\, \mathbf{f}(\mathbf{x}')^{\top}\right] = \begin{bmatrix} (a_1^1)^2 & a_1^1 a_2^1 \\ a_1^1 a_2^1 & (a_2^1)^2 \end{bmatrix} \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')]$$

❑ The term $\mathbb{E}[\mathbf{f}(\mathbf{x})]$ is computed as

$$\mathbb{E}[\mathbf{f}(\mathbf{x})] = \begin{bmatrix} \mathbb{E}[f_1(\mathbf{x})] \\ \mathbb{E}[f_2(\mathbf{x})] \end{bmatrix} = \begin{bmatrix} a_1^1\, \mathbb{E}[u^1(\mathbf{x})] \\ a_2^1\, \mathbb{E}[u^1(\mathbf{x})] \end{bmatrix}$$
ICM: covariance (III)

❑ Putting the terms together, and defining $\mathbf{a} = [a_1^1 \; a_2^1]^{\top}$, the covariance for $\mathbf{f}(\mathbf{x})$ follows as

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = \mathbf{a}\mathbf{a}^{\top}\, \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')] - \mathbf{a}\mathbf{a}^{\top}\, \mathbb{E}[u^1(\mathbf{x})]\, \mathbb{E}[u^1(\mathbf{x}')]$$
$$= \mathbf{a}\mathbf{a}^{\top} \underbrace{\left( \mathbb{E}[u^1(\mathbf{x})\, u^1(\mathbf{x}')] - \mathbb{E}[u^1(\mathbf{x})]\, \mathbb{E}[u^1(\mathbf{x}')] \right)}_{k(\mathbf{x}, \mathbf{x}')} = \mathbf{a}\mathbf{a}^{\top} k(\mathbf{x}, \mathbf{x}')$$

❑ We define $B = \mathbf{a}\mathbf{a}^{\top}$, leading to

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = B\, k(\mathbf{x}, \mathbf{x}') = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} k(\mathbf{x}, \mathbf{x}')$$

❑ Notice that $B$ has rank one.
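The rank-one structure of $B$ is easy to verify numerically. A small sketch (illustrative; the coefficient values and the squared-exponential base kernel are assumptions, not from the slides):

```python
import numpy as np

a = np.array([1.5, -0.8])          # assumed values of a_1^1 and a_2^1
B = np.outer(a, a)                 # coregionalization matrix B = a a^T
print(np.linalg.matrix_rank(B))    # 1: a single shared latent function

def k(x, xp, ell=0.2):
    """Assumed base kernel: squared exponential."""
    return np.exp(-0.5 * (x - xp)**2 / ell**2)

def icm_cov(x, xp):
    """2x2 covariance between f(x) and f(x'): cov = B k(x, x')."""
    return B * k(x, xp)

print(icm_cov(0.3, 0.5))
```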
ICM: two outputs and two latent samples

❑ We can introduce a bit more complexity in the previous model as follows.
❑ Consider again two outputs $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^p$.
❑ We assume the following generative model for the outputs:

1. Sample twice from a GP $u(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))$ to obtain $u^1(\mathbf{x})$ and $u^2(\mathbf{x})$.
2. Obtain $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ by adding scaled versions of $u^1(\mathbf{x})$ and $u^2(\mathbf{x})$:

$$f_1(\mathbf{x}) = a_1^1 u^1(\mathbf{x}) + a_1^2 u^2(\mathbf{x}) \qquad f_2(\mathbf{x}) = a_2^1 u^1(\mathbf{x}) + a_2^2 u^2(\mathbf{x})$$

❑ Notice that $u^1(\mathbf{x})$ and $u^2(\mathbf{x})$ are independent, although they share the same covariance $k(\mathbf{x}, \mathbf{x}')$.
ICM: samples

[Figures: two independent realizations; each shows the two latent draws u^1(x) and u^2(x) on [0, 1] and the resulting outputs f_1(x) and f_2(x), now distinct curves rather than rescalings of one another.]
ICM: covariance

❑ The vector-valued function $\mathbf{f}(\mathbf{x})$ can be written as

$$\mathbf{f}(\mathbf{x}) = \mathbf{a}^1 u^1(\mathbf{x}) + \mathbf{a}^2 u^2(\mathbf{x})$$

where $\mathbf{a}^1 = [a_1^1 \; a_2^1]^{\top}$ and $\mathbf{a}^2 = [a_1^2 \; a_2^2]^{\top}$.
❑ The covariance for $\mathbf{f}(\mathbf{x})$ is computed as

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = \mathbf{a}^1 (\mathbf{a}^1)^{\top} \operatorname{cov}(u^1(\mathbf{x}), u^1(\mathbf{x}')) + \mathbf{a}^2 (\mathbf{a}^2)^{\top} \operatorname{cov}(u^2(\mathbf{x}), u^2(\mathbf{x}'))$$
$$= \mathbf{a}^1 (\mathbf{a}^1)^{\top} k(\mathbf{x}, \mathbf{x}') + \mathbf{a}^2 (\mathbf{a}^2)^{\top} k(\mathbf{x}, \mathbf{x}') = \left( \mathbf{a}^1 (\mathbf{a}^1)^{\top} + \mathbf{a}^2 (\mathbf{a}^2)^{\top} \right) k(\mathbf{x}, \mathbf{x}')$$

❑ We define $B = \mathbf{a}^1 (\mathbf{a}^1)^{\top} + \mathbf{a}^2 (\mathbf{a}^2)^{\top}$, leading to

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = B\, k(\mathbf{x}, \mathbf{x}') = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} k(\mathbf{x}, \mathbf{x}')$$

❑ Notice that $B$ has rank two.
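Again the rank claim can be checked directly: for linearly independent coefficient vectors, the sum of two rank-one outer products has rank two. A tiny sketch with assumed coefficient values:

```python
import numpy as np

a1 = np.array([1.5, -0.8])   # assumed a^1 = [a_1^1, a_2^1]
a2 = np.array([0.4, 1.1])    # assumed a^2 = [a_1^2, a_2^2]
B = np.outer(a1, a1) + np.outer(a2, a2)
print(np.linalg.matrix_rank(B))   # 2: two independent latent samples
```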
ICM: observed data

[Figure: one draw of f_1(x) and f_2(x) evaluated at the observed inputs on [0, 1]]

$$\mathcal{D}_1 = \{(\mathbf{x}_i, f_1(\mathbf{x}_i)) \mid i = 1, \ldots, N\} \qquad \mathcal{D}_2 = \{(\mathbf{x}_i, f_2(\mathbf{x}_i)) \mid i = 1, \ldots, N\}$$

Stacking the function values, the joint distribution is

$$\mathbf{f} = \begin{bmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \end{bmatrix} = \begin{bmatrix} f_1(\mathbf{x}_1) \\ \vdots \\ f_1(\mathbf{x}_N) \\ f_2(\mathbf{x}_1) \\ \vdots \\ f_2(\mathbf{x}_N) \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, \begin{bmatrix} b_{11} K & b_{12} K \\ b_{21} K & b_{22} K \end{bmatrix} \right) = \mathcal{N}(\mathbf{0}, B \otimes K)$$

where the matrix $K \in \mathbb{R}^{N \times N}$ has elements $k(\mathbf{x}_i, \mathbf{x}_j)$.

The Kronecker product between matrices $C \in \mathbb{R}^{c_1 \times c_2}$ and $G \in \mathbb{R}^{g_1 \times g_2}$ with

$$C = \begin{bmatrix} c_{1,1} & \cdots & c_{1,c_2} \\ \vdots & \ddots & \vdots \\ c_{c_1,1} & \cdots & c_{c_1,c_2} \end{bmatrix} \quad \text{is} \quad C \otimes G = \begin{bmatrix} c_{1,1} G & \cdots & c_{1,c_2} G \\ \vdots & \ddots & \vdots \\ c_{c_1,1} G & \cdots & c_{c_1,c_2} G \end{bmatrix}$$
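A minimal sketch of a joint draw from this prior (illustrative; the kernel choice, coefficient values, and input grid are assumptions): NumPy's np.kron builds $B \otimes K$ directly.

```python
import numpy as np

def rbf(Xa, Xb, ell=0.2):
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)
K = rbf(X, X)

A = np.array([[1.5, 0.4], [-0.8, 1.1]])  # columns a^1, a^2 (assumed values)
B = A @ A.T                              # rank-two coregionalization matrix

# Joint covariance of [f_1; f_2] stacked over the N inputs
C = np.kron(B, K)
f = np.linalg.cholesky(C + 1e-6 * np.eye(100)) @ rng.standard_normal(100)
f1, f2 = f[:50], f[50:]
```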
ICM: general case

❑ Consider a set of functions $\{f_d(\mathbf{x})\}_{d=1}^{D}$.
❑ In the ICM,

$$f_d(\mathbf{x}) = \sum_{i=1}^{R} a_d^i u^i(\mathbf{x}),$$

where the functions $u^i(\mathbf{x})$ are GPs sampled independently, and share the same covariance function $k(\mathbf{x}, \mathbf{x}')$.
❑ For $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}) \cdots f_D(\mathbf{x})]^{\top}$, the covariance $\operatorname{cov}[\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')]$ is given as

$$\operatorname{cov}[\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')] = A A^{\top} k(\mathbf{x}, \mathbf{x}') = B\, k(\mathbf{x}, \mathbf{x}'),$$

where $A = [\mathbf{a}^1 \; \mathbf{a}^2 \cdots \mathbf{a}^R]$.
❑ The rank of $B \in \mathbb{R}^{D \times D}$ is $R$ (for $R \le D$ and $A$ of full column rank).
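In code the general case is one line once $A$ is given. A hedged sketch (the function name and the NumPy representation are my own, not from the slides):

```python
import numpy as np

def icm_full_cov(K, A):
    """Covariance of the stacked vector [f_1; ...; f_D] over N inputs.

    K : (N, N) matrix with elements k(x_i, x_j)
    A : (D, R) matrix with columns a^1, ..., a^R, so B = A A^T
    """
    return np.kron(A @ A.T, K)
```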
ICM: autokrigeability

❑ If the outputs are considered to be noise-free, prediction using the ICM in the isotopic data case is equivalent to independent prediction over each output.
❑ This circumstance is also known as autokrigeability.
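This equivalence can be verified numerically. A sketch (illustrative; all values are assumed): for invertible $B$, $(B \otimes K)^{-1} = B^{-1} \otimes K^{-1}$, so the joint ICM predictive mean for output 1 collapses to the single-output expression $\mathbf{k}_*^{\top} K^{-1} \mathbf{f}_1$.

```python
import numpy as np

def rbf(Xa, Xb, ell=0.2):
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 20)
K = rbf(X, X) + 1e-8 * np.eye(20)     # jitter keeps K invertible

A = rng.normal(size=(2, 2))           # full-rank A, so B is invertible
B = A @ A.T

# Noise-free ICM draw of both outputs at shared (isotopic) inputs
f = np.linalg.cholesky(np.kron(B, K)) @ rng.standard_normal(40)
f1 = f[:20]

Xs = np.linspace(0, 1, 5)
ks = rbf(Xs, X)                       # k(x_*, x_i) at 5 test points

# Joint ICM prediction for f_1(x_*): cross-covariance is [b_11 k_*, b_12 k_*]
cross = np.kron(B[:1, :], ks)         # shape (5, 40)
mean_icm = cross @ np.linalg.solve(np.kron(B, K), f)

# Independent prediction using only output 1
mean_ind = ks @ np.linalg.solve(K, f1)

print(np.allclose(mean_icm, mean_ind))   # True: autokrigeability
```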
Semiparametric Latent Factor Model (SLFM)

❑ The ICM uses R samples $u^i(\mathbf{x})$ from a single GP $u(\mathbf{x})$ with one covariance function.
❑ The SLFM uses Q samples from processes $u_q(\mathbf{x})$ with different covariance functions.
❑ The SLFM with Q = 1 is identical to the ICM with R = 1.
❑ Consider two outputs $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ with $\mathbf{x} \in \mathbb{R}^p$.
❑ Suppose we have Q = 2.
❑ We assume the following generative model for the outputs:

1. Sample from a GP $\mathcal{GP}(0, k_1(\mathbf{x}, \mathbf{x}'))$ to obtain $u_1(\mathbf{x})$.
2. Sample from a GP $\mathcal{GP}(0, k_2(\mathbf{x}, \mathbf{x}'))$ to obtain $u_2(\mathbf{x})$.
3. Obtain $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ by adding scaled versions of $u_1(\mathbf{x})$ and $u_2(\mathbf{x})$:

$$f_1(\mathbf{x}) = a_{1,1} u_1(\mathbf{x}) + a_{1,2} u_2(\mathbf{x}) \qquad f_2(\mathbf{x}) = a_{2,1} u_1(\mathbf{x}) + a_{2,2} u_2(\mathbf{x})$$
SLFM: samples

[Figures: two independent realizations; each shows the latent draws u_1(x) and u_2(x), now generated with different covariance functions, and the resulting outputs f_1(x) and f_2(x).]
SLFM: covariance

❑ The vector-valued function $\mathbf{f}(\mathbf{x})$ can be written as

$$\mathbf{f}(\mathbf{x}) = \mathbf{a}_1 u_1(\mathbf{x}) + \mathbf{a}_2 u_2(\mathbf{x})$$

where $\mathbf{a}_1 = [a_{1,1} \; a_{2,1}]^{\top}$ and $\mathbf{a}_2 = [a_{1,2} \; a_{2,2}]^{\top}$.
❑ The covariance for $\mathbf{f}(\mathbf{x})$ is computed as

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = \mathbf{a}_1 \mathbf{a}_1^{\top} \operatorname{cov}(u_1(\mathbf{x}), u_1(\mathbf{x}')) + \mathbf{a}_2 \mathbf{a}_2^{\top} \operatorname{cov}(u_2(\mathbf{x}), u_2(\mathbf{x}'))$$
$$= \mathbf{a}_1 \mathbf{a}_1^{\top} k_1(\mathbf{x}, \mathbf{x}') + \mathbf{a}_2 \mathbf{a}_2^{\top} k_2(\mathbf{x}, \mathbf{x}')$$

❑ We define $B_1 = \mathbf{a}_1 \mathbf{a}_1^{\top}$ and $B_2 = \mathbf{a}_2 \mathbf{a}_2^{\top}$, leading to

$$\operatorname{cov}(\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')) = B_1 k_1(\mathbf{x}, \mathbf{x}') + B_2 k_2(\mathbf{x}, \mathbf{x}')$$

❑ Notice that $B_1$ and $B_2$ have rank one.
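The matrix-valued kernel is a sum of rank-one terms, each paired with its own base kernel. A minimal sketch (illustrative; the coefficients, kernels, and lengthscales are assumptions, not from the slides):

```python
import numpy as np

def rbf(x, xp, ell):
    """Assumed base kernel: squared exponential with lengthscale ell."""
    return np.exp(-0.5 * (x - xp)**2 / ell**2)

a1 = np.array([1.0, 0.5])     # assumed a_1 = [a_{1,1}, a_{2,1}]
a2 = np.array([-0.3, 0.9])    # assumed a_2 = [a_{1,2}, a_{2,2}]
B1, B2 = np.outer(a1, a1), np.outer(a2, a2)

def slfm_cov(x, xp, ell1=0.1, ell2=0.5):
    """2x2 covariance between f(x) and f(x'): B1 k1 + B2 k2."""
    return B1 * rbf(x, xp, ell1) + B2 * rbf(x, xp, ell2)

print(slfm_cov(0.3, 0.5))
```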
SLFM: observed data

[Figure: one draw of f_1(x) and f_2(x) evaluated at the observed inputs on [0, 1]]

$$\mathcal{D}_1 = \{(\mathbf{x}_i, f_1(\mathbf{x}_i)) \mid i = 1, \ldots, N\} \qquad \mathcal{D}_2 = \{(\mathbf{x}_i, f_2(\mathbf{x}_i)) \mid i = 1, \ldots, N\}$$

$$\mathbf{f} = \begin{bmatrix} \mathbf{f}_1 \\ \mathbf{f}_2 \end{bmatrix} = \begin{bmatrix} f_1(\mathbf{x}_1) \\ \vdots \\ f_1(\mathbf{x}_N) \\ f_2(\mathbf{x}_1) \\ \vdots \\ f_2(\mathbf{x}_N) \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}, B_1 \otimes K_1 + B_2 \otimes K_2 \right)$$

The matrix $K_1 \in \mathbb{R}^{N \times N}$ has elements $k_1(\mathbf{x}_i, \mathbf{x}_j)$, and the matrix $K_2 \in \mathbb{R}^{N \times N}$ has elements $k_2(\mathbf{x}_i, \mathbf{x}_j)$.
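A joint draw mirrors the ICM case, with one Kronecker term per latent process. A sketch (illustrative; all hyperparameter values are assumed):

```python
import numpy as np

def rbf(Xa, Xb, ell):
    return np.exp(-0.5 * (Xa[:, None] - Xb[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50)
K1 = rbf(X, X, ell=0.1)   # covariance of u_1
K2 = rbf(X, X, ell=0.5)   # covariance of u_2

a1, a2 = np.array([1.0, 0.5]), np.array([-0.3, 0.9])   # assumed coefficients
C = np.kron(np.outer(a1, a1), K1) + np.kron(np.outer(a2, a2), K2)

f = np.linalg.cholesky(C + 1e-6 * np.eye(100)) @ rng.standard_normal(100)
f1, f2 = f[:50], f[50:]
```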
SLFM: general case

❑ Consider a set of functions $\{f_d(\mathbf{x})\}_{d=1}^{D}$.
❑ In the SLFM,

$$f_d(\mathbf{x}) = \sum_{q=1}^{Q} a_{d,q} u_q(\mathbf{x}),$$

where the functions $u_q(\mathbf{x})$ are GPs with covariance functions $k_q(\mathbf{x}, \mathbf{x}')$.
❑ For $\mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}) \cdots f_D(\mathbf{x})]^{\top}$, the covariance $\operatorname{cov}[\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')]$ is given as

$$\operatorname{cov}[\mathbf{f}(\mathbf{x}), \mathbf{f}(\mathbf{x}')] = \sum_{q=1}^{Q} A_q A_q^{\top} k_q(\mathbf{x}, \mathbf{x}') = \sum_{q=1}^{Q} B_q k_q(\mathbf{x}, \mathbf{x}'),$$

where $A_q = \mathbf{a}_q$.
❑ The rank of each $B_q \in \mathbb{R}^{D \times D}$ is one.
Linear model of coregionalization (LMC)

❑ The LMC generalizes the ICM and the SLFM, allowing several independent samples from GPs with different covariances.
❑ Consider a set of functions $\{f_d(\mathbf{x})\}_{d=1}^{D}$.
❑ In the LMC,

$$f_d(\mathbf{x}) = \sum_{q=1}^{Q} \sum_{i=1}^{R_q} a_{d,q}^i u_q^i(\mathbf{x}),$$

where the functions $u_q^i(\mathbf{x})$ are GPs with zero means and covariance functions

$$\operatorname{cov}[u_q^i(\mathbf{x}), u_{q'}^{i'}(\mathbf{x}')] = k_q(\mathbf{x}, \mathbf{x}') \quad \text{if } i = i' \text{ and } q = q', \text{ and zero otherwise.}$$
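Stacking over N inputs, the full covariance becomes a sum of Kronecker products that reduces to the ICM for Q = 1 and to the SLFM for R_q = 1. A hedged sketch (the function name and representation are my own, not from the slides):

```python
import numpy as np

def lmc_full_cov(Ks, As):
    """Covariance of the stacked outputs [f_1; ...; f_D] under the LMC.

    Ks : list of Q (N, N) kernel matrices K_q with elements k_q(x_i, x_j)
    As : list of Q (D, R_q) coefficient matrices; B_q = A_q A_q^T has rank R_q
    """
    return sum(np.kron(A @ A.T, K) for A, K in zip(As, Ks))
```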