Bayesian inference & process convolution models
Dave Higdon, Statistical Sciences Group, LANL
MOVING AVERAGE SPATIAL MODELS
Kernel basis representation for spatial processes z(s)

Define m basis functions k_1(s), ..., k_m(s).

[figure: the m basis functions plotted over s from −2 to 12]

Here k_j(s) is a normal density centered at spatial location ω_j:

  k_j(s) = (1/√(2π)) exp{ −½ (s − ω_j)² }

Set z(s) = Σ_{j=1}^m k_j(s) x_j, where x ∼ N(0, I_m).

Can represent z = (z(s_1), ..., z(s_n))^T as z = Kx, where K_ij = k_j(s_i).
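A minimal numpy sketch of this construction, for concreteness; the knot count, grid, and seed below are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 15                                  # number of knots / basis functions
omega = np.linspace(-2, 12, m)          # knot locations omega_j
s = np.linspace(0, 10, 200)             # locations at which to evaluate z(s)

# K[i, j] = k_j(s_i): standard normal density centered at omega_j
K = np.exp(-0.5 * (s[:, None] - omega[None, :]) ** 2) / np.sqrt(2 * np.pi)

x = rng.standard_normal(m)              # x ~ N(0, I_m)
z = K @ x                               # z = Kx: one realization of z(s)
```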
x and k(s) determine the spatial process z(s)

[figure: scaled kernels k_j(s)x_j (left) and the resulting process z(s) (right)]

Continuous representation: z(s) = Σ_{j=1}^m k_j(s) x_j, where x ∼ N(0, I_m).

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i).
Estimate z(s) by specifying k_j(s) and estimating x

[figure: fitted scaled basis functions k_j(s)x_j (left) and the fitted curve with data y(s) (right)]

Discrete representation: for z = (z(s_1), ..., z(s_n))^T, z = Kx, where K_ij = k_j(s_i)

⇒ standard regression model: y = Kx + ε
Formulation for the 1-d example

Data y = (y(s_1), ..., y(s_n))^T observed at locations s_1, ..., s_n. Once the knot locations ω_j, j = 1, ..., m, and the kernel choice k(s) are specified, the remaining model formulation is trivial:

Likelihood:
  L(y | x, λ_y) ∝ λ_y^{n/2} exp{ −½ λ_y (y − Kx)^T (y − Kx) },  where K_ij = k(ω_j − s_i).

Priors:
  π(x | λ_x) ∝ λ_x^{m/2} exp{ −½ λ_x x^T x }
  π(λ_x) ∝ λ_x^{a_x − 1} exp{ −b_x λ_x }
  π(λ_y) ∝ λ_y^{a_y − 1} exp{ −b_y λ_y }

Posterior:
  π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
                      × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
Posterior and full conditionals

Posterior:
  π(x, λ_x, λ_y | y) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }
                      × λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }

Full conditionals:
  π(x | ···) ∝ exp{ −½ [λ_y x^T K^T K x − 2 λ_y x^T K^T y + λ_x x^T x] }
  π(λ_x | ···) ∝ λ_x^{a_x + m/2 − 1} exp{ −λ_x [b_x + .5 x^T x] }
  π(λ_y | ···) ∝ λ_y^{a_y + n/2 − 1} exp{ −λ_y [b_y + .5 (y − Kx)^T (y − Kx)] }

Gibbs sampler implementation:
  x | ··· ∼ N( (λ_y K^T K + λ_x I_m)^{−1} λ_y K^T y, (λ_y K^T K + λ_x I_m)^{−1} )
  λ_x | ··· ∼ Γ( a_x + m/2, b_x + .5 x^T x )
  λ_y | ··· ∼ Γ( a_y + n/2, b_y + .5 (y − Kx)^T (y − Kx) )
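A minimal numpy sketch of this Gibbs sampler; the function name `gibbs`, the starting values, and the iteration count are implementation choices, and note that numpy's `gamma` takes a scale, so the rate parameters above are inverted:

```python
import numpy as np

def gibbs(y, K, a_x, b_x, a_y, b_y, n_iter=1000, seed=0):
    """Gibbs sampler for (x, lam_x, lam_y) under the full conditionals above."""
    rng = np.random.default_rng(seed)
    n, m = K.shape
    KtK, Kty = K.T @ K, K.T @ y
    x, lam_x, lam_y = np.zeros(m), 1.0, 1.0   # arbitrary starting values
    draws = {"x": [], "lam_x": [], "lam_y": []}
    for _ in range(n_iter):
        # x | ... ~ N(prec^{-1} lam_y K^T y, prec^{-1}), prec = lam_y K^T K + lam_x I
        prec = lam_y * KtK + lam_x * np.eye(m)
        L = np.linalg.cholesky(prec)
        x = np.linalg.solve(prec, lam_y * Kty) \
            + np.linalg.solve(L.T, rng.standard_normal(m))
        # lam_x | ... ~ Gamma(a_x + m/2, rate = b_x + .5 x^T x)
        lam_x = rng.gamma(a_x + m / 2, 1.0 / (b_x + 0.5 * x @ x))
        # lam_y | ... ~ Gamma(a_y + n/2, rate = b_y + .5 (y - Kx)^T (y - Kx))
        r = y - K @ x
        lam_y = rng.gamma(a_y + n / 2, 1.0 / (b_y + 0.5 * r @ r))
        draws["x"].append(x)
        draws["lam_x"].append(lam_x)
        draws["lam_y"].append(lam_y)
    return {k: np.asarray(v) for k, v in draws.items()}
```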
Gibbs sampler: intuition

Gibbs sampler for a bivariate normal density:

  π(z) = π(z_1, z_2) ∝ exp{ −½ (z_1, z_2) [ 1 ρ ; ρ 1 ]^{−1} (z_1, z_2)^T }

Full conditionals of π(z):
  z_1 | z_2 ∼ N(ρ z_2, 1 − ρ²)
  z_2 | z_1 ∼ N(ρ z_1, 1 − ρ²)

• initialize the chain with a draw z⁰ = (z⁰_1, z⁰_2)^T ∼ N(0, [ 1 ρ ; ρ 1 ])
• draw z¹_1 ∼ N(ρ z⁰_2, 1 − ρ²); now (z¹_1, z⁰_2)^T ∼ π(z)
• draw z¹_2 ∼ N(ρ z¹_1, 1 − ρ²); now z¹ = (z¹_1, z¹_2)^T ∼ π(z)

[figure: the moves from z⁰ to (z¹_1, z⁰_2) to z¹ traced in the (z_1, z_2) plane]
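A short numpy sketch of this two-step update; ρ, the deliberately bad starting point, and the number of draws are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, T = 0.9, 5000                      # correlation and number of draws
sd = np.sqrt(1 - rho**2)                # conditional standard deviation

z = np.zeros((T, 2))
z1, z2 = -3.0, 3.0                      # arbitrary (bad) starting point z^0
for k in range(T):
    z1 = rng.normal(rho * z2, sd)       # z1 | z2 ~ N(rho z2, 1 - rho^2)
    z2 = rng.normal(rho * z1, sd)       # z2 | z1 ~ N(rho z1, 1 - rho^2)
    z[k] = z1, z2
```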
Gibbs sampler: intuition

The Gibbs sampler gives z⁰, z¹, ..., z^T, which can be treated as dependent draws from π(z). If z⁰ is not a draw from π(z), then the initial realizations will not have the correct distribution. In practice, the first 100? 1000? realizations are discarded.

The draws can be used to make inference about π(z):

• Posterior mean of z is estimated by
    (μ̂_1, μ̂_2)^T,  with μ̂_i = (1/T) Σ_{k=1}^T z^k_i

• Posterior probabilities:
    P(z_1 > 1) ≈ (1/T) Σ_{k=1}^T I[z^k_1 > 1]
    P(z_1 > z_2) ≈ (1/T) Σ_{k=1}^T I[z^k_1 > z^k_2]

• 90% interval: (z_1^{[5%]}, z_1^{[95%]}), the empirical 5th and 95th percentiles of the draws z^k_1.

[figure: the draws scattered in the (z_1, z_2) plane]
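These summaries can be computed directly from the draws, here assuming the `z` array produced by the sketch above; the 500-draw burn-in is an arbitrary illustrative choice:

```python
import numpy as np

burn = 500                                 # discard early draws (burn-in)
zb = z[burn:]

mu_hat = zb.mean(axis=0)                   # estimate of the posterior mean
p1 = np.mean(zb[:, 0] > 1)                 # P(z1 > 1)
p2 = np.mean(zb[:, 0] > zb[:, 1])          # P(z1 > z2)
lo, hi = np.percentile(zb[:, 0], [5, 95])  # 90% interval for z1
```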
1-d example

m = 6 knots evenly spaced between −.3 and 1.2. n = 5 data points at s = .05, .25, .52, .65, .91. k(s) is N(0, sd = .3).
a_y = 10, b_y = 10 · (.25²) ⇒ strong prior at λ_y = 1/.25²; a_x = 1, b_x = .001.

[figure: basis functions k_j(s) and data y(s) over s ∈ [0, 1] (top); trace plots of the x_j and of λ_x, λ_y over 1000 iterations (bottom)]
1-d example

From posterior realizations of the knot weights x, one can construct posterior realizations of the smooth fitted function z(s) = Σ_{j=1}^m k_j(s) x_j, as sketched below. Note the strong prior on λ_y is required since n is small.

[figure: posterior realizations of z(s) (left); posterior mean and pointwise 90% bounds with the data (right)]
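A sketch of this step, reusing the hypothetical `gibbs` function from the earlier sketch together with the settings of this example; the data values in `y` are made up for illustration:

```python
import numpy as np

s_obs = np.array([0.05, 0.25, 0.52, 0.65, 0.91])
y = np.array([1.2, 0.6, -0.4, -0.9, 0.3])       # hypothetical data values
omega = np.linspace(-0.3, 1.2, 6)               # m = 6 knots
sd = 0.3                                        # kernel sd from the example

def kernel_matrix(s, omega, sd):
    d = (s[:, None] - omega[None, :]) / sd
    return np.exp(-0.5 * d**2) / (sd * np.sqrt(2 * np.pi))

draws = gibbs(y, kernel_matrix(s_obs, omega, sd),
              a_x=1, b_x=0.001, a_y=10, b_y=10 * 0.25**2)

s_grid = np.linspace(0, 1, 100)
Kg = kernel_matrix(s_grid, omega, sd)
z_post = draws["x"] @ Kg.T               # each row: one realization of z(s)
z_mean = z_post.mean(axis=0)             # posterior mean curve
z_lo, z_hi = np.percentile(z_post, [5, 95], axis=0)  # pointwise 90% bounds
```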
1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 2).

[figure: data, and posterior mean with 80% CI for z(s) (top); posterior distributions for λ_y and λ_x (bottom)]
1-d example

m = 20 knots evenly spaced between −2 and 12. n = 18 data points evenly spaced between 0 and 10. k(s) is N(0, sd = 1).

[figure: data, and posterior mean with 80% CI for z(s) (top); posterior distributions for λ_y and λ_x (bottom)]
Basis representations for spatial processes z(s)

Represent z(s) at spatial locations s_1, ..., s_n:
  z = (z(s_1), ..., z(s_n))^T ∼ N(0, Σ_z).

Recall z = Kx, where K K^T = Σ_z and x ∼ N(0, I_n). This gives a discrete representation of z(s) at the locations s_1, ..., s_n:

  discrete:   z(s_i) = Σ_{j=1}^n K_ij x_j
  continuous: z(s)   = Σ_{j=1}^n k_j(s) x_j

[figure: discrete representation (left) vs. continuous representation (right) of the basis functions over s ∈ [0, 10]]

Columns of K give basis functions. Can use a subset of these basis functions to reduce dimensionality.
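A numpy sketch of the K K^T = Σ_z construction, assuming a Gaussian covariance for Σ_z (an illustrative choice); both a Cholesky factor and an SVD-based factor are valid choices of K, which is the comparison drawn on the next slide:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 50
s = np.linspace(0, 10, n)
# illustrative Gaussian covariance for Sigma_z, plus jitter for stability
Sigma_z = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 2.0) ** 2) + 1e-8 * np.eye(n)

K_chol = np.linalg.cholesky(Sigma_z)     # one K with K K^T = Sigma_z
z = K_chol @ rng.standard_normal(n)      # z ~ N(0, Sigma_z)

# the factorization is not unique: an SVD gives another valid K
U, d, _ = np.linalg.svd(Sigma_z)
K_svd = U * np.sqrt(d)                   # K_svd K_svd^T = Sigma_z as well
```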
[figure: for three decompositions of Σ_z — Cholesky (with pivoting), SVD, and kernels — the resulting basis functions (left) and the covariance they induce over locations s (right)]
How many basis kernels?

Define m basis functions k_1(s), ..., k_m(s).

[figure: basis functions (left) and implied covariance over locations s (right) for m = 20, m = 10, and m = 6 kernels]
Moving average specifications for spatial models z(s)

Take a smoothing kernel k(s) and a white noise process x = (x_1, ..., x_m) at spatial locations ω_1, ..., ω_m:

  x ∼ N(0, (1/λ_x) I_m)

The spatial process z(s) is constructed by convolving x with the smoothing kernel k(s):

  z(s) = Σ_{j=1}^m x_j k(ω_j − s)

⇒ z(s) is a Gaussian process with mean 0 and covariance

  Cov(z(s), z(s′)) = (1/λ_x) Σ_{j=1}^m k(ω_j − s) k(ω_j − s′)
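A quick Monte Carlo check of this covariance formula; the precision, kernel, knot grid, and sample size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
lam_x = 2.0                              # white-noise precision (illustrative)
omega = np.linspace(-2, 12, 20)          # knot locations

def k(u, sd=1.0):                        # smoothing kernel: normal density
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

s, s2 = 4.0, 5.0
cov_formula = np.sum(k(omega - s) * k(omega - s2)) / lam_x

# simulate many white-noise fields x ~ N(0, (1/lam_x) I_m) and convolve
X = rng.normal(0.0, 1.0 / np.sqrt(lam_x), size=(100_000, omega.size))
cov_mc = np.mean((X @ k(omega - s)) * (X @ k(omega - s2)))
# cov_mc should be close to cov_formula
```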
Example: constructing 1-d models for z(s)

m = 20 knot locations ω_1, ..., ω_m equally spaced between −2 and 12.

  x = (x_1, ..., x_m)^T ∼ N(0, I_m)
  z(s) = Σ_{k=1}^m k(ω_k − s) x_k

k(s) is a normal density with sd = 1, 1/2, and 1/4; the 4th frame uses k(s) = 1.4 exp{ −1.4 |s| }.

[figure: four frames of x and the resulting z(s) over s ∈ [0, 10], one per kernel]

General points (see the sketch below):
• smooth kernels required
• spacing depends on kernel width
  − knot spacing ≤ 1 sd for normal k(s)
• kernel width is equivalent to the scale parameter in GP models
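A numpy sketch of this construction: the same white-noise weights x convolved with each of the four kernels; plotting (e.g. with matplotlib) is omitted:

```python
import numpy as np

rng = np.random.default_rng(5)

omega = np.linspace(-2, 12, 20)          # m = 20 knot locations
x = rng.standard_normal(omega.size)      # x ~ N(0, I_m), shared across frames
s = np.linspace(0, 10, 400)

def normal_kernel(u, sd):
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

kernels = {
    "normal sd=1":      lambda u: normal_kernel(u, 1.0),
    "normal sd=1/2":    lambda u: normal_kernel(u, 0.5),
    "normal sd=1/4":    lambda u: normal_kernel(u, 0.25),
    "1.4 exp(-1.4|s|)": lambda u: 1.4 * np.exp(-1.4 * np.abs(u)),
}

# same x, different kernels: wider/smoother kernels give smoother z(s)
curves = {name: kern(omega[None, :] - s[:, None]) @ x
          for name, kern in kernels.items()}
```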