Lecture 20 Spatial Random Effects Models + Point Reference Spatial Data Colin Rundel 04/03/2017 1
Spatial Random Effects Models 2
Scottish Lip Cancer Data 3 Observed Expected 61 ° N 60 ° N 59 ° N value 80 58 ° N 60 40 57 ° N 20 0 56 ° N 55 ° N 8 ° W 6 ° W 4 ° W 2 ° W 8 ° W 6 ° W 4 ° W 2 ° W
4 Obs/Exp % Agg Fish Forest 61 ° N 61 ° N 60 ° N 60 ° N 59 ° N 59 ° N 6 58 ° N 58 ° N 20 4 15 10 2 57 ° N 57 ° N 5 0 0 56 ° N 56 ° N 55 ° N 55 ° N 8 ° W 6 ° W 4 ° W 2 ° W 8 ° W 6 ° W 4 ° W 2 ° W
Neighborhood / weight matrix 5
Moran’s I ape:: Moran.I (lip_cancer$Observed / lip_cancer$Expected, $ p.value : num 0 ## : num 0.0784 $ sd ## $ expected: num -0.0182 ## $ observed: num 0.666 ## ## List of 4 weight = diag ( rowSums (W)) %*% W) %>% str () ## [1] 0.5323161 morans_I = function (y, w) w = diag ( rowSums (W)) %*% W) morans_I (y = lip_cancer$Observed / lip_cancer$Expected, ## [1] 0.2258758 morans_I (y = lip_cancer$Observed, w = diag ( rowSums (W)) %*% W) } (n/ sum (w)) * (num/denom) { 6 n = length (y) y_bar = mean (y) num = sum (w * (y-y_bar) %*% t (y-y_bar)) denom = sum ( (y-y_bar)^2 )
A hierachical model for lip cancer We have observed counts of lip cancer for 56 districts in Scotland. Let y i represent the number of lip cancer for district i . 7 y i ∼ Poisson ( λ i ) log ( λ i ) = log ( E i ) + x i β + ω i ω ∼ N ( 0 , σ 2 ( D − ϕ W ) − 1 ) where E i is the expected counts for each region (and serves as an offet).
Data prep & JAGS model for(i in 1:2) { ## } phi ~ dunif(0,0.99) ## tau ~ dgamma(2, 2) ## sigma2 <- 1/tau ## omega ~ dmnorm(rep(0,length(y)), tau * (D - phi*W)) ## ## } ## beta[i] ~ dnorm(0,1) ## ## D = diag ( rowSums (W)) ## y = lip_cancer$Observed ## model{ ## for(i in 1:length(y)) { ## y[i] ~ dpois(lambda[i]) ## y_pred[i] ~ dpois(lambda[i]) ## log(lambda[i]) <- log_offset[i] + X[i,] %*% beta + omega[i] ## } 8 X = model.matrix (~ scale (lip_cancer$pcaff)) log_offset = log (lip_cancer$Expected)
Model Results 9 Trace of beta[1] Density of beta[1] 2.0 0.2 1.0 −0.4 0.0 35000 40000 45000 50000 55000 60000 −0.5 0.0 0.5 Iterations N = 1000 Bandwidth = 0.04924 Trace of beta[2] Density of beta[2] 0.4 3 0.2 2 1 0.0 0 35000 40000 45000 50000 55000 60000 −0.1 0.0 0.1 0.2 0.3 0.4 0.5 Iterations N = 1000 Bandwidth = 0.02561
10 Trace of phi Density of phi 1.0 12 0.7 8 4 0.4 0 35000 40000 45000 50000 55000 60000 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Iterations N = 1000 Bandwidth = 0.01076 Trace of sigma2 Density of sigma2 1.5 1.0 0.5 0.0 35000 40000 45000 50000 55000 60000 0.5 1.0 1.5 2.0 Iterations N = 1000 Bandwidth = 0.06232
Predictions & Residuals 11 Predicted Cases Residuals 61 ° N 61 ° N 60 ° N 60 ° N 59 ° N 59 ° N 58 ° N 58 ° N 2 30 20 0 57 ° N 57 ° N 10 −2 56 ° N 56 ° N 55 ° N 55 ° N 8 ° W 6 ° W 4 ° W 2 ° W 8 ° W 6 ° W 4 ° W 2 ° W
Residuals + RMSE + Moran’s I $ observed: num 0.0642 $ p.value : num 0.305 ## : num 0.0803 $ sd ## $ expected: num -0.0182 ## ## #RMSE ## List of 4 weight = diag ( rowSums (W)) %*% W) %>% str () ape:: Moran.I (lip_cancer_pred$resid, ## [1] 0.05661104 morans_I (y = lip_cancer_pred$resid, w = diag ( rowSums (W)) %*% W) #Moran’s I ## [1] 1.498675 lip_cancer_pred$resid %>% .^2 %>% mean () %>% sqrt () 12
Point Referenced Data 13
Example - PM2.5 from CSN The Chemical Speciation Network are a series of air quality monitors run by 14 2007 (n=191) for just PM2.5. EPA (221 locations in 2007). We’ll look at a subset of the data from Nov 11th, 50 pm25 45 10 20 30 40 40 lat pm25 35 40 30 20 10 30 25 −120 −100 −80 long
csn ## 7 11011002 -86.25637 32.40712 2007-11-14 19.70000 ## 5 11130001 -84.99917 32.47639 2007-11-14 22.60000 ## 6 40139997 -112.09577 33.50383 2007-11-14 12.30000 40191028 -110.98230 32.29515 2007-11-14 -86.58637 34.68767 2007-11-14 13.40000 7.20000 ## 8 51190007 -92.28130 34.75619 2007-11-14 12.70000 ## 9 60070002 -121.84222 39.75750 2007-11-14 10.00000 ## 10 60190008 -119.77222 36.78139 2007-11-14 32.26205 ## # ... with 181 more rows ## 4 10890014 ## # A tibble: 191 × 5 <dbl> ## site longitude latitude date pm25 ## <int> <dbl> ## 3 <dttm> <dbl> ## 1 10730023 -86.81500 33.55306 2007-11-14 19.43555 ## 2 10732003 -86.92417 33.49972 2007-11-14 26.40000 15
Aside - Splines 16
We want to find a function f x that best fits our observed data y n while being as smooth as possible. x 2 dx Splines in 1d - Smoothing Splines y i in the tails) with knots at the observed data locations ( x s). by a mixture of weighted natural cubic splines (cubic splines that are linear Interestingly, this minimization problem has an exact solution which is given f 2 f x i i 1 These are a mathematical analogue to the drafting splines represented n f x arg min y 1 y using a penalized regression model. 17
Splines in 1d - Smoothing Splines n in the tails) with knots at the observed data locations ( x s). by a mixture of weighted natural cubic splines (cubic splines that are linear Interestingly, this minimization problem has an exact solution which is given These are a mathematical analogue to the drafting splines represented 17 using a penalized regression model. arg min We want to find a function f ( x ) that best fits our observed data y = y 1 , . . . , y n while being as smooth as possible. ∫ ( y i − f ( x i )) 2 + λ ∑ f ′′ ( x ) 2 dx f ( x ) i = 1
Splines in 2d - Thin Plate Splines dx dy n sum of radial basis functions with knots at the observed data locations The solution to this equation has a natural representation using a weighted 18 n arg min spline model in two dimensions, Now imagine we have observed data of the form ( x i , y i , z i ) where we wish to predict z i given x i and y i for all i . We can naturally extend the smoothing ∫ ∫ ( ∂ 2 f ∂ x 2 + 2 ∂ 2 f ∂ x ∂ y + ∂ 2 f ) ( z i − f ( x i , y i )) 2 + λ ∑ ∂ y 2 f ( x , y ) i = 1 ∑ f ( x , y ) = w i d ( x i , y i ) 2 log d ( x i , y i ) . i = 1
Fitting a TPS plot (pm25_pred) library (fields) points (coords, pch=16, cex=0.5) 19 pm25_pred = r coords = select (csn, longitude, latitude) %>% as.matrix () tps = Tps (x = coords, Y=csn$pm25) pm25_pred[cells] = predict (tps, pred_coords) 50 45 35 30 40 25 20 15 35 10 5 30 25 −120 −110 −100 −90 −80 −70
Gaussin Process Models / Kriging 20
Variogram uvec = seq (0, max (d)/2, length.out=50)) %>% plot () library (geoR) 21 variog (coords = coords, data = csn$pm25, messages = FALSE, coords = csn %>% select (latitude, longitude) %>% as.matrix () d = dist (coords) %>% as.matrix () 100 semivariance 60 20 0 0 5 10 15 20 25 distance
variog (coords = coords, data = csn$pm25, messages = FALSE, uvec = seq (0, max (d)/4, length.out=50)) %>% plot () 22 80 semivariance 60 40 20 0 0 2 4 6 8 10 12 14 distance
Isotropy / Anisotropy v4 = variog4 (coords = coords, data = csn$pm25, messages = FALSE, 23 uvec = seq (0, max (d)/4, length.out = 50)) plot (v4) 0 ° 120 45 ° 90 ° 100 135 ° semivariance 80 60 40 20 0 0 2 4 6 8 10 12 14 distance
2 exp GP Spatial Model 0 s j s i r ij n 2 0 s i w s If we assume that our data is stationary and isotropic then we can use a Gaussian s w s s y s we can also view this as a spatial random effects model where Process model to fit the data. We will assume an exponential covariance structure. 24 y ∼ N ( µ 1 , Σ) { Σ } ij = σ 2 exp ( − r ∥ s i − s j ∥ ) + σ 2 n 1 i = j
GP Spatial Model If we assume that our data is stationary and isotropic then we can use a Gaussian Process model to fit the data. We will assume an exponential covariance structure. we can also view this as a spatial random effects model where 24 y ∼ N ( µ 1 , Σ) { Σ } ij = σ 2 exp ( − r ∥ s i − s j ∥ ) + σ 2 n 1 i = j y ( s ) = µ ( s ) + w ( s ) + ϵ ( s ) w ( s ) ∼ N ( 0 , Σ ′ ) ϵ ( s i ) ∼ N ( 0 , σ 2 n ) { Σ ′ } ij = σ 2 exp ( − r ∥ s i − s j ∥ )
Fitting with spBayes library (spBayes) coords = select (csn, longitude, latitude) %>% as.matrix () starting = list (phi = 3/3, sigma.sq = 33, tau.sq = 17) beta.Norm = list (0, 1000), phi.Unif = c (3/max_range, 6), sigma.sq.IG = c (2, 2), tau.sq.IG = c (2, 2) ) 25 n = nrow (csn) n_samp = 20000 max_range = max ( dist (coords)) / 4 tuning = list (”phi”=0.1, ”sigma.sq”=0.1, ”tau.sq”=0.1) priors = list (
Recommend
More recommend