Areal Unit Data
Regular Grids or Lattices
Large Point-referenced Datasets
Is there spatial pattern?
We usually want to smooth the data, but that is a tricky question.
Inference for new areal units?
Descriptive/algorithmic vs. model-based
Basics of Areal Data Models – p. 1
Areal unit data
Proximity matrices
$W$, with entries $w_{ij}$ (and $w_{ii} = 0$). Choices for $w_{ij}$:
  $w_{ij} = 1$ if $i, j$ share a common boundary (possibly a common vertex)
  $w_{ij}$ is an inverse distance between units
  $w_{ij} = 1$ if the distance between units is $\leq K$
  $w_{ij} = 1$ for the $m$ nearest neighbors
$W$ is typically symmetric, but need not be.
$\widetilde{W}$: standardize row $i$ by $w_{i+} = \sum_j w_{ij}$ (so the matrix is now row stochastic, but probably no longer symmetric).
$W$ elements often called “weights”; interpretation
Could also define first-order neighbors $W^{(1)}$, second-order neighbors $W^{(2)}$, etc.
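A minimal sketch of the boundary-sharing choice and the row standardization, assuming a hypothetical map of four units in a row (0-1-2-3). The helper names `adjacency_from_edges` and `row_standardize` are illustrative and do not come from the slides.

```python
import numpy as np

def adjacency_from_edges(n, edges):
    """Binary symmetric W with w_ii = 0, built from (i, j) neighbor pairs."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = 1.0
        W[j, i] = 1.0
    return W

def row_standardize(W):
    """Divide row i by w_{i+} = sum_j w_ij; rows with no neighbors stay zero."""
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

# Hypothetical example: four units in a row, neighbors share a boundary.
W = adjacency_from_edges(4, [(0, 1), (1, 2), (2, 3)])
W_tilde = row_standardize(W)
```

Here `W` is symmetric while `W_tilde` has rows summing to 1 and is no longer symmetric, because the end units have one neighbor and the interior units have two.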
Measures of spatial association
Moran’s $I$: essentially an “areal covariogram”
$$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
Geary’s $C$: essentially an “areal variogram”
$$C = \frac{(n-1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
Both are asymptotically normal if the $Y_i$ are i.i.d.; Moran’s $I$ has mean $-1/(n-1) \approx 0$, Geary’s $C$ has mean 1.
Better significance testing: compare the observed statistic to its values over a collection of, say, 1000 random permutations of the $Y_i$.
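The two statistics and the permutation test can be sketched as follows. This is an illustration under assumptions: the data are a made-up linear trend on ten units in a row, the Geary statistic uses the usual factor of 2 in the denominator (so its null mean is 1), and the function names are hypothetical.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for values y and proximity matrix W (w_ii = 0)."""
    n = len(y)
    z = y - y.mean()
    return n * (z @ W @ z) / (W.sum() * (z @ z))

def gearys_c(y, W):
    """Geary's C, scaled so that its mean is 1 under no spatial association."""
    n = len(y)
    z = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2
    return (n - 1) * (W * diff2).sum() / (2.0 * W.sum() * (z @ z))

def permutation_pvalue(stat, y, W, n_perm=1000, seed=0):
    """One-sided Monte Carlo p-value for large values of stat (e.g. Moran's I)."""
    rng = np.random.default_rng(seed)
    observed = stat(y, W)
    perms = np.array([stat(rng.permutation(y), W) for _ in range(n_perm)])
    return (1 + np.sum(perms >= observed)) / (n_perm + 1)

# Hypothetical data: ten units in a row with a linear trend.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.arange(n, dtype=float)

I_obs = morans_i(y, W)
C_obs = gearys_c(y, W)
p_val = permutation_pvalue(morans_i, y, W)
```

For Geary’s $C$, evidence of positive association corresponds to small values, so a permutation test would count permuted statistics at or below the observed one instead.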
Measures of spatial association (cont’d)
Figure 1: Choropleth map of 1999 average verbal SAT scores, lower 48 U.S. states. Legend classes: <503, 504-525, 526-562, >563.
Measures of spatial association (cont’d)
For these data, we obtain a Moran’s $I$ of 0.5833, with associated standard error estimate 0.0920 ⇒ very strong evidence against $H_0$: no spatial correlation.
We obtain a Geary’s $C$ of 0.3775, with associated standard error estimate 0.1008 ⇒ again, very strong evidence against $H_0$ (departure from 1).
Warning: These data have not been adjusted for covariates, such as the proportion of students who take the exam. (Midwestern colleges have historically relied on the ACT, not the SAT; only the best and brightest students in these states would bother taking the SAT.)
Correlogram (via Moran’s $I$)
[Figure: correlogram, rho(d) on the vertical axis (-1.0 to 1.0) against distance on the horizontal axis (0.0 to 1.4)]
Replace $w_{ij}$ with $w_{ij}^{(1)}$ taken from $W^{(1)}$ ⇒ $I^{(1)}$
Replace $w_{ij}$ with $w_{ij}^{(2)}$ taken from $W^{(2)}$ ⇒ $I^{(2)}$, etc.
Plot $I^{(r)}$ vs. $r$; expect an initial decline across $r$ followed by variation around 0 ⇒ spatial pattern!
Spatial analogue of the temporal lag autocorrelation plot.
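One way to obtain the sequence $I^{(1)}, I^{(2)}, \ldots$ is sketched below. It assumes that $r$-th order neighbors are units whose shortest path in the first-order neighbor graph has exactly $r$ steps; that definition and the function names are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for values y and proximity matrix W (w_ii = 0)."""
    n = len(y)
    z = y - y.mean()
    return n * (z @ W @ z) / (W.sum() * (z @ z))

def neighbor_order_matrices(W1, max_order):
    """Return [W^(1), ..., W^(max_order)], where W^(r)[i, j] = 1 when the
    shortest path from i to j in the first-order graph has exactly r steps."""
    A = (np.asarray(W1) > 0).astype(int)
    n = A.shape[0]
    reached = np.eye(n, dtype=bool)   # pairs at graph distance < r
    walks = np.eye(n, dtype=int)      # pairs joined by a walk of length r - 1
    out = []
    for _ in range(max_order):
        walks = ((walks @ A) > 0).astype(int)  # walks of length r
        new = (walks > 0) & ~reached           # first reached at exactly r steps
        out.append(new.astype(float))
        reached |= new
    return out

def correlogram(y, W1, max_order):
    """Moran's I at neighbor orders 1..max_order (each W^(r) must be nonzero)."""
    return [morans_i(y, Wr) for Wr in neighbor_order_matrices(W1, max_order)]

# Hypothetical data: ten units in a row with a linear trend.
n = 10
W1 = np.zeros((n, n))
for i in range(n - 1):
    W1[i, i + 1] = W1[i + 1, i] = 1.0
y = np.arange(n, dtype=float)
I_by_order = correlogram(y, W1, max_order=3)
```

For this trend the values decline with the order $r$, which is the initial-decline pattern described above.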
Rasterized binary data map
Binary data correlogram
A version of a correlogram for a binary map, using two-way tables and log odds ratios at pixel level.
Note that the strongest pattern is to the north (N), but in no direction are the values ≈ 0, even at 40 km.
Spatial smoothers
To smooth $Y_i$, replace it with
$$\hat{Y}_i = \frac{\sum_j w_{ij} Y_j}{w_{i+}}$$
More generally, we could include the value actually observed for unit $i$, and revise our smoother to
$$(1 - \alpha) Y_i + \alpha \hat{Y}_i$$
For $0 < \alpha < 1$, this is a linear (convex) combination in “shrinkage” form. How to choose $\alpha$?
Finally, we could try model-based smoothing, i.e., based on $E(Y_i \mid \text{Data})$, the mean of the predictive distribution. Smoothers then emerge as byproducts of the hierarchical spatial models we use to explain the $Y_i$’s.
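The two descriptive smoothers can be sketched in a few lines. The data values and function names below are hypothetical, and the code assumes every unit has at least one neighbor so that $w_{i+} > 0$.

```python
import numpy as np

def neighbor_average(y, W):
    """Y-hat_i = sum_j w_ij Y_j / w_{i+}; assumes every row of W has a positive sum."""
    return (W @ y) / W.sum(axis=1)

def shrinkage_smoother(y, W, alpha):
    """(1 - alpha) * Y_i + alpha * Y-hat_i for 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * y + alpha * neighbor_average(y, W)

# Hypothetical example: four units in a row with made-up values.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 5.0, 2.0, 8.0])
y_hat = neighbor_average(y, W)
y_half = shrinkage_smoother(y, W, alpha=0.5)
```

Setting `alpha=0` returns the raw data and `alpha=1` returns the pure neighbor average; the sketch does not address how to choose $\alpha$.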
Markov random fields
Consider $Y = (Y_1, Y_2, \ldots, Y_n)$ and the set of densities $\{p(y_i \mid y_j, j \neq i)\}$.
We know $p(y_1, y_2, \ldots, y_n)$ determines $\{p(y_i \mid y_j, j \neq i)\}$ (the set of full conditional distributions).
Does $\{p(y_i \mid y_j, j \neq i)\}$ determine $p(y_1, y_2, \ldots, y_n)$?
We need the notion of compatibility. With two variables, when are $p(y_1 \mid y_2)$ and $p(y_2 \mid y_1)$ compatible?
Not always, e.g., $p(y_1 \mid y_2) = N(a + b y_2, \sigma_1^2)$ and $p(y_2 \mid y_1) = N(c + d y_1^3, \sigma_2^2)$.
Brook’s Lemma
If the full conditionals are compatible, then Brook’s Lemma provides a way to construct the joint distribution from the full conditionals.
With $(y_{10}, \ldots, y_{n0})$ any fixed point in the support, we can write the joint distribution as
$$p(y_1, \ldots, y_n) = \frac{p(y_1, y_2, \ldots, y_n)}{p(y_{10}, y_2, \ldots, y_n)} \cdot \frac{p(y_{10}, y_2, y_3, \ldots, y_n)}{p(y_{10}, y_{20}, y_3, \ldots, y_n)} \cdots \frac{p(y_{10}, \ldots, y_{n-1,0}, y_n)}{p(y_{10}, \ldots, y_{n-1,0}, y_{n0})} \cdot p(y_{10}, \ldots, y_{n0})$$
Replacing each joint distribution with conditional × marginal, the marginal terms cancel and we have
$$p(y_1, \ldots, y_n) = \frac{p(y_1 \mid y_2, \ldots, y_n)}{p(y_{10} \mid y_2, \ldots, y_n)} \cdot \frac{p(y_2 \mid y_{10}, y_3, \ldots, y_n)}{p(y_{20} \mid y_{10}, y_3, \ldots, y_n)} \cdots \frac{p(y_n \mid y_{10}, \ldots, y_{n-1,0})}{p(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})} \cdot p(y_{10}, \ldots, y_{n0})$$
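A small numerical check of the lemma, assuming a made-up strictly positive joint table for three binary variables. The full conditionals are computed from the table, and the joint is then rebuilt from those conditionals alone, up to the constant $p(y_{10}, \ldots, y_{n0})$. All function names are illustrative.

```python
import itertools
import numpy as np

def full_conditional(joint, i, y):
    """p(y_i | y_{-i}) evaluated at the point y, computed from a joint table."""
    idx = list(y)
    num = joint[tuple(idx)]
    den = 0.0
    for v in range(joint.shape[i]):
        idx[i] = v
        den += joint[tuple(idx)]
    return num / den

def brook_ratio(cond, y, y0):
    """p(y) / p(y0) built only from full conditionals via Brook's Lemma.
    cond(i, point) must return p(point_i | point_{-i})."""
    ratio = 1.0
    for i in range(len(y)):
        # Term i: coordinates before i are fixed at y0, those after i are at y.
        top = tuple(y0[:i]) + tuple(y[i:])
        bottom = tuple(y0[:i + 1]) + tuple(y[i + 1:])
        ratio *= cond(i, top) / cond(i, bottom)
    return ratio

# Hypothetical strictly positive joint distribution on {0, 1}^3.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2)) + 0.1
joint /= joint.sum()

cond = lambda i, point: full_conditional(joint, i, point)
y0 = (0, 0, 0)
rebuilt = np.zeros_like(joint)
for y in itertools.product(range(2), repeat=3):
    rebuilt[y] = brook_ratio(cond, y, y0)
rebuilt /= rebuilt.sum()   # the normalizing constant replaces p(y0)
```

After normalization, `rebuilt` agrees with `joint`, which is the constructive statement of the lemma in this toy case.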
Brook’s Lemma (cont’d)
We have the joint distribution on the left side in terms of the full conditional distributions on the right side.
If the left side is proper, then since it integrates to 1, the normalizing constant is determined by integrating the right side and then rescaling to 1.
We have a constructive way to retrieve the joint distribution from the full conditional distributions.
Useful in many other problems.
“Local” modeling
Suppose we specify the full conditionals such that $p(y_i \mid y_j, j \neq i) = p(y_i \mid y_j \in \partial_i)$, where $\partial_i$ is the set of neighbors of cell (region) $i$.
When does $\{p(y_i \mid y_j \in \partial_i)\}$ determine $p(y_1, y_2, \ldots, y_n)$?
Def’n: a clique is a set of cells such that each element is a neighbor of every other element.
Def’n: a potential function of order $k$ is a positive function of $k$ arguments that is exchangeable in these arguments. A potential of order 2 is $Q(y_i, y_j)$ with $Q(y_i, y_j) = Q(y_j, y_i)$.
Def’n: $p(y_1, \ldots, y_n)$ is a Gibbs distribution if, as a function of the $y_i$, it is a product of potentials on cliques. With potentials of order 2, $p(y_1, \ldots, y_n) = \prod_{i<j} Q(y_i, y_j)$.
“Local” modeling, cont.
For a continuous variable, with $k = 2$, we might take $Q(y_i, y_j) = \exp\left(-w_{ij} (y_i - y_j)^2\right)$.
For binary data, with $k = 2$, we might take $Q(y_i, y_j) = I(y_i = y_j) = y_i y_j + (1 - y_i)(1 - y_j)$.
Cliques of size 1 ⇔ independence.
Cliques of size 2 with the above $Q$ for continuous variables and $w_{ij} = I(i \sim j)$ ⇔ pairwise difference form (each neighboring pair counted once)
$$p(y_1, y_2, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} \sum_{i<j} (y_i - y_j)^2 I(i \sim j)\right\}$$
and therefore $p(y_i \mid y_j, j \neq i) = N\left(\sum_{j \in \partial_i} y_j / m_i, \; \tau^2 / m_i\right)$, where $m_i$ is the number of neighbors of $i$.
No interest in $k > 2$.
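The step from the pairwise difference form to the normal full conditional can be checked numerically. The sketch below assumes each neighboring pair enters the sum once, uses made-up values on four units in a row, and compares log-density differences so that no normalizing constant is needed. The function names are hypothetical.

```python
import numpy as np

def log_pairwise_density(y, A, tau2):
    """Unnormalized log density: -(1 / (2 tau2)) * sum_{i<j} I(i~j) (y_i - y_j)^2."""
    diff2 = (y[:, None] - y[None, :]) ** 2
    return -np.sum(np.triu(A, k=1) * diff2) / (2.0 * tau2)

def conditional_mean_var(y, A, i, tau2):
    """Claimed full conditional: mean of the neighbors of i, variance tau2 / m_i."""
    m_i = A[i].sum()
    return (A[i] @ y) / m_i, tau2 / m_i

def log_normal_kernel(x, mean, var):
    """Log of the normal density in x, up to an additive constant."""
    return -(x - mean) ** 2 / (2.0 * var)

# Hypothetical example: four units in a row, 0/1 adjacency.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0.3, -1.2, 2.0, 0.7])
tau2 = 1.5
i = 1

# Change only y_i and compare the two log-density differences.
y_a, y_b = y.copy(), y.copy()
y_a[i], y_b[i] = 0.4, -2.0
joint_diff = log_pairwise_density(y_a, A, tau2) - log_pairwise_density(y_b, A, tau2)
mean_i, var_i = conditional_mean_var(y, A, i, tau2)
cond_diff = log_normal_kernel(0.4, mean_i, var_i) - log_normal_kernel(-2.0, mean_i, var_i)
```

The two differences coincide because the terms of the joint that do not involve $y_i$ cancel, leaving a quadratic in $y_i$ centered at the neighbor average.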
Two primary results
Hammersley-Clifford Theorem: If we have a Markov Random Field (i.e., $\{p(y_i \mid y_j \in \partial_i)\}$ uniquely determine $p(y_1, y_2, \ldots, y_n)$), then the latter is a Gibbs distribution.
Geman and Geman result: If we have a joint Gibbs distribution, i.e., as defined above, then we have a Markov Random Field.
Conditional autoregressive (CAR) model
Gaussian (autonormal) case:
$$p(y_i \mid y_j, j \neq i) = N\left(\sum_j b_{ij} y_j, \; \tau_i^2\right)$$
Using Brook’s Lemma we can obtain
$$p(y_1, y_2, \ldots, y_n) \propto \exp\left\{-\frac{1}{2} y' D^{-1} (I - B) y\right\}$$
where $B = \{b_{ij}\}$ and $D$ is diagonal with $D_{ii} = \tau_i^2$.
⇒ suggests a multivariate normal distribution with $\mu_Y = 0$ and $\Sigma_Y = (I - B)^{-1} D$.
$D^{-1}(I - B)$ symmetric requires
$$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j$$
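The symmetry requirement can be illustrated with a short sketch. The values are assumptions chosen for the example: four units in a row, $b_{ij} = 0.9\, w_{ij} / w_{i+}$ (the factor 0.9 is a hypothetical damping that keeps $I - B$ invertible) and $\tau_i^2 = 2 / w_{i+}$. The function names are illustrative.

```python
import numpy as np

def car_precision(B, tau2_i):
    """Precision matrix D^{-1} (I - B) with D = diag(tau2_i)."""
    n = B.shape[0]
    return np.diag(1.0 / np.asarray(tau2_i)) @ (np.eye(n) - B)

def satisfies_symmetry_condition(B, tau2_i, tol=1e-10):
    """Check b_ij / tau_i^2 == b_ji / tau_j^2 for all i, j."""
    scaled = B / np.asarray(tau2_i)[:, None]
    return bool(np.allclose(scaled, scaled.T, atol=tol))

# Hypothetical example: four units in a row.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
w_plus = W.sum(axis=1)
B = 0.9 * W / w_plus[:, None]   # assumed damping factor 0.9
tau2_i = 2.0 / w_plus           # conditional variances shrink with more neighbors

Q = car_precision(B, tau2_i)
Sigma = np.linalg.inv(Q)        # equals (I - B)^{-1} D
ok = satisfies_symmetry_condition(B, tau2_i)
ok_equal_var = satisfies_symmetry_condition(B, np.ones(4))
```

With a common conditional variance for all units, the same row-standardized `B` fails the condition, because the end units and interior units have different numbers of neighbors.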
CAR Model (cont’d)
Returning to $W$, let $b_{ij} = w_{ij} / w_{i+}$ and $\tau_i^2 = \tau^2 / w_{i+}$, so
$$p(y_1, y_2, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} y' (D_w - W) y\right\}$$
with $D_w$ diagonal, $(D_w)_{ii} = w_{i+}$, and, with algebra (for symmetric $W$),
$$p(y_1, y_2, \ldots, y_n) \propto \exp\left\{-\frac{1}{2\tau^2} \sum_{i<j} w_{ij} (y_i - y_j)^2\right\}$$
Intrinsic autoregressive (IAR) model!
Improper since $(D_w - W) 1 = 0$, so requires a constraint, say $\sum_i y_i = 0$.
So, not a data model, a random effects model!
$\tau^2$ represents both dispersion and spatial dependence.
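A quick numerical check of the algebra and of the impropriety, assuming a symmetric 0/1 adjacency for four units in a row and made-up values of $y$. The helper names are hypothetical.

```python
import numpy as np

def iar_precision(W):
    """D_w - W, where D_w is diagonal with entries w_{i+}."""
    return np.diag(W.sum(axis=1)) - W

def pairwise_quadratic(y, W):
    """sum_{i<j} w_ij (y_i - y_j)^2, each pair counted once."""
    diff2 = (y[:, None] - y[None, :]) ** 2
    return np.sum(np.triu(W, k=1) * diff2)

def center(y):
    """Impose the sum-to-zero constraint by subtracting the mean."""
    return y - y.mean()

# Hypothetical example: four units in a row.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0.3, -1.2, 2.0, 0.7])

Q = iar_precision(W)
quad_matrix = y @ Q @ y
quad_pairs = pairwise_quadratic(y, W)
rank_Q = np.linalg.matrix_rank(Q)
```

Here `quad_matrix` and `quad_pairs` agree, every row of `Q` sums to zero, and the rank is $n - 1$ for this connected map, so the density is unchanged by adding a constant to every $y_i$. Centering is one way to remove that degree of freedom.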
CAR Model Issues
With $\tau^2$ unknown, what to do with the power of $\tau^2$ in the joint distribution? ($n$ − # of “islands”)
“Proper version”: replace $D_w - W$ by $D_w - \rho W$, and choose $\rho$ so that $\Sigma_y = \tau^2 (D_w - \rho W)^{-1}$ exists!
This in turn implies
$$Y_i \mid Y_{j \neq i} \sim N\left(\rho \sum_j w_{ij} Y_j / w_{i+}, \; \tau^2 / w_{i+}\right)$$
where $w_{i+} = m_i$ for 0/1 adjacency weights.
“To $\rho$ or not to $\rho$?”
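A sketch of how an admissible $\rho$ might be checked. The slides do not state the range; the code relies on the standard result that, for symmetric $W$ with no isolated units, $D_w - \rho W$ is positive definite when $1/\lambda_{\min} < \rho < 1/\lambda_{\max}$, with $\lambda$ the eigenvalues of $D_w^{-1/2} W D_w^{-1/2}$. The map, the values of $\rho$ and $\tau^2$, and the function names are all assumptions for illustration.

```python
import numpy as np

def rho_interval(W):
    """(1 / lambda_min, 1 / lambda_max) for D_w^{-1/2} W D_w^{-1/2}.
    Assumes W is symmetric and every unit has at least one neighbor."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    lam = np.linalg.eigvalsh(S)   # ascending order
    return 1.0 / lam[0], 1.0 / lam[-1]

def is_positive_definite(M, tol=1e-10):
    """True when all eigenvalues of the symmetric matrix M exceed tol."""
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

def proper_car_covariance(W, rho, tau2):
    """Sigma_y = tau2 * (D_w - rho W)^{-1}."""
    Q = (np.diag(W.sum(axis=1)) - rho * W) / tau2
    return np.linalg.inv(Q)

# Hypothetical example: four units in a row.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_w = np.diag(W.sum(axis=1))
lo, hi = rho_interval(W)

proper = is_positive_definite(D_w - 0.9 * W)      # rho inside the interval
intrinsic = is_positive_definite(D_w - 1.0 * W)   # rho = 1 is the IAR case

Sigma = proper_car_covariance(W, rho=0.9, tau2=2.0)
precision = np.linalg.inv(Sigma)
cond_var = 1.0 / np.diag(precision)               # tau2 / w_{i+}
```

The conditional variances recovered from the precision matrix equal $\tau^2 / w_{i+}$, and $\rho = 1$ fails the positive-definiteness check, which corresponds to the improper IAR model.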