Optimal Design for Detecting Spatial Dependence
D. Gumprecht, W.G. Müller and J. Rodríguez-Díaz
University of Economics Vienna, Austria; Johannes-Kepler-University Linz, Austria; University of Salamanca, Spain
mODa 8, Almagro, Spain, June 2007
Spatial dependence
“All things are related but nearby things are more related than distant things.” (Tobler, 1970: the first law of geography)
“Spatial dependency is the extent to which the value of an attribute in one location depends on the values of the attribute in nearby locations.” (Fotheringham et al., 2002)
“Spatial autocorrelation (…) is the correlation among values of a single variable strictly attributable to the proximity of those values in geographic space (…).” (Griffith, 2003)
“Hell is a place with no spatial dependence.” (Goodchild, 2002)
Random or Clustered?
source: Anselin, 1988 (Columbus, Ohio crime)
Spatial Randomness
– values observed at a location do not depend on values observed at neighboring locations
– the observed spatial pattern of values is as likely as any other spatial pattern
– the location of values may be altered without affecting the information content of the data
source: M. Goodchild, 2002
Spatial Proximity (Weight) Matrix
• Matrix W (n x n), where each element $w_{ij}$ represents a measure of nearness between regions $O_i$ and $O_j$
• Possible choices:
  $w_{ij} = 1$, if $O_i$ touches $O_j$
  $w_{ij} = 1$, if distance($O_i$, $O_j$) ≤ d*
• Example for five regions A to E:
      A  B  C  D  E
  A   0  1  0  1  0
  B   1  0  1  1  1
  C   0  1  0  0  1
  D   1  1  0  0  1
  E   0  1  1  1  0
adapted from Goodchild, 2002
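The two choices above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the function name `threshold_weights` and the sample coordinates are hypothetical and not part of the talk.

```python
import numpy as np

# Contiguity matrix from the slide: w_ij = 1 if region i touches region j.
labels = ["A", "B", "C", "D", "E"]
W = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
])

def threshold_weights(coords, d_star):
    """Binary weights: w_ij = 1 if distance(O_i, O_j) <= d_star and i != j."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    weights = (dist <= d_star).astype(int)
    np.fill_diagonal(weights, 0)  # a region is not its own neighbour
    return weights

# Neighbour lists read off the contiguity matrix.
neighbours = {lab: [labels[j] for j in np.flatnonzero(W[i])]
              for i, lab in enumerate(labels)}

# Hypothetical centroids on a line, used only to exercise the distance rule.
W_dist = threshold_weights([[0, 0], [1, 0], [3, 0]], d_star=1.5)
```

Both matrices are symmetric with a zero diagonal, which is what a "touches" or "within d*" relation implies.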
Spatial weight matrices based on distance
• Distances $d_{ij}$ are usually measured centroid to centroid.
• The most common choices are the inverse distance, $w_{ij} = (1 - \mathbf{1}_{\{i=j\}})/d_{ij}$,
• or the negative exponential, $w_{ij} = \exp\{-\delta d_{ij}\} - \mathbf{1}_{\{i=j\}}$.
• Row standardization, $\tilde{w}_{ij} = w_{ij} / \sum_j w_{ij}$, is employed to keep spatial parameters comparable.
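A minimal sketch of these weighting schemes, assuming NumPy and distinct centroids (so that off-diagonal distances are positive). The helper names and the three sample centroids are hypothetical.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distances d_ij between all pairs of centroids."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def inverse_distance_weights(d):
    """w_ij = 1/d_ij for i != j, and 0 on the diagonal."""
    w = np.zeros_like(d)
    off_diag = ~np.eye(len(d), dtype=bool)
    w[off_diag] = 1.0 / d[off_diag]
    return w

def neg_exp_weights(d, delta):
    """w_ij = exp(-delta * d_ij) - 1{i=j}; the diagonal becomes exp(0) - 1 = 0."""
    return np.exp(-delta * d) - np.eye(len(d))

def row_standardize(w):
    """Divide every row by its sum so that each row sums to one."""
    return w / w.sum(axis=1, keepdims=True)

# Hypothetical centroids, for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
d = pairwise_distances(coords)
w_inv = inverse_distance_weights(d)
w_exp = neg_exp_weights(d, delta=0.5)
w_std = row_standardize(w_inv)
```

After row standardization every row sums to one, so each spatial lag is a weighted average of the neighbouring values.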
Moran Scatter Plots
We can now draw a scatter plot of a variable y against its “spatial lag” Wy. With a row-standardized W, the slope of the regression line is Moran’s I, which can be interpreted as the spatial autocorrelation, i.e. the correlation between y and its spatial lag Wy.
source: Anselin, 1988
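The link between the scatter plot and the statistic can be checked numerically. The sketch below assumes NumPy, uses the five-region contiguity matrix after row standardization, and takes made-up values for y; the function names are hypothetical.

```python
import numpy as np

def row_standardize(W):
    """Divide each row by its sum so that every row sums to one."""
    W = np.asarray(W, dtype=float)
    return W / W.sum(axis=1, keepdims=True)

def morans_i(y, W):
    """Moran's I = (n / sum of weights) * z'Wz / z'z with z = y - mean(y)."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    z = y - y.mean()
    return len(y) / W.sum() * (z @ W @ z) / (z @ z)

def moran_scatter_slope(y, W):
    """Slope of the least-squares line of the spatial lag Wy on y."""
    y = np.asarray(y, dtype=float)
    lag = np.asarray(W, dtype=float) @ y
    slope, _intercept = np.polyfit(y, lag, 1)
    return slope

W = row_standardize([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
])
y = [1.0, 2.0, 4.0, 3.0, 5.0]  # made-up attribute values, illustration only

i_stat = morans_i(y, W)
slope = moran_scatter_slope(y, W)
```

The two numbers agree because, for row-standardized weights, the sum of all weights equals n and W maps a constant vector to itself, so the centred cross-product of y and Wy reduces to z'Wz.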
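Tests for Spatial Dependence
• Moran, 1950:
$$\mathcal{I} = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_i (y_i - \bar{y})^2}$$
  (for row-standardized weights $\sum_i \sum_j w_{ij} = n$, so the leading factor equals 1)
• Cliff and Ord, 1981, for regression residuals from $y = X\beta + \varepsilon$:
$$\mathcal{I} = \frac{y^T M \tfrac{1}{2}(W + W^T) M y}{y^T M y}, \qquad M = I - X (X^T X)^{-1} X^T$$
• Anselin and Kelejian, 1997, investigate $y = X\beta + \rho \tilde{W} y + \varepsilon$, with $\tilde{W}$ the row-standardized weight matrix.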
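The residual-based statistic of Cliff and Ord can be sketched as follows. This assumes NumPy and a full-column-rank design matrix X; the function names and the data are hypothetical.

```python
import numpy as np

def residual_maker(X):
    """M = I - X (X'X)^{-1} X', the projection onto the OLS residual space."""
    X = np.asarray(X, dtype=float)
    return np.eye(X.shape[0]) - X @ np.linalg.solve(X.T @ X, X.T)

def morans_i_residuals(y, X, W):
    """Moran's I of OLS residuals: e' (W + W')/2 e / e'e with e = M y."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    e = residual_maker(X) @ y        # OLS residuals
    S = 0.5 * (W + W.T)              # symmetrized weight matrix
    return (e @ S @ e) / (e @ e)

W0 = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=float)
W = W0 / W0.sum(axis=1, keepdims=True)   # row-standardized weights

y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])  # made-up responses, illustration only
X = np.ones((5, 1))                       # intercept-only regression
i_resid = morans_i_residuals(y, X, W)
```

With an intercept-only X the residuals are just the centred observations, so the result coincides with the plain Moran statistic for row-standardized weights.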
Random or Clustered?
Moran’s I = 0.511
Moran’s I = -0.003
Distribution of Moran’s I under H_0: no spatial autocorrelation
• Inference is usually based on a normal approximation, using a standardized z-value obtained from the mean and variance of the statistic, i.e.
$$z(\mathcal{I}) = \frac{\mathcal{I} - \mathrm{E}[\mathcal{I}]}{\sqrt{\mathrm{Var}[\mathcal{I}]}},$$
• which are given by (see Henshaw, 1966)
$$\mathrm{E}[\mathcal{I} \mid H_0] = \frac{\mathrm{tr}(K)}{n-k}, \qquad \mathrm{Var}[\mathcal{I} \mid H_0] = \frac{2\{(n-k)\,\mathrm{tr}(K^2) - \mathrm{tr}(K)^2\}}{(n-k)^2 (n-k+2)},$$
  where $K = \tfrac{1}{2} M (\tilde{W} + \tilde{W}^T) M$, $\tilde{W}$ is the row-standardized weight matrix and k is the number of columns of X.
• A saddle-point approximation and the exact distribution were derived by Tiefelsdorf, 2000.
• Asymptotic distributions under deviations can be found in Kelejian and Prucha, 2001.
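The null moments and the z-value can be computed directly from X and the weight matrix. A minimal sketch, assuming NumPy; the function names are hypothetical and the observed value passed to `z_value` is made up.

```python
import numpy as np

def moran_null_moments(X, W):
    """Mean and variance of Moran's I under H0, using the trace formulas above."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n, k = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    K = 0.5 * M @ (W + W.T) @ M
    tr_k = np.trace(K)
    tr_k2 = np.trace(K @ K)
    mean = tr_k / (n - k)
    var = 2.0 * ((n - k) * tr_k2 - tr_k ** 2) / ((n - k) ** 2 * (n - k + 2))
    return mean, var

def z_value(i_obs, mean, var):
    """Standardized statistic z(I) = (I - E[I]) / sqrt(Var[I])."""
    return (i_obs - mean) / np.sqrt(var)

W0 = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=float)
W = W0 / W0.sum(axis=1, keepdims=True)   # row-standardized weights
X = np.ones((5, 1))                       # intercept-only regression

mean0, var0 = moran_null_moments(X, W)
z = z_value(0.2, mean0, var0)             # 0.2 is a made-up observed value
```

For an intercept-only model with row-standardized weights the mean reduces to -1/(n-1), which gives a quick sanity check on any implementation.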
Distribution of Moran’s I under H_A: spatial autocorrelation
• We assume that the data are generated by a so-called SAR model, i.e. $y = X\beta + \varepsilon$, where $\varepsilon = \rho \tilde{W} \varepsilon + u$, with u being i.i.d. and $\tilde{W}$ the row-standardized weight matrix.
• The normal approximation holds and the mean and variance are now given by (see Tiefelsdorf, 2000)
$$\mathrm{E}[\mathcal{I} \mid H_A] = \int_0^\infty \prod_{i=1}^{n-k} (1 + 2\lambda_i t)^{-1/2} \cdot \sum_{i=1}^{n-k} \frac{h^*_{ii}}{1 + 2\lambda_i t} \, dt,$$
  where the $h^*_{ii}$ are derived from functions of the covariance matrix of the errors, and
$$\mathrm{Var}[\mathcal{I} \mid H_A] = \mathrm{E}[\mathcal{I}^2 \mid H_A] - \mathrm{E}[\mathcal{I} \mid H_A]^2$$
  with
$$\mathrm{E}[\mathcal{I}^2 \mid H_A] = \int_0^\infty \prod_{i=1}^{n-k} (1 + 2\lambda_i t)^{-1/2} \cdot \sum_{i=1}^{n-k} \sum_{j=1}^{n-k} \frac{h^*_{ii} h^*_{jj} + 2 (h^*_{ij})^2}{(1 + 2\lambda_i t)(1 + 2\lambda_j t)} \cdot t \, dt.$$
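As a rough cross-check of such moments, one can simulate the SAR error model instead of evaluating the integrals. The sketch below is a Monte Carlo stand-in, not Tiefelsdorf's formulas; it assumes NumPy, standard normal u, and a hypothetical function name, replication count and seed.

```python
import numpy as np

def simulate_morans_i(X, W, rho, n_rep=4000, seed=0):
    """Monte Carlo mean and variance of residual Moran's I under SAR errors.

    Errors follow eps = rho * W eps + u, i.e. eps = (I - rho W)^{-1} u,
    with u standard normal (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    S = 0.5 * (W + W.T)
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    u = rng.standard_normal((n_rep, n))
    eps = u @ A_inv.T                 # one simulated error vector per row
    e = eps @ M                       # residuals; X beta drops out since M X = 0
    num = np.einsum("ri,ij,rj->r", e, S, e)
    den = np.einsum("ri,ri->r", e, e)
    stats = num / den
    return stats.mean(), stats.var(ddof=1)

W0 = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=float)
W = W0 / W0.sum(axis=1, keepdims=True)   # row-standardized weights
X = np.ones((5, 1))                       # intercept-only regression

mean_null, var_null = simulate_morans_i(X, W, rho=0.0)
mean_alt, var_alt = simulate_morans_i(X, W, rho=0.8)
```

Setting rho = 0 reproduces the null case, so the simulated mean can be compared with tr(K)/(n-k) before trusting the values obtained for rho > 0.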
Random or Clustered?
Moran’s I = 0.511, z(I) = 5.675
Moran’s I = -0.003, z(I) = 0.190
A Design Criterion
• Purpose: minimize the Type II error, i.e. the probability that, under the alternative, Moran’s test accepts the null hypothesis of no spatial autocorrelation:
$$\min \; P\left( \frac{\mathcal{I} - \mathrm{E}(\mathcal{I} \mid H_0)}{\sqrt{\mathrm{Var}(\mathcal{I} \mid H_0)}} \le \Phi^{-1}(1-\alpha) \;\Big|\; H_A \right)$$
• This leads us to the following design problem:
$$\xi^* = \arg\min_{\xi \in \mathcal{X}} \Psi = \arg\min_{\xi \in \mathcal{X}} \Phi\left( \frac{\Phi^{-1}(1-\alpha) \sqrt{\mathrm{Var}[\mathcal{I} \mid H_0]} + \mathrm{E}[\mathcal{I} \mid H_0] - \mathrm{E}[\mathcal{I} \mid H_A]}{\sqrt{\mathrm{Var}[\mathcal{I} \mid H_A]}} \right)$$
• Of course we cannot use classical design theory since the power 1 - Ψ is not convex.
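Given the four moments, the criterion Ψ is a one-line computation. A minimal sketch using only the Python standard library; the function name and the numeric moments in the usage lines are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mean0, var0, mean_a, var_a, alpha=0.05):
    """Psi: approximate probability of not rejecting H0 when H_A holds.

    Uses the normal approximation for Moran's I under both hypotheses.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)
    return std_normal.cdf(
        (critical * sqrt(var0) + mean0 - mean_a) / sqrt(var_a)
    )

# Made-up moments, illustration only.
psi_same = type_ii_error(-0.25, 0.04, -0.25, 0.04)   # H_A identical to H0
psi_shift = type_ii_error(-0.25, 0.04, 0.30, 0.04)   # mean shifted upward
power_shift = 1.0 - psi_shift
```

When the alternative moments equal the null moments, Ψ collapses to 1 - α, as it should for a test of size α; a larger mean under H_A lowers Ψ and raises the power.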
Example: Anselin data
Moran’s I = 0.511, z(I) = 5.675, 1 - Ψ = 0.799
Exchange type algorithms
• E.g., starting from a given design ξ and a set of candidate points C, exchange the pair of points that maximizes the decrease in Ψ (Fedorov, 1972; this requires n(N-n) evaluations of the criterion at each step).
• Iterate as long as there is improvement.
• Variants by Wynn, 1970, Meyer & Nachtsheim, 1995, Nguyen, 2002, etc.
• Simulated annealing or genetic algorithms as alternatives?
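A best-pair exchange step of this kind can be sketched generically. The code below is an illustration in plain Python, not the authors' implementation; the function name is hypothetical and the toy criterion is a stand-in for Ψ, which would require the moment computations of the earlier slides.

```python
def exchange_design(initial, candidates, criterion):
    """Greedy exchange: repeatedly apply the single swap that lowers the criterion most.

    Each pass tries every (design point, outside candidate) pair, i.e.
    n * (N - n) criterion evaluations, and stops when no swap improves.
    """
    design = list(initial)
    best = criterion(design)
    improved = True
    while improved:
        improved = False
        best_swap = None
        outside = [c for c in candidates if c not in design]
        for i in range(len(design)):
            for c in outside:
                trial = design[:i] + [c] + design[i + 1:]
                value = criterion(trial)
                if value < best:
                    best, best_swap = value, (i, c)
        if best_swap is not None:
            i, c = best_swap
            design[i] = c
            improved = True
    return design, best

# Toy stand-in criterion (NOT the Psi of the talk): prefer points near 5.
def toy_criterion(points):
    return sum((p - 5) ** 2 for p in points)

design, best = exchange_design([0, 1, 2], list(range(10)), toy_criterion)
```

Because the criterion value strictly decreases with every accepted swap and there are finitely many designs, the loop terminates; for a non-convex criterion such as Ψ it may stop at a local optimum, which is why several starting designs are usually worth trying.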
Example 2: Anselin data
Example 2: Anselin data
Moran’s I = 0.511, z(I) = 5.675, 1 - Ψ = 0.799
Moran’s I = 0.417, z(I) = 1.914, 1 - Ψ = 0.983
References (www.ifas.jku.at)
• Anselin, Luc. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.
• Cliff, Andrew, and Keith Ord. 1981. Spatial Processes: Models and Applications. London: Pion.
• Müller, Werner G. 2007. Collecting Spatial Data. Berlin, Heidelberg: Springer-Verlag.
• Tiefelsdorf, Michael. 2000. Modelling Spatial Processes. Berlin, Heidelberg, New York: Springer-Verlag.
Thank you for your attention!
www.endlessforest.org
Is it Spatially Random? Tougher than it looks to decide!
• Fact: about twice as many people are observed sitting catty-corner as sitting opposite each other at restaurant tables.
• Conclusion: a psychological preference for nearness.
• In actuality: this is the outcome to be expected from a random process, since there are two ways to sit opposite but four ways to sit catty-corner.
source: O'Sullivan and Unwin, 2002
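The counting argument can be verified by enumeration. The sketch assumes a square table with one seat per side, numbered 0 to 3 around the table, and two diners choosing seats at random; that layout is an assumption made here for illustration.

```python
from itertools import combinations

# One seat per side of a square table, numbered around the table.
seats = [0, 1, 2, 3]
pairs = list(combinations(seats, 2))

# Seats two steps apart face each other; seats one step apart share a corner.
opposite = [p for p in pairs if (p[1] - p[0]) % 4 == 2]
catty_corner = [p for p in pairs if (p[1] - p[0]) % 4 in (1, 3)]

ratio = len(catty_corner) / len(opposite)
```

Of the six equally likely seat pairs, four are catty-corner and two are opposite, so a 2:1 ratio arises without any preference for nearness.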
Why Spatial Autocorrelation Matters
• Spatial autocorrelation is of interest in its own right because it suggests the operation of a spatial process.
• Additionally, most statistical analyses assume that the observations in a sample are independent of one another.
  – Positive spatial autocorrelation violates this assumption, because samples taken from nearby areas are related to each other and are not independent.
• In ordinary least squares (OLS) regression, for example, the correlation coefficients will be biased and their precision exaggerated.
  – Bias implies that correlation coefficients may be higher than they really are.
    • They are biased because areas with higher concentrations of events have a greater impact on the model estimate.
  – Exaggerated precision (lower standard error) implies that they are more likely to be found “statistically significant”.
    • Precision is overestimated because, since events tend to be concentrated, there are fewer independent observations than assumed.
source: M. Goodchild
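Example 1: Regression on Unit Square
The error covariance matrix Ω depends on the assumed parameter values ρ and δ, i.e. $\Omega = [(I - \rho \tilde{W}(\delta))^T (I - \rho \tilde{W}(\delta))]^{-1}$, where $\tilde{W}(\delta)$ is the row-standardized weight matrix.
[Figure panels: intercept only; plane trend]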
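The dependence of Ω on ρ and δ can be made concrete with a small grid on the unit square. This is a sketch assuming NumPy, negative exponential weights with row standardization, and made-up values for the grid size, δ and ρ; the function names are hypothetical.

```python
import numpy as np

def neg_exp_row_standardized(coords, delta):
    """Row-standardized weights from w_ij = exp(-delta * d_ij) - 1{i=j}."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    w = np.exp(-delta * d) - np.eye(len(coords))
    return w / w.sum(axis=1, keepdims=True)

def sar_error_covariance(W_std, rho):
    """Omega = [(I - rho W)' (I - rho W)]^{-1} for SAR errors with unit-variance u."""
    n = W_std.shape[0]
    A = np.eye(n) - rho * W_std
    return np.linalg.inv(A.T @ A)

# A 3 x 3 grid on the unit square; grid size, delta and rho are made-up values.
grid = np.linspace(0.0, 1.0, 3)
coords = [(x, y) for x in grid for y in grid]
W_std = neg_exp_row_standardized(coords, delta=2.0)
omega = sar_error_covariance(W_std, rho=0.5)
omega_indep = sar_error_covariance(W_std, rho=0.0)
```

With ρ = 0 the covariance is the identity, so the independent-error case is recovered as a special case.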