Analysis of Areal Data: Should a Model with (Spatial) Dependence be Considered?
Petruța C. Caragea
Iowa State University, Department of Statistics
ThinkSpatial, University of California at Santa Barbara, Fall 2010
P. Caragea (ISU, Statistics) S-value 1 / 26
An Example – Drumlins in Ireland
Drumlins
• Oval or elongated hills
• Formed by the movement of glacial ice sheets across rock debris
• County Down, Ireland (Hill (1973))
Drumlins: details
• Northern Ireland Drumlin Belt
• Griffith (2006): 3 regions in County Down
Drumlins data
• Overlaid 11 × 11 grids on each region (64 km²)
• Tabulated number of drumlins in each grid cell
• Regular lattice of count data
• Sample means shown for the three regions: $\tilde{\kappa} = 1.934$, $\tilde{\kappa} = 1.942$, $\tilde{\kappa} = 1.264$
Griffith (2006) fits various models to count data on regular grids.
Spatial Context
• Spatial domain $D$, locations $\{s_i : i = 1, \dots, n\}$, random variables $\{Y(s_i) : i = 1, \dots, n\}$, neighborhood $N_i$
• Markov property:
$$[\,Y(s_i) \mid \{y(s_j) : j \neq i\}\,] = [\,Y(s_i) \mid y(N_i)\,]; \quad i = 1, \dots, n$$
One-parameter exponential families
• Conditional distribution:
$$f_i(y(s_i) \mid y(N_i)) = \exp\left[ A_i(y(N_i))\, y(s_i) - B_i(y(N_i)) + C(y(s_i)) \right]$$
• Natural parameter function:
$$A_i(y(N_i)) = \tau^{-1}(\kappa_i) + \gamma\, \frac{1}{m} \sum_{s_j \in N_i} \{ y(s_j) - \kappa_j \}$$
Winsorized Poisson:
$$A_i(y(N_i)) = \log(\kappa) + \gamma\, \frac{1}{m} \sum_{s_j \in N_i} \{ y(s_j) - \kappa \}$$
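To make the conditional specification concrete, the following is a minimal Python sketch of one way to simulate from it by Gibbs sampling. Several things here are assumptions that the slides do not state: the Winsorized Poisson is taken to be a Poisson variable with all mass at or above $R$ placed on $R$, $m$ is taken to be the number of neighbors (four nearest neighbors), and the lattice edges wrap around. The function names `winsorized_poisson_pmf` and `gibbs_sweep` are hypothetical.

```python
from math import lgamma

import numpy as np


def winsorized_poisson_pmf(lam, R):
    """pmf on 0..R: Poisson(lam) with all mass at or above R placed on R (assumed definition)."""
    k = np.arange(R)
    log_fact = np.array([lgamma(v + 1.0) for v in k])
    head = np.exp(k * np.log(lam) - lam - log_fact)   # P(Y = k) for k < R
    tail = max(1.0 - head.sum(), 0.0)                 # P(Y >= R)
    return np.append(head, tail)


def gibbs_sweep(y, kappa, gamma, R, rng):
    """One in-place pass over the lattice; four nearest neighbors, wrap-around edges (assumed)."""
    n_rows, n_cols = y.shape
    for i in range(n_rows):
        for j in range(n_cols):
            nb_mean = (y[(i - 1) % n_rows, j] + y[(i + 1) % n_rows, j]
                       + y[i, (j - 1) % n_cols] + y[i, (j + 1) % n_cols]) / 4.0
            # Natural parameter: log(kappa) + gamma * (neighbor average - kappa)
            A = np.log(kappa) + gamma * (nb_mean - kappa)
            y[i, j] = rng.choice(R + 1, p=winsorized_poisson_pmf(np.exp(A), R))
    return y


rng = np.random.default_rng(0)
kappa, gamma, R = 5.0, 0.0462, 20
y = np.minimum(rng.poisson(kappa, size=(10, 10)), R)  # arbitrary starting values
for _ in range(20):
    gibbs_sweep(y, kappa, gamma, R, rng)
```

The lattice size and number of sweeps are illustrative only; a real simulation study would need a burn-in check.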
Construction: Part 1
Available Moment Estimators
• For marginal expectations:
$$\hat{E}\{Y(s_i)\} = \frac{1}{n} \sum_{i=1}^{n} Y(s_i) \equiv \tilde{\kappa}$$
• For conditional expectations:
$$\hat{E}\{Y(s_i) \mid s_i \in H_\ell\} = \frac{1}{|H_\ell|} \sum_{s_i \in H_\ell} Y(s_i) \equiv C_\ell$$
where, for each $\ell = 1, \dots, q$,
$$H_\ell \equiv \left\{ s_i : \frac{1}{m} \sum_{s_j \in N_i} y(s_j) = h_\ell \right\}$$
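The two moment estimators can be sketched in Python as follows. This is an illustration under assumptions not stated on the slide: a four-nearest-neighbor neighborhood, and conditional means computed only over interior sites so that every site has a full neighborhood. The function name `moment_estimators` is hypothetical.

```python
import numpy as np


def moment_estimators(y):
    """Return kappa_tilde, the distinct neighbor averages h_l, and the group means C_l."""
    y = np.asarray(y, dtype=float)
    kappa_tilde = y.mean()                      # marginal estimator over all n sites
    center = y[1:-1, 1:-1]                      # interior sites only (assumed edge handling)
    nb_mean = (y[:-2, 1:-1] + y[2:, 1:-1]
               + y[1:-1, :-2] + y[1:-1, 2:]) / 4.0
    h = np.unique(nb_mean)                      # distinct values h_1 < ... < h_q
    # C_l: average of Y(s_i) over the sites whose neighbor average equals h_l
    C = np.array([center[nb_mean == h_l].mean() for h_l in h])
    return kappa_tilde, h, C


demo = np.arange(16).reshape(4, 4)
kappa_tilde, h, C = moment_estimators(demo)
```

For count data the neighbor averages are multiples of 1/4, so the exact equality test used to form each $H_\ell$ is safe in floating point.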
Construction: Part 2
From model structure
• $E_c = E\{Y(s_i) \mid s_i \in H_\ell\} = \tau(A(h_\ell))$
• $E_m = E\{Y(s_i)\} = \kappa$
Then
$$
\begin{aligned}
A_i(h_\ell) - \tau^{-1}(\kappa) &= \gamma \times (h_\ell - \kappa) \\
\Rightarrow\ \tau^{-1}(E_c) - \tau^{-1}(E_m) &= \gamma \times (h_\ell - E_m) \\
\Rightarrow\ \underbrace{\tau^{-1}(C_\ell) - \tau^{-1}(\tilde{\kappa})}_{r(C_\ell,\, \tilde{\kappa})} &\approx \gamma \times \underbrace{(h_\ell - \tilde{\kappa})}_{D(h_\ell,\, \tilde{\kappa})}
\end{aligned}
$$
Define the S-value:
$$S = \frac{\sum_{\ell=1}^{q} r(C_\ell, \tilde{\kappa})\, D(h_\ell, \tilde{\kappa})}{\sum_{\ell=1}^{q} \{ D(h_\ell, \tilde{\kappa}) \}^2}$$
Note: The S-value has the form of a crude estimator of $\gamma$.
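The S-value formula is the least-squares slope of $r$ on $D$ for a line through the origin, which a few lines of Python can illustrate. This sketch assumes the log link of the Winsorized Poisson case ($\tau^{-1} = \log$) and drops any group with a non-finite $r$, for example $C_\ell = 0$; how such groups are treated is not stated on the slides, so that rule is an assumption. The function name `s_value` is hypothetical.

```python
import numpy as np


def s_value(h, C, kappa_tilde, tau_inv=np.log):
    """S = sum(r * D) / sum(D**2), with r = tau_inv(C) - tau_inv(kappa_tilde), D = h - kappa_tilde."""
    h = np.asarray(h, dtype=float)
    C = np.asarray(C, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = tau_inv(C) - tau_inv(kappa_tilde)
    D = h - kappa_tilde
    keep = np.isfinite(r)        # assumed rule: skip groups where the link is undefined
    return np.sum(r[keep] * D[keep]) / np.sum(D[keep] ** 2)


# Check on noise-free inputs: if C_l = kappa * exp(gamma * (h_l - kappa)) exactly,
# the S-value recovers gamma.
gamma_true, kappa = 0.05, 5.0
h_demo = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
C_demo = kappa * np.exp(gamma_true * (h_demo - kappa))
S = s_value(h_demo, C_demo, kappa)
```

With real data the $C_\ell$ are noisy group means, so $S$ only approximates $\gamma$, consistent with the slide's description of it as a crude estimator.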
Interpretation
Standard bound (Kaiser 2007)
• $|\gamma| < \gamma_{sb}$ ensures $\kappa \approx E\{Y(s_i)\}$
• $\gamma_{sb}$ available for exponential family models
• For Winsorized Poisson:
$$\gamma_{sb} = \frac{\log(R) - \log(\kappa)}{R - \kappa}$$
Uses:
1. $S/\gamma_{sb}$ is a measure of strength of dependence
2. If $S \gg \gamma_{sb}$, then $\kappa \neq E\{Y(s_i)\}$ in the model
   • directional dependencies
   • non-constant mean
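As a quick arithmetic check of the Winsorized Poisson bound, the sketch below evaluates it at $\kappa = 5$ and $R = 20$, the values used in the simulated examples of this talk. The function name `standard_bound` is hypothetical.

```python
import math


def standard_bound(kappa, R):
    """Winsorized Poisson standard bound: (log R - log kappa) / (R - kappa)."""
    return (math.log(R) - math.log(kappa)) / (R - kappa)


gamma_sb = standard_bound(5, 20)
print(round(gamma_sb, 4))  # 0.0924
```

This is $\log(4)/15$, which agrees with the $\gamma_{sb} = 0.0924$ quoted for the simulated examples.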
Uses of the S-value: Detecting strength of dependence
Winsorized Poisson with $\kappa = 5$, $R = 20$ and $\gamma = 0.0462$ ($\gamma_{sb} = 0.0924$), 30 × 30 regular lattice
Case 1
• $\tilde{\kappa} = 4.921$
• S-value = 0.0788, $S/\gamma_{sb} = 0.853$
• PL estimates: $\hat{\kappa} = 4.854$, $\hat{\gamma} = 0.0827 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.895$
Case 2
• $\tilde{\kappa} = 5.077$
• S-value = 0.0326, $S/\gamma_{sb} = 0.353$
• PL estimates: $\hat{\kappa} = 5.065$, $\hat{\gamma} = 0.0326 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.353$
[Figure: for each case, scatterplot of r against D.]
Uses of the S-value: Detecting directional dependence
Directional Winsorized Poisson with $\kappa = 5$, $R = 20$, $\gamma_1 = 0.07$ and $\gamma_2 = 0.001$ ($\gamma_{sb} = 0.092$)
Unidirectional
• $\tilde{\kappa} = 5.144$
• S-value = 0.0875, $S/\gamma_{sb} = 0.947$
• PL estimates: $\hat{\kappa} = 5.071$, $\hat{\gamma} = 0.0899 \Rightarrow \hat{\gamma}/\gamma_{sb} = 0.973$
Directional
• U-direction: S-value = 0.0700, $S_1/\gamma_{sb} = 0.7576$
• V-direction: S-value = 0.0008, $S_2/\gamma_{sb} = 0.0086$
• PL estimates: $\hat{\kappa} = 5.018$, $\hat{\gamma}_1 = 0.0854 \Rightarrow \hat{\gamma}_1/\gamma_{sb} = 0.924$, $\hat{\gamma}_2 = 0.0010 \Rightarrow \hat{\gamma}_2/\gamma_{sb} = 0.108$
[Figure: lattice image (U-coordinate by V-coordinate) and scatterplots of r against D, overall and separately for the U and V directions.]
Uses of the S-value: Detecting spatial trend
Data generated with trend and unidirectional dependence, $\gamma = 0.05$
Const. mean
• S-value = 0.1744, $S/\gamma_{sb} = 1.8$
With trend
• S-value = 0.0685 (Median Polish), S-value = 0.0450 (ols)
• $S/\gamma_{sb} = 0.47$
[Figure: lattice image (U-coordinate by V-coordinate) and scatterplots of r against D for the constant-mean, Median Polish and ols fits.]
S-value in practice: Drumlins in Ireland
• Neighborhood: 4 nearest neighbors
• Winsorization value: R = 7
Calculated S-value
Region   Unidirectional   N-S      E-W      NE-SW    NW-SE
1        0.3681           0.3848   0.1848   0.2451   0.0688
2        0.3763           0.1574   0.3086   0.0948   0.1435
3        0.2197           0.0104   0.1961   0.0125   0.2015
Standard bound between 0.25 and 0.30.
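The ratios $S/\gamma_{sb}$ for the unidirectional column can be worked out with a short Python sketch. It rests on two assumptions not stated explicitly in the slides: that the sample means 1.934, 1.942 and 1.264 belong to Regions 1, 2 and 3 in that order, and that the bound is evaluated by plugging each sample mean in for $\kappa$. The names used are hypothetical.

```python
import math

R = 7
kappa_tilde = {1: 1.934, 2: 1.942, 3: 1.264}          # assumed to be in region order
s_unidirectional = {1: 0.3681, 2: 0.3763, 3: 0.2197}  # from the table above


def standard_bound(kappa, R):
    """Winsorized Poisson standard bound: (log R - log kappa) / (R - kappa)."""
    return (math.log(R) - math.log(kappa)) / (R - kappa)


bounds = {reg: standard_bound(k, R) for reg, k in kappa_tilde.items()}
ratios = {reg: s_unidirectional[reg] / bounds[reg] for reg in bounds}

for reg in sorted(ratios):
    print(f"Region {reg}: gamma_sb = {bounds[reg]:.3f}, S/gamma_sb = {ratios[reg]:.2f}")
```

Under these assumptions the computed bounds fall inside the quoted 0.25 to 0.30 range, and the unidirectional ratio is above 1 for Regions 1 and 2 and below 1 for Region 3.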
Drumlins in Ireland: Simulation from the model
2000 simulations using the sample mean and S-values.
[Figure: two sets of panels for Region 1, Region 2 and Region 3, each comparing Model 1, Model 2 and Model 3.]
• $\tilde{\kappa} = 1.934$
• $\tilde{\kappa} = 1.942$
• $\tilde{\kappa} = 1.264$