
Methods for Small Area Analyses of Spatial and Space-time Data

Evan Carey, Robert Penfold, Elisabeth Dowling Root. AcademyHealth Conference, Seattle, WA, June 25, 2018.


  1. Methods for Small Area Analyses of Spatial and Space-time Data Evan Carey Robert Penfold Elisabeth Dowling Root AcademyHealth Conference, Seattle, WA June 25, 2018

  2. Outline • Introduction • Challenges of spatial data • Representing space and defining spatial relationships • Spatial autocorrelation • Focus on analysis techniques for area data – Disease mapping & BYM CAR Models • Focus on analysis techniques for continuous data

  3. Part 1: Foundational Concepts • Why do I care about space: is space a parameter of interest, or a nuisance parameter? • What are different ways spatial data can be represented in my data? • How do I define ‘near’ and ‘far’? • What does autocorrelation mean? • How does spatial autocorrelation differ from spatial trends? • Why is data irregularly distributed across space challenging to model? • How is this connected to small area analysis? • What does ‘shrinkage’ mean, and why does it improve models?

  4. Why do you care about space?
     I am interested in the relationship between location and my outcome. • I want to identify areas with high or low disease rates. • Potentially create maps showing above/below average outcomes. • I want to estimate the effect of space!
     I am not interested in the effect of location, but my data has spatial nature… • Ignoring space in your models may give you biased results/incorrect p-values. • Correctly modeling space fixes the issue. • Space is a ‘nuisance’ parameter here.

  5. Geospatial Data & Public Health Geographic data, Geographic Information Systems (GIS), and spatial analysis provide public health officials with the capability to perform two unique types of analysis: 1. Find statistically significant areas of high or low incidence 2. Examine the spatial relationship between health outcomes and population/contextual factors

  6. Geographic Variation in Health • People (demographics) and the risk factors contributing to health are dispersed unevenly across communities and regions • Often we are interested in identifying patterns of disease (or some other health outcome) across space • We are also interested in understanding the reasons for these patterns: – Composition: differences in kinds of people who live in places – Context: differences in neighborhood or area-level physical or social environments

  7. But…“spatial is special” • Data that are referenced to location bring important additional information to your data analysis • But, spatially referenced data also bring special problems to your analysis – heterogeneity of observational units → heteroskedasticity – spatial autocorrelation → residual dependence • A consequence of these “special problems” is that traditional assumptions of standard regression techniques are violated – statistical inference from such a model is not valid

  8. Spatial data is complex • The methods we choose to cope with the complexities of spatial data depend on how we define space – Discrete geographic phenomena have spatial bounds. Locations may be within or outside a geographic feature. • Areal data: census tracts, counties, states – Continuous geographic phenomena have properties continuously distributed across the landscape. Locations are specific and have value. • Point data • These definitions of space are represented by different geographic data types

  9. What are Spatial Data • Location • Attributes • Spatial Relationships
     Attribute data: Survey data
       ID  Tract  ChildDth  Race   DistPCP
       1   1237   Yes       White  5000
       2   1237   No        AA     3560
       3   1238   No        White  10789
       4   1238   No        Asian  7689
     Spatial data: Object: Home, longitude, latitude (x, y): 76.9147, 107.6098
     Spatial data: Object: Health Center
     Attribute data: Census tract/PCSA characteristics
       Tract  PctPov  PctAA  Foreclose  PCP
       1237   .056    .241   .011       1
       1238   .079    .443   .043       3
       1239   .151    .078   .225       10
       1240   .224    .011   .105       0
     Spatial Relationships: • Proximity to physician • “Contained in” census tract

  10. Spatial Data Types Event Data (Points) Lattice Data (Areas) Geostatistical Data (Grid)

  11. It’s important to understand that these designations are not mutually exclusive

  12. Points can be geolocated in some relevant areal units

  13. These aggregations can be used to produce rates (figure: areal units labeled with their rates: 0.18, 0.16, 0.11, 0.02, 0.05, 0.09, 0.00, 0.14, 0.7)
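The aggregation step on slides 12 and 13 can be sketched in a few lines. This is a minimal illustration, not the presenters' code: it reuses the four survey records from the slide 9 table, and the function name `rates_by_area` is hypothetical.

```python
from collections import defaultdict

# Person-level records (ID, tract, child death) taken from the survey table on slide 9.
records = [
    (1, "1237", True),
    (2, "1237", False),
    (3, "1238", False),
    (4, "1238", False),
]

def rates_by_area(rows):
    """Aggregate point-level events to areal units; return events / total per area."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for _id, area, event in rows:
        totals[area] += 1
        events[area] += int(event)
    return {area: events[area] / totals[area] for area in totals}

tract_rates = rates_by_area(records)
print(tract_rates)
```

With only two people per tract, a single event moves a tract's rate from 0 to 0.5, which is the sparse-data instability that motivates shrinkage later in the deck.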

  14. GIS, Spatial Analysis, and Spatial Data Analysis (“Spatial Data Production” and “Spatial Statistics”), by data type: • Event (Point) Data: Point Pattern Analysis, Spatial Epidemiology, Crime Analysis • Lattice (Area) Data: Regional Count data, Spatial Econometrics, Spatial Regression Analysis • Geostatistical Data: Spatial Prediction

  15. Thinking in one dimension: Does time affect the outcome?

  16. Thinking in one dimension: Does time affect the outcome?

  17. Thinking in one dimension: Does time affect the outcome?

  18. Thinking in one dimension: Is there a time trend?

  19. Spatial Autocorrelation and Trends (2D) “Everything is related to everything else, but near things are more related than distant things.” (Tobler’s first law of geography) • Correlation in space – Is a variable in a location correlated with the values in nearby places? • Spatial trends in the outcome – The outcome differs systematically as a function of spatial location. These are distinct concepts! * Humans are pretty bad at identifying spatial trends by eye. We tend to overinterpret noise when it is on a map ☺
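One way to put a number on “correlation in space” is Moran's I. The slides do not name this statistic, so treat the sketch below as an illustrative assumption rather than the presenters' method. The helper `chain_weights` and the two toy value vectors are hypothetical; the weights encode six areas in a row where only adjacent areas are neighbors.

```python
def morans_i(values, weights):
    """Moran's I: (n / sum of weights) * (weighted cross-products / sum of squares)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

def chain_weights(n):
    """Binary contiguity for n areas in a row: only adjacent areas are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(6)
clustered = [1, 1, 1, 0, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]  # neighbors always differ

print(morans_i(clustered, w))    # positive: near things are alike
print(morans_i(alternating, w))  # negative: near things are unlike
```

Note that the statistic depends entirely on the weights matrix, which is why the next slides spend time on how neighbors are defined.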

  20. Defining spatial relationships • What is a neighbor? What’s next to what? • These spatial relationships can be defined in a number of ways – Contiguity (common boundary, K-nearest neighbors) • What is a “shared” boundary? • How many “neighbors” to include? – Distance (distance band) • What distance do we use?

  21. Contiguity based neighbors • For areas: – All polygons that share a common border • For points: – Distance (Euclidean distance; figure shows 1 km and 1.5 km bands) – K-nearest neighbors (KNN; figure shows k=1, k=2, k=3)
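The two point-based neighbor definitions above can be sketched as follows. This is a minimal illustration using Euclidean distance; the coordinates (in km) and the function names are hypothetical, and a real analysis would normally use a spatial library rather than this brute-force loop.

```python
import math

def distance_band_neighbors(points, band):
    """Neighbors of each point = all other points within `band` (Euclidean distance)."""
    return {
        i: [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= band]
        for i, p in enumerate(points)
    }

def knn_neighbors(points, k):
    """Neighbors of each point = its k closest other points (Euclidean distance)."""
    nbrs = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        nbrs[i] = others[:k]
    return nbrs

# Hypothetical point locations, in km.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (3.0, 3.0)]

band = distance_band_neighbors(points, 1.0)
knn = knn_neighbors(points, 2)
print(band)
print(knn)
```

The two definitions behave differently for isolated points: with a 1 km band the far-away point has no neighbors at all, while KNN always assigns it k neighbors, and KNN relationships need not be symmetric.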

  22. Thinking in one dimension: Does time affect the outcome?

  23. The problem with sparse data…

  24. The problem with sparse data…

  25. General Shrinkage Idea: If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate? (Figure axis: Low to High.)

  26. If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate? Only use information from each hospital to predict mortality. No pooling of information (no shrinkage!) (Figure axis: Low to High.)

  27. If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate? Share (pool) information across hospitals. Prediction is ‘shrunk’ towards the mean. (Figure axis: Low to High.)
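The pooling idea on slides 25 to 27 is often written as a weighted average. The form below is a generic sketch, not necessarily the estimator the presenters had in mind; the symbols $y_i$ (observed rate for hospital $i$), $n_i$ (its number of cases), $\bar{y}$ (overall mean rate), and the constant $k$ are assumptions introduced for illustration.

```latex
\hat{\theta}_i = w_i \, y_i + (1 - w_i) \, \bar{y},
\qquad
w_i = \frac{n_i}{n_i + k},
\quad k > 0
```

Setting $w_i = 1$ for every hospital recovers the no-pooling prediction of slide 26. With $k > 0$, hospitals with few cases get a small $w_i$ and are pulled strongly toward $\bar{y}$, while large hospitals stay close to their own observed rate.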

  28. Sharing Spatial Data (Shrinkage) • Census Tract A: 2/8 = 0.25 • Census Tract B: 4/20 = 0.2 • Census Tract C: 1/45 = 0.02 • Census Tract D: 2/25 = 0.08 • Census Tract E: 1/10 = 0.1 • Census Tract F: 3/30 = 0.1
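A rough sketch of shrinkage applied to the six tract fractions on this slide follows. Several things here are assumptions: the pairing of fractions to tract letters is read off the flattened slide text, the tracts are shrunk toward the pooled rate of all six because the slide's map adjacency is not recoverable from the transcript, and `prior_strength` is a hypothetical tuning constant. A spatial model would instead borrow mainly from neighboring tracts.

```python
# (numerator, denominator) per census tract, as read from slide 28 (pairing assumed).
tracts = {"A": (2, 8), "B": (4, 20), "C": (1, 45), "D": (2, 25), "E": (1, 10), "F": (3, 30)}

def shrunken_rates(counts, prior_strength):
    """Pull each area's raw rate toward the pooled rate; small areas move the most."""
    total_events = sum(e for e, _ in counts.values())
    total_pop = sum(n for _, n in counts.values())
    pooled_rate = total_events / total_pop
    result = {}
    for name, (e, n) in counts.items():
        w = n / (n + prior_strength)  # weight on the area's own data
        result[name] = w * (e / n) + (1 - w) * pooled_rate
    return pooled_rate, result

# prior_strength=20 is an arbitrary illustrative choice, not a value from the slides.
pooled, shrunk = shrunken_rates(tracts, prior_strength=20)
for name, (e, n) in sorted(tracts.items()):
    print(name, round(e / n, 3), "->", round(shrunk[name], 3))
```

Tract A (denominator 8) moves much further from its raw rate than Tract C (denominator 45), which is the stabilization the slide title refers to.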

  29. Focus on methods for continuously indexed data: Spatial models implemented with R-INLA

  30. Motivating example: Outcomes of Veterans in Colorado Goal: Identify areas of high and low event probability. What does the ideal method need to have?

  31. Ideal method • Identify spatial trend and make predictions at all points. • Resilient to irregularly spaced data (small area analysis!) • Exhibit shrinkage / stabilization • Incorporate other patient level traits in the model (‘adjust’) • Converge in reasonable time in medium to large datasets

  32. Point pattern analysis versus point referenced models. Binary Outcome = Patient Location + Patient Demographics http://open.lib.umn.edu/mapping/chapter/6-analysis/

  33. Community care utilization in Colorado (data simulation – no PHI here!)

  34. Simulating Success of Community Care Referrals in the VHA • Simulation 1: – no spatial trend (pure spatial noise) • Simulations 2-4: – Spatial trend of varying strengths. How successful are different methods at recovering the underlying spatial trends of the binomial process?
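The simulation design above can be sketched as follows. The presenters' actual simulation is not shown in the transcript, so everything specific here is an assumption: the unit-square study area, the logistic trend along the x coordinate, and the names `simulate_outcomes`, `trend_strength`, and `east_west_gap` are all hypothetical.

```python
import math
import random

def simulate_outcomes(n, trend_strength, seed=0):
    """Simulate n binary outcomes at random locations in the unit square.

    trend_strength = 0 gives pure noise (success probability 0.5 everywhere);
    larger values make success more likely toward the east (larger x).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        logit = trend_strength * (x - 0.5)
        p = 1.0 / (1.0 + math.exp(-logit))
        outcome = 1 if rng.random() < p else 0
        data.append((x, y, outcome))
    return data

def east_west_gap(data):
    """Mean outcome in the eastern half minus mean outcome in the western half."""
    east = [o for x, _, o in data if x >= 0.5]
    west = [o for x, _, o in data if x < 0.5]
    return sum(east) / len(east) - sum(west) / len(west)

no_trend = simulate_outcomes(2000, trend_strength=0.0, seed=1)
with_trend = simulate_outcomes(2000, trend_strength=4.0, seed=1)
print(east_west_gap(no_trend), east_west_gap(with_trend))
```

A method that recovers the trend should report a clear east-west difference for the second dataset and none beyond noise for the first.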

  35. Method 1: Simple Interpolation (2D Smoother) • Use a 2D smoother: – Gaussian kernel weighting – Allows smoothing of a binary process at irregularly spaced locations. – Can compute mean and variance across space. – Nadaraya-Watson smoother (Nadaraya, 1964, 1989; Watson, 1964) • What results do you expect to get using this method?
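A minimal sketch of a Nadaraya-Watson smoother with Gaussian kernel weights follows, assuming Euclidean distance and a single fixed bandwidth. The toy data, the `bandwidth` values, and the use of a kernel-weighted variance for the “variance across space” are illustrative assumptions, not details taken from the slides.

```python
import math

def nadaraya_watson(query, points, outcomes, bandwidth):
    """Kernel-weighted mean and variance of the outcomes around a query location."""
    weights = [
        math.exp(-0.5 * (math.dist(query, p) / bandwidth) ** 2)  # Gaussian kernel
        for p in points
    ]
    total = sum(weights)  # can underflow to 0 if the query is far from every point
    mean = sum(w * y for w, y in zip(weights, outcomes)) / total
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, outcomes)) / total
    return mean, var

# Hypothetical binary outcomes at four locations: failures in the west, successes in the east.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
outcomes = [0, 1, 0, 1]

center_mean, center_var = nadaraya_watson((0.5, 0.5), points, outcomes, bandwidth=0.5)
east_mean, east_var = nadaraya_watson((1.0, 0.5), points, outcomes, bandwidth=0.3)
print(center_mean, center_var)
print(east_mean, east_var)
```

Because every estimate is a local weighted average, the smoother will produce a surface with apparent highs and lows even when the data are pure noise, and the bandwidth controls how pronounced they look. That is worth keeping in mind when answering the question the slide poses.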

  36. Results for data with no spatial trend.

  37. Results for data with a spatial trend (simulation 2)

  38. Results for data with a spatial trend (simulations 3 and 4)
