
Spatial Modelling
Patrick Brown, Cancer Care Ontario, 620 University Ave, 12th floor. patrick.brown@cancercare.on.ca
With thanks to Paulo Ribeiro (Curitiba) and Chris Sherlock (Lancaster) for provision of course notes!
Motivating Examples


  1. Point processes
◮ Lung cancer in Ontario: cases and controls
[Figure: residential locations of cases and controls in Ontario]
◮ Is there spatial variation in cancer incidence?
◮ More cases near a specific location such as a power plant?
◮ Cases tending to cluster near other cases?

  2. Geostatistics
◮ Rainfall in Paraná state, Brazil
◮ Exists everywhere, but is only evaluated at a few points
[Figure: rainfall measurement locations; axes N-S (km) and E-W (km)]

  3. Geostatistics
◮ Intensity of lung cancer cases in Ontario
◮ Unobserved, estimated by modelling

  4. This Course
◮ Geostatistics — 6 weeks
◮ Point Processes — 3 weeks
◮ Discrete spatial variation — 1 week
◮ Markov Random fields ???
◮ Spatio-temporal models ??

  5. Labs
◮ 1 hour in 256 McCaul, following lectures
◮ Using R and WinBUGS
◮ Analyses of spatial datasets
◮ You're encouraged to work on your own computers.

  6. Assessment
◮ One small project: 20%
◮ One larger project with a presentation: 40%
◮ Exam: 40%

  7. Books
Main books:
◮ Diggle and Ribeiro (2006), Model-based Geostatistics. amazon.com/dp/0387329072
◮ Diggle (2003), Statistical Analysis of Spatial Point Patterns. amazon.com/dp/0340740701
Other books:
◮ Møller and Waagepetersen, Statistical Inference and Simulation for Spatial Point Processes. www.myilibrary.com/browse/open.asp?ID=19973 — for a more technical treatment of point processes and model fitting
◮ Rue and Held, Gaussian Markov Random Fields. amazon.com/dp/1584884320 — if we do Markov random fields.
◮ See also http://www.ai-geostats.org

  8. Contents
1. INTRODUCTION
2. BASIC GEOSTATISTICAL MODEL
3. EXPLORATORY VARIOGRAM ANALYSIS
4. PARAMETER ESTIMATION FOR GAUSSIAN MODELS
5. SPATIAL PREDICTION FOR GAUSSIAN MODELS
6. GENERALISED LINEAR GEOSTATISTICAL MODELS
7. FURTHER TOPICS, HISTORY AND PHILOSOPHY

  9. PART I: INTRODUCTION
1. Basic Examples of Spatial Data
2. A Taxonomy for Spatial Statistics
3. Further Examples of Geostatistical Problems
4. Characteristic Features of Geostatistical Problems
5. Core Geostatistical Problems

  10. Basic Examples of Spatial Data
◮ Campylobacter cases in southern England
[Figure: residential locations of 651 cases of campylobacter reported over a one-year period in central southern England; axes N-S (km) and E-W (km)]

  11. Cancer rates in administrative regions
[Figure: grey-scale map of the estimated variation in relative risk (0.9–1.3) of colorectal cancer in the 36 electoral wards of the city of Birmingham, UK; axes Northings and Eastings (metres)]

  12. Rainfall in Paraná State, Brazil
Rainfall measurements at 143 recording stations. Average for the May–June period (dry season).
[Figure: station locations; axes N-S (km) and E-W (km)]

  13. A Taxonomy of Spatial Statistics
1. Spatial point processes
Basic structure: a countable set of points x_i ∈ ℝ², generated stochastically. E.g. cases of campylobacter.
2. Discrete spatial variation
Basic structure: Y_i : i = 1, ..., n. E.g. the number of cancer cases in an electoral region.
◮ rarely arises naturally
◮ but often useful as a pragmatic strategy
3. Continuous spatial variation
Basic structure: Y(x) : x ∈ ℝ². Data (y_i, x_i) : i = 1, ..., n. E.g. rainfall measurements at locations x_i. Locations may be:
◮ non-stochastic (e.g. a lattice to cover the observation region A)
◮ or stochastic, but independent of the process Y(x)

  14. Spatial statistics is the collection of statistical methods in which spatial locations play an explicit role in the analysis of data.
Geostatistics is that part of spatial statistics concerned with data obtained by spatially discrete sampling of a spatially continuous process.
Don't confuse the data format with the underlying process.

  15. Further Examples of Geostatistical Problems
Swiss rainfall data
◮ Locations shown as points with size proportional to the value of the observed rainfall.
◮ 467 locations in Switzerland
◮ daily rainfall measurements on 8th of May 1986
[Figure: point map; axes N-S (km) and E-W (km)]
Data from: Spatial Interpolation Comparison 97, http://www.ai-geostats.org/resources/famous_geostats_data.htm

  16. Calcium and magnesium contents in a soil
178 measurements of calcium and magnesium contents taken on the 0–20 cm (and 20–40 cm) soil layers.
◮ fertility maps
◮ assess effects of soil regime and elevation
◮ joint model for Ca and Mg
[Figure: sampling locations; axes X Coord and Y Coord]

  17. Rongelap Island
◮ study of residual contamination, following the nuclear weapons testing programme during the 1950s
◮ island evacuated in 1985; is it now safe for re-settlement?
◮ survey yields noisy measurements Y_i of radioactive caesium concentrations
◮ initial grid of locations x_i at 200m spacing, later supplemented by in-fill squares at 40m spacing.
[Figure: survey locations on Rongelap Island; axes N-S and E-W]

  18. Gambia malaria
◮ survey of villages in Gambia
◮ in village i, data Y_ij = 0/1 denotes absence/presence of malarial parasites in a blood sample from child j
◮ interest in effects of covariates, and in the pattern of residual spatial variation
◮ village-level covariates:
  ◮ village locations
  ◮ public health centre in village?
  ◮ satellite-derived vegetation green-ness index
◮ child-level covariates:
  ◮ age, sex, bed-net use
[Figure: village locations in the Western, Central and Eastern regions; axes N-S (km) and E-W (km)]

  19. Characteristic Features of Geostatistical Data
◮ data consist of responses Y_i := Y(x_i) associated with locations x_i
◮ in principle, Y could be determined from any location x within a continuous spatial region A
◮ it is reasonable to behave as if {Y(x) : x ∈ A} is a stochastic process
◮ x_i is typically fixed. If the locations x_i are generated by a stochastic point process, it is reasonable to behave as if this point process is independent of the Y(x) process
◮ scientific objectives include prediction of one or more functionals of a stochastic signal process {S(x) : x ∈ A}, conditional on observations of the Y(x) process.

  20. Core Geostatistical Problems
◮ Design
  ◮ how many locations?
  ◮ how many measurements?
  ◮ spatial layout of the locations?
  ◮ what to measure at each location?
◮ Modelling
  ◮ probability model for the signal, [S]
  ◮ conditional probability model for the measurements, [Y | S]
◮ Estimation
  ◮ assign values to unknown model parameters
  ◮ make inferences about (functions of) model parameters
◮ Prediction
  ◮ evaluate [T | Y], the conditional distribution of the target given the data

  21. A basic example: elevation data
[Figure: three panels — raw data; kriging (with raw data overlaid); and kriging standard errors; axes X Coord and Y Coord]

  22. PART II: BASIC GEOSTATISTICAL MODEL
1. Notation
2. The Signal Process
3. The Measurement Process
4. The Correlation Function
5. Model Extensions (1)

  23. Model-Based Geostatistics
Basic model: response = mean effect + signal + noise
Notation:
◮ {x_i : i = 1, ..., n} is the sampling design
◮ µ or µ_i := µ(x_i) is the trend or mean effect
◮ {Y(x) : x ∈ A} is the measurement process
◮ Y_i := Y(x_i), a.k.a. the response
◮ {S(x) : x ∈ A} is the signal process
◮ T = F(S) is the target for prediction
◮ [S, Y] = [S][Y | S] is the geostatistical model
Data consist of pairs (y_i, x_i) : i = 1, ..., n, possibly with covariates measured at each x_i.

  24. The Signal Process
Model the signal process S(x) as a Gaussian random field (GRF), also known as a Gaussian process. Initially assume it is stationary and isotropic.
◮ A stationary process is one whose probability distribution is invariant under translation.
◮ An isotropic process is one whose probability distribution is invariant under rotation.

  25. Stationary, Isotropic
[Figure: two simulated realisations of a stationary, isotropic process]

  26. Non-stationary
[Figure: two simulated realisations of a non-stationary process]

  27. Stationary, Anisotropic
[Figure: two simulated realisations of a stationary, anisotropic process]

  28. Isotropic, Non-stationary
[Figure: a simulated realisation of an isotropic, non-stationary process]

  29. Distribution
If S(x) is a stationary isotropic Gaussian process (SGP) then for any set of points x_1, ..., x_n ∈ A,
(S(x_1), ..., S(x_n))′ ∼ N(m1, σ²R)
where 1 is a vector of ones and
R_ij = Corr[S(x_i), S(x_j)] = ρ(||x_i − x_j||)
for some function ρ(·). Without loss of generality we will always take m = 0. Clearly, at any one point x,
S(x) ∼ N(0, σ²)

  30. The Measurement Process
1. the conditional distribution of Y(x_i) given S(x_i) is Y_i | s(x_i) ∼ N(µ + s(x_i), τ²);
2. Y_i : i = 1, ..., n are mutually independent, conditional on S(·).
[Figure: simulated data in 1-D illustrating the elements of the model: data Y(x_i) (points), signal S(x) (curve) and mean µ (horizontal line)]

  31. An Equivalent Formulation
Y_i = µ + S(x_i) + ε_i : i = 1, ..., n,
where S(x) has mean 0, and ε_i : i = 1, ..., n are mutually independent, identically distributed with ε_i ∼ N(0, τ²). The joint distribution of Y is multivariate Normal,
Y := (Y(x_1), ..., Y(x_n))′ ∼ N(µ1, σ²R + τ²I)
where:
◮ 1 is a vector of 1's
◮ I is the n × n identity matrix
◮ R is the n × n matrix with (i, j)th element ρ(u_ij), where u_ij := ||x_i − x_j||, the Euclidean distance between x_i and x_j.
Do exercise 1a. A simulation sketch follows.
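One way to see this formulation in action is to simulate from it directly. Below is a minimal sketch in base R (the exponential correlation function and all parameter values are arbitrary illustrative choices, not from the course): build R from the pairwise distances, then draw one realisation of Y via the Cholesky factor of σ²R + τ²I.

# Simulate Y_i = mu + S(x_i) + eps_i at n random 2-D locations
set.seed(1)
n      <- 200
x      <- cbind(runif(n), runif(n))        # sampling design x_1, ..., x_n
mu     <- 5; sigma2 <- 1; tau2 <- 0.25; phi <- 0.2
u      <- as.matrix(dist(x))               # Euclidean distances u_ij
R      <- exp(-u / phi)                    # exponential correlation rho(u)
V      <- sigma2 * R + tau2 * diag(n)      # Var[Y] = sigma^2 R + tau^2 I
y      <- mu + drop(t(chol(V)) %*% rnorm(n))  # one realisation of Y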

  32. The Correlation Function
Positive definiteness
◮ The variance of some linear combination a_1 S(x_1) + · · · + a_n S(x_n) is
Σ_{i=1}^n Σ_{j=1}^n a_i a_j Cov[S(x_i), S(x_j)] = σ² Σ_{i=1}^n Σ_{j=1}^n a_i a_j R_ij
◮ This must be positive (or possibly zero) for all possible a_i ∈ ℝ.
◮ Not all candidate correlation functions possess this property.

  33. Positive Definite Matrices
◮ A is positive definite if x′Ax > 0 for all x.
◮ A necessary and sufficient condition for positive definiteness is that all the eigenvalues of A are positive.
◮ Variance matrices must be positive definite: if Y ∼ N(0, Σ) then for any vector a, Var[a′Y] = a′Σa.
◮ For a′Y to have positive variance for all a, Σ must be positive definite.

  34. Positive Definite Functions
◮ f(x) is a function of x ∈ ℝⁿ.
◮ for any set of points x_1, ..., x_m, create the matrix A_ij = f(x_i − x_j)
◮ If A is always positive definite, then f is a positive definite function
◮ for f to be a spatial variance function it must be positive definite
◮ Otherwise we could find a set of points u_1, ..., u_m and a vector b such that
Var[b′(Y(u_1), ..., Y(u_m))′]
is negative. A numerical check is sketched below.
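You can expose a non-positive-definite candidate numerically: build A on a set of points and inspect its smallest eigenvalue. A small sketch (the untruncated linear function is just an illustrative failing candidate; e.g. for points 0, 1, 2, 3 and a = (1, 0, 0, 1)′, a′Aa = 1 + 1 − 4 = −2 < 0):

# Smallest eigenvalue of A_ij = f(||x_i - x_j||); negative => f is not p.d.
pd_check <- function(f, x) {
  A <- f(as.matrix(dist(x)))
  min(eigen(A, symmetric = TRUE, only.values = TRUE)$values)
}
x <- 0:3                                  # four collinear points
pd_check(function(u) exp(-u), x)          # exponential: positive, valid
pd_check(function(u) 1 - u, x)            # untruncated linear: negative, invalid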

  35. Characteristics of positive definite functions
◮ Non-negative, and monotone decreasing.
◮ Bochner's Theorem states that all positive definite functions have positive Fourier transforms.
◮ The exponential function and the Gaussian density are positive definite.
◮ The positive definite constraint leads us to use a small number of parametric families for covariance functions.

  36. Differentiability of Gaussian processes
◮ A formal mathematical description of the smoothness of a spatial surface S(x) is its degree of differentiability.
◮ S(x) is mean-square continuous if, for all x,
E[{S(x + h) − S(x)}²] → 0 as ||h|| → 0
◮ S(x) is mean-square differentiable if there exists a process S′(x) such that, for all x,
E[({S(x + h) − S(x)}/||h|| − S′(x))²] → 0 as ||h|| → 0
◮ the mean-square differentiability of S(x) is directly linked to the differentiability of its covariance function ρ(u).

  37. Theorem
Let S(x) be a SGP with correlation function ρ(u) : u ∈ ℝ. Then:
◮ S(x) is mean-square continuous iff ρ(u) is continuous at u = 0;
◮ S(x) is k times mean-square differentiable iff ρ(u) is (at least) 2k times differentiable at u = 0.

  38. The Matérn family
The correlation function is given by:
ρ(u) = {2^(κ−1) Γ(κ)}⁻¹ (u/φ)^κ K_κ(u/φ)
◮ κ and φ are parameters
◮ K_κ(·) denotes the modified Bessel function of order κ
◮ valid for φ > 0 and κ > 0
◮ κ = 0.5: exponential correlation function
◮ κ → ∞: Gaussian correlation function
S(x) is mean-square m times differentiable if and only if κ > m.
[Figure: three examples of the Matérn correlation function with φ = 0.2 and κ = 1 (solid line), κ = 1.5 (dashed line) and κ = 2 (dotted line)]

  39. Simulating the Matérn
library(RandomFields)
x <- y <- seq(0, 20, by = 1/4)
f <- GaussRF(x = x, y = y, model = "whittlematern", grid = TRUE,
             param = c(mean = 0, variance = 1, nugget = 0, scale = 1, alpha = 2))
image(x, y, f, col = topo.colors(100))
The "alpha" parameter is the roughness parameter κ in our notation. (GaussRF is the simulation interface in the version of the RandomFields package current when these notes were written; later versions of the package replaced it.)

  40. The powered exponential family
ρ(u) = exp{−(u/φ)^κ}
◮ defined for φ > 0 and 0 < κ ≤ 2
◮ φ and κ are parameters
◮ mean-square continuous (but non-differentiable) if κ < 2
◮ mean-square infinitely differentiable if κ = 2
◮ ρ(u) very ill-conditioned when κ = 2
◮ κ = 1: exponential correlation function
◮ κ = 2: Gaussian correlation function
Conclusion: not as flexible as it looks.
[Figure: three examples of the powered exponential correlation function with φ = 0.2 and κ = 1 (solid line), κ = 1.5 (dashed line) and κ = 2 (dotted line)]

  41. The spherical family
ρ(u; φ) = 1 − (3/2)(u/φ) + (1/2)(u/φ)³ for 0 ≤ u ≤ φ; ρ(u; φ) = 0 for u > φ
◮ φ > 0 is a parameter
◮ finite range
◮ non-differentiable at the origin
◮ has strange properties in the frequency domain, which makes estimation unstable
◮ despite the problems it's very widely used
[Figure: the spherical correlation function with φ = 0.6]
A base-R sketch of all three families follows.
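All three families are easy to code directly. A sketch in base R (besselK is in base R; the parameter values in the plot are the ones used in the figures above):

# Matern, powered exponential and spherical correlation functions rho(u)
matern <- function(u, phi, kappa) {
  r <- (u / phi)^kappa * besselK(u / phi, kappa) / (2^(kappa - 1) * gamma(kappa))
  r[u == 0] <- 1                    # besselK is infinite at u = 0; rho(0) = 1
  r
}
powexp <- function(u, phi, kappa) exp(-(u / phi)^kappa)
spher  <- function(u, phi) ifelse(u <= phi, 1 - 1.5 * (u/phi) + 0.5 * (u/phi)^3, 0)

u <- seq(0, 1, by = 0.01)
plot(u, matern(u, 0.2, 1.5), type = "l", xlab = "distance", ylab = "correlation")
lines(u, powexp(u, 0.2, 1.5), lty = 2)
lines(u, spher(u, 0.6), lty = 3)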

  42. Comparable Simulations (same seed)
[Figure: three panels of 1-D simulations —
Matérn: φ = 0.2 and κ = 0.5 (solid), κ = 1 (dashed) and κ = 2 (dotted);
Powered exponential: φ = 0.2 and κ = 1 (solid), κ = 1.5 (dashed) and κ = 2 (dotted);
Spherical: φ = 0.6]

  43. Model Extensions (1) — Removal of trends
Re-examine the elevation data; there is evidence for a linear (quadratic?) trend with the co-ordinates. In general, replace the constant µ with
µ_i = µ(x_i) = f(x_i)′β = Σ_{j=1}^k β_j f_j(x_i)
so that
Y_i = µ_i + S(x_i) + ε_i : i = 1, ..., n,
where E[S(x)] = 0; S(x) remains stationary and isotropic.
[Figure: elevation data plotted against the co-ordinates, with a histogram of the data]

  44. Covariates
◮ f_1(x) = 1 allows for an overall mean
◮ To incorporate a linear trend in the elevation data, f_2(x) and f_3(x) would be the x and y co-ordinates of x.
◮ In general, f_j(x_i) is any measured covariate at x_i (or a function of it).
NB: although the linear trend is only obvious for the y co-ordinate in the elevation data, in general we would fit similar trend effects to both co-ordinates, so as to be independent of the particular axis directions.
Q: how many more parameters would be required for a quadratic trend?

  45. PART III: Exploratory Variogram Analysis
1. The Variogram
2. The Empirical Variogram
3. Monte Carlo Variogram Envelope
NB: We assume that non-spatial exploratory analysis has already been performed.

  46. The Variogram
The variogram is defined by
V(x, x′) := (1/2) Var[Y(x) − Y(x′)]
Let S be an isotropic SGP with E[S(x)] = 0, Var[S(x)] = σ² and correlation function ρ(u). Let the response be Y(x_i) = µ_i + S(x_i) + ε_i, where the ε_i are i.i.d. Gaussian noise, ε_i ∼ N(0, τ²). Then, writing u = ||x − x′||, the variogram of Y is
V(u) = τ² + σ²(1 − ρ(u))
For proof see handout; a short sketch follows.
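A sketch of the proof, using only the definitions above (for x ≠ x′, write ε = ε(x) and ε′ = ε(x′); S and the noise are independent):

2V(u) = Var[Y(x) − Y(x′)]
      = Var[S(x) − S(x′)] + Var[ε − ε′]
      = 2σ² − 2σ²ρ(u) + 2τ²

and dividing by 2 gives V(u) = τ² + σ²(1 − ρ(u)).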

  47. Interpreting the Variogram
[Figure: variogram V(u) against u, with the nugget τ², the sill, and the total sill σ² + τ² marked]
◮ the nugget variance: τ²
◮ the sill: σ² = Var[S(x)]
◮ the total sill: τ² + σ² = Var[Y(x)]
◮ the range: φ, such that ρ(u; φ) = ρ(u/φ; 1)

  48. Variogram v Correlation
◮ Why not just use the correlation function?
◮ The variogram is defined for non-stationary processes
◮ The variogram is easier to estimate for irregular data

  49. The Empirical Variogram
◮ if Y(x) is stationary and isotropic,
V(x, x′) = V(u) = (1/2) E[{Y(x) − Y(x′)}²]
◮ this suggests an empirical estimate of V(u):
V̂(u) = average{(1/2)[y(x_i) − y(x_j)]²}
where each average is taken over all pairs [y(x_i), y(x_j)] such that ||x_i − x_j|| ≈ u
◮ for a process with non-constant mean (covariates), trends may be removed as follows:
  ◮ let β̂ be the OLS estimate
  ◮ and µ̂(x_i) = f(x_i)′β̂
  ◮ define r_i := Y_i − µ̂(x_i)
  ◮ define V̂(u) = average{(1/2)(r_i − r_j)²}, where each average is taken over all pairs (r_i, r_j)

  50. The variogram cloud
◮ define the quantities
r_i = y_i − µ̂(x_i), u_ij = ||x_i − x_j||, v_ij = (r_i − r_j)²/2
◮ the variogram cloud is a scatterplot of the points (u_ij, v_ij)
Example: Swiss rainfall data
◮ under the spatial Gaussian model, v_ij ∼ V(u_ij) χ²₁
◮ the v_ij are correlated
◮ the variogram cloud is therefore unstable, both pointwise and in its overall shape
[Figure: variogram cloud for the Swiss rainfall data; semivariance against distance]

  51. The empirical variogram
◮ derived from the variogram cloud by averaging within bins: u − h/2 ≤ u_ij < u + h/2
◮ forms k bins, each averaging over n_k pairs
◮ removes the first objection to the variogram cloud, but not the second
◮ is sensitive to mis-specification of µ(x)
[Figure: empirical binned variogram for the Swiss rainfall data]
Do exercise 3a (lecturer), b/c (class). A minimal implementation is sketched below.
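A minimal base-R sketch of the binned empirical variogram (the bin width h, the residuals r and the coordinates x are as defined above; the usual rule of thumb of ignoring distances beyond half the maximum is an assumption, and the lm() call in the usage comment is purely illustrative):

# Binned empirical variogram: average of (r_i - r_j)^2 / 2 within distance bins
emp_variog <- function(x, r, h = 0.1, umax = NULL) {
  u    <- as.matrix(dist(x))
  v    <- outer(r, r, "-")^2 / 2
  keep <- upper.tri(u)                       # use each pair (i, j) once
  u    <- u[keep]; v <- v[keep]
  if (is.null(umax)) umax <- max(u) / 2      # rule of thumb
  bins <- cut(u, breaks = seq(0, umax, by = h))
  list(u = tapply(u, bins, mean),            # mean distance per bin
       v = tapply(v, bins, mean))            # mean semivariance per bin
}
# vg <- emp_variog(coords, resid(lm(y ~ covariates)))
# plot(vg$u, vg$v, xlab = "distance", ylab = "semivariance")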

  52. Variograms of raw data and residuals can be very different
Example: Paraná rainfall data — empirical variograms of the raw data (left-hand panel) and of residuals after linear regression on latitude, longitude and altitude (right-hand panel).
◮ the variogram of the raw data includes variation due to large-scale geographical trend
◮ the variogram of the residuals eliminates this source of variation
[Figure: the two empirical variograms]

  53. How unstable are empirical variograms?
◮ the thick solid line shows the true underlying variogram
◮ the fine lines show empirical variograms from three independent simulations of the same model
◮ high autocorrelation amongst V̂(u) for successive values of u imparts misleading smoothness
[Figure: true variogram and three empirical variograms]

  54. Monte Carlo Variogram Envelope
A simple test for spatial correlation.
◮ H₀: there is no spatial correlation.
◮ Under H₀ the relative spatial positions of the data (or residuals) are irrelevant.
◮ Under H₀ the data may be permuted, and the resulting empirical variogram should arise from the same underlying distribution of variograms as the original.

  55. The Algorithm
Repeat k times:
1. randomly permute the data
2. calculate the empirical variogram for this permutation
For each u, use the lowest and highest (or the 5th and 95th percentiles) of the simulated V(u)'s as envelopes (under H₀) for the true value V(u). An R sketch follows.
[Figure: variogram and envelope for simulated data with µ = 0, σ² = 1, τ² = 0.25 and φ = 0.3]
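The algorithm is a few lines once an empirical-variogram function is available; this sketch reuses the illustrative emp_variog function defined after slide 51 (not a library routine):

# Monte Carlo envelope under H0: permute the responses, recompute the variogram
variog_envelope <- function(x, r, k = 99, ...) {
  v0  <- emp_variog(x, r, ...)                           # observed variogram
  sim <- replicate(k, emp_variog(x, sample(r), ...)$v)   # k permutations
  env <- apply(sim, 1, quantile, probs = c(0.05, 0.95), na.rm = TRUE)
  plot(v0$u, v0$v, ylim = range(c(v0$v, env), na.rm = TRUE),
       xlab = "distance", ylab = "semivariance")
  lines(v0$u, env[1, ], lty = 2)                         # lower envelope
  lines(v0$u, env[2, ], lty = 2)                         # upper envelope
}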

  56. PART IV: PARAMETER ESTIMATION FOR GAUSSIAN MODELS
1. Maximum Likelihood Estimation
2. Model Extensions (2) — Box-Cox
3. A Case Study: Swiss rainfall data
4. Model Extensions (3) — Anisotropy
5. Not Covered Here
6. Bayesian estimation of parameters

  57. Maximum Likelihood Estimation
The model: Y(x_i) = µ_i + S(x_i) + ε_i
◮ mean µ_i = µ(x_i) = Σ_{j=1}^k β_j f_j(x_i), i.e. µ = Fβ
◮ SGP S(x) with E[S(x)] = 0, Var[S(x)] = σ² and Corr[S(x_1), S(x_2)] = ρ_κ(||x_1 − x_2||; φ)
◮ nugget effect: Gaussian noise ε_i ∼ N(0, τ²)
Joint distribution:
Y ∼ N(Fβ, σ²R + τ²I)
where R_ij(φ, κ) = ρ(||x_i − x_j||).

  58. Re-parametrise
◮ Noise-to-signal ratio: ν² := τ²/σ²
◮ R∗(ν, φ, κ) = R(φ, κ) + ν²I
◮ Y ∼ N(Fβ, σ²R∗)
◮ In this parametrisation σ will drop out of the likelihood.
Log-likelihood:
ℓ(β, σ², ν, φ, κ) = −(n/2) log 2π − (1/2) log|σ²R∗| − (1/(2σ²)) (y − Fβ)′ R∗⁻¹ (y − Fβ)

  59. Profile likelihood
◮ There is no closed-form analytical solution for the parameters which maximise the likelihood
◮ A numerical optimisation routine will have to be used
◮ If we know ν, φ and κ (the spatial correlation parameters), then there is an analytical solution for β and σ², as the model is linear.
◮ The idea: use the numerical optimiser on ν, φ and κ, with the analytic expressions for β and σ²:
max_{β,σ²,ν,φ,κ} ℓ(β, σ², ν, φ, κ) = max_{ν,φ,κ} [ max_{β,σ²} ℓ(β, σ², ν, φ, κ) ]
By standard results for linear models (see 'Supplementary material'), the log-likelihood ℓ(β, σ² | ν², φ, κ) is maximised at
β̂(ν, φ, κ) = (F′R∗⁻¹F)⁻¹ F′R∗⁻¹ y
σ̂²(ν, φ, κ) = (1/n) (y − Fβ̂)′ R∗⁻¹ (y − Fβ̂)

  60. Profile Likelihood (cont.)
Inserting the expressions for β̂ and σ̂² into the likelihood function gives the profile likelihood function
ℓ∗(ν, φ, κ) = −(n/2) log 2π − (1/2) log|σ̂²R∗| − n/2
i.e.
−2ℓ∗(ν, φ, κ) + c = n log σ̂² + log|R∗|
where c is a constant.
◮ Use a numerical optimiser (such as optim in R) to find ν̂, φ̂, κ̂. A sketch follows.
◮ Back-substitution gives β̂ and σ̂².
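A sketch of the profile-likelihood computation in R, with κ = 1 fixed (exponential correlation) so the optimiser works over (ν, φ) only; the response y, design matrix Fmat and coordinates x are assumed to be set up already. Optimising on the log scale keeps both parameters positive.

# Negative profile log-likelihood over theta = (log nu, log phi);
# beta-hat and sigma2-hat are computed analytically inside (see slide 59).
negprofll <- function(theta, y, Fmat, u) {   # Fmat: design matrix F; u: distances
  nu2   <- exp(theta[1])^2
  phi   <- exp(theta[2])
  n     <- length(y)
  Rstar <- exp(-u / phi) + nu2 * diag(n)     # exponential correlation, kappa = 1
  Ri    <- solve(Rstar)
  beta  <- solve(t(Fmat) %*% Ri %*% Fmat, t(Fmat) %*% Ri %*% y)
  res   <- y - Fmat %*% beta
  s2    <- drop(t(res) %*% Ri %*% res) / n
  0.5 * (n * log(s2) + as.numeric(determinant(Rstar)$modulus))  # -2 l* / 2 + const
}
# fit <- optim(c(0, 0), negprofll, y = y, Fmat = Fmat, u = as.matrix(dist(x)))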

  61. ◮ any reasonable version of the (linear) spatial Gaussian model has at least three parameters:
  ◮ a spatial variance parameter, for how close the process stays to the mean;
  ◮ an observation error variance, to take care of uncorrelated noise; and
  ◮ a range parameter, so that changing between miles and km doesn't affect the model
◮ you need a lot of data (or contextual knowledge) to justify estimating more than three parameters
◮ the Matérn family adds a fourth, roughness, parameter
◮ as Stein's (1999) book shows, the roughness parameter has a lot of influence on the other parameters' estimates
◮ smooth surface ⇒ low signal-to-noise ratio
◮ it is recommended to try a small number of discrete κ, e.g. {0.5, 1, 2}
◮ the profile likelihood function for κ is usually fairly flat, and more precise estimation is usually not warranted.

  62. Model Extensions (2) — Box-Cox
◮ The Gaussian model might be inappropriate for variables with asymmetric distributions.
◮ Log transforms often Normalise positive-valued data with a heavy right tail.
◮ Squaring data works with a heavy left tail.
◮ The Box-Cox transformation has a parameter λ, offering a continuous range of transformations.
◮ Box-Cox is commonly used in linear regression.
◮ Terminology: Gaussian-transformed model.

  63. Box-Cox (continued)
The model is defined as follows:
◮ assume a variable Y∗ ∼ MVN(Fβ, σ²R∗)
◮ the data, denoted y = (y_1, ..., y_n), are generated by a transformation of the linear Gaussian model:
y∗_i = h_λ(y_i) = (y_i^λ − 1)/λ if λ ≠ 0; log(y_i) if λ = 0
The log-likelihood is:
ℓ(β, σ², ν, φ, κ, λ) = c − (n/2) log σ² − (1/2) log|R∗| − (1/2) (h_λ(y) − Fβ)′ {σ²R∗}⁻¹ (h_λ(y) − Fβ) + (λ − 1) Σ_{i=1}^n log y_i
Here h_λ(y) := (h_λ(y_1), ..., h_λ(y_n))′. The transformation is coded below.
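The transformation itself is a direct transcription of h_λ (the loglik() call in the usage comment is a hypothetical placeholder for whichever likelihood routine is in use):

# Box-Cox transform h_lambda, applied elementwise; requires all y > 0
h_lambda <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
# e.g. profile over a few interpretable values of lambda:
# sapply(c(-1, 0, 0.5, 1), function(l) loglik(h_lambda(y, l)))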

  64. Notes:
◮ Requires all y_i > 0.
◮ if some y_i = 0 simply through rounding, then replace them with 'imputed' low values.
◮ if some y_i = 0 because there is a probability mass at 0, then the model is strictly inappropriate.
◮ Allowing any λ ∈ ℝ and simply maximising the log-likelihood can lead to difficulties in scientific interpretation.
◮ Allow only a small set of interpretable values, e.g. {−1, 0, 0.5, 1}.
◮ Examine the profile log-likelihood for λ to choose the most appropriate value.
◮ Transform the data, then analyse as in the standard case.

  65. ◮ Optimisation is CPU-intensive for large datasets
◮ Most of the information for λ is in the marginal likelihood (i.e. ignoring spatial variation)
◮ Obtain a point estimate by maximising
ℓ∗(β, σ², λ) = c − (n/2) log σ² − (1/(2σ²)) (h_λ(y) − Fβ)′ (h_λ(y) − Fβ) + (λ − 1) Σ_{i=1}^n log y_i

  66. A Case Study: Swiss rainfall data
◮ 467 locations in Switzerland
◮ daily rainfall measurements on 8th of May 1986
◮ the data values are integers, where the unit of measurement is 1/10 mm
◮ 5 locations where the value is equal to zero
[Figure: locations of the data points, with point size proportional to the value of the observed data; distances in kilometres; axes N-S (km) and E-W (km)]

  67. Swiss rainfall data (cont.)
MLE of the Box-Cox parameter λ for different values of the Matérn roughness parameter κ:

κ      λ̂(κ)     log L̂
0.5    0.514    -2464.246
1      0.508    -2462.413
2      0.508    -2464.160

[Figure: profile likelihoods for λ (dot-dashed), with 90% and 95% confidence limits (dashed), for κ = 0.5, κ = 1, κ = 2]
◮ In all cases λ = 0.5 is within the interval, but the untransformed and log-transformed cases are not.

  68. Parameter estimates with λ = 0.5

κ      β̂        σ̂²       φ̂       τ̂²     log L̂
0.5    18.36    118.82   87.97   2.48   -2464.315
1      20.13    105.06   35.79   6.92   -2462.438
2      21.36    88.58    17.73   8.72   -2464.185

[Figure: profile likelihoods for σ², φ and τ², with κ = 1 and λ = 0.5]

  69. The log-transformation (λ = 0)
◮ Log-Gaussian data tend to have sharp peaks and large shallow troughs.
◮ On the log scale, Y∗(x) = log[Y(x)]:
Y∗(x) = µ + S(x) + ε(x) = µ + σZ(x) + ε(x)
where Z(x) is a unit-variance SGP.
◮ On the natural scale,
Y(x) = e^{Y∗(x)} = e^µ (e^{Z(x)})^σ e^{ε(x)}
The larger σ², the sharper the peaks and the softer the troughs.
◮ Remember E[Y(x)] = exp[E(Y∗(x)) + Var(Y∗(x))/2].

  70. Simulations of log-Gaussian processes
[Figure: two simulated log-Gaussian surfaces, with σ² = 1 (left) and σ² = 2 (right)]

  71. Model Extensions (3) — Anisotropy
◮ Environmental conditions can induce directional effects (wind, soil formation, etc.).
◮ As a consequence, the spatial correlation may vary with direction.
◮ a possible approach: geometric anisotropy.
[Figure: correlation contours for an isotropic model (left) and two anisotropic models (centre and right)]

  72. Geometric Anisotropy
◮ two more parameters: the anisotropy angle ψ_A and the anisotropy ratio ψ_R ≥ 1.
◮ transform the true co-ordinates (x_1, x_2) to new co-ordinates (x′_1, x′_2) by rotation and squashing:
(x′_1, x′_2)′ = [1, 0; 0, 1/ψ_R] [cos(ψ_A), −sin(ψ_A); sin(ψ_A), cos(ψ_A)] (x_1, x_2)′
◮ Analyse the data with respect to the new co-ordinate system. A sketch of the transformation is given below.
[Figure: regular grids in the original space and the corresponding isotropic space, for ψ_A = 120°, ψ_R = 2 and for ψ_A = 45°, ψ_R = 2]
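The rotate-and-squash step is a single matrix multiplication; a sketch (angles in radians, coordinates as an n × 2 matrix):

# Transform coordinates to isotropic space: rotate by psi_A, shrink by 1/psi_R
iso_coords <- function(x, psiA, psiR) {
  rot <- rbind(c(cos(psiA), -sin(psiA)),
               c(sin(psiA),  cos(psiA)))
  sqz <- diag(c(1, 1 / psiR))
  t(sqz %*% rot %*% t(x))          # x: n x 2 matrix of (x1, x2) pairs
}
# xprime <- iso_coords(coords, psiA = 2 * pi / 3, psiR = 2)   # 120 degrees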

  73. Parameter Estimation
Likelihood-based methods:
◮ add two parameters (angle and squashing)
◮ increases the dimension of the numerical minimisation problem
◮ in practice a lot of data might be needed
Variogram-based exploration:
◮ compute variograms for different directions
◮ angle bins, in particular for irregularly distributed data
[Figure: directional variograms (0°–165°, plus omnidirectional) for the Swiss rainfall data]

  74. Not covered here
Restricted maximum likelihood (REML)
◮ transform the data Y → Y∗ so that Y∗ does not depend on β
◮ estimate (σ², ν, φ, κ) by maximum likelihood on the transformed data Y∗
◮ leads to less biased estimates in small samples
◮ MLEs are sensitive to mis-specification of F (the covariate model for µ)

  75. Also not covered
Non-stationary random variation?
◮ Intrinsic variation: a weaker hypothesis than stationarity (the process has stationary increments, cf. the random walk model in time series); widely used as a default model for discrete spatial variation (Besag, York and Mollié, 1991).
◮ Spatial deformation methods (Sampson and Guttorp, 1992) seek to achieve stationarity by transformation of the geographical space, x.
◮ as always, we need to balance the increased flexibility of general modelling assumptions against over-modelling of sparse data, leading to poor identifiability of model parameters.

  76. Bayesian Estimation Of Parameters
As before, the model: S is an SGP with E[S(x)] = 0, Var[S(x)] = σ², and correlation function ρ(u; φ). The response is Y(x_i) = µ + S(x_i) + ε_i, with i.i.d. Gaussian noise ε_i ∼ N(0, τ²) ("nugget").
Reparameterisation: ν² := τ²/σ²
Correlation matrix: has elements R_ij(φ) := ρ(||x_i − x_j||; φ). Define R∗(φ, ν) := R(φ) + ν²I.

  77. Bayesian stuff
A judicious choice of priors yields a convenient posterior.
Priors for φ and ν: choose a discrete prior π_{φ,ν}(φ, ν).
Priors for σ² and µ: choose continuous priors
σ² | φ, ν ∼ n_σ S²_σ / χ²_{n_σ}
µ | σ, φ, ν ∼ N(m_µ, σ²V_µ)
This is known as a Gaussian-scaled-Inverse-χ² distribution on (µ, σ²).

  78. Posterior for (φ, ν): a discrete posterior with
p(φ, ν | y) ∝ π_{φ,ν}(φ, ν) |R∗|^{−1/2} V∗^{1/2} {(n_σ + n) S²∗}^{−(n_σ + n)/2}   (1)
Posterior for (µ, σ²): a Gaussian-scaled-Inverse-χ² posterior distribution for µ, σ² | φ, ν:
σ² | φ, ν, y ∼ (n_σ + n) S²∗ / χ²_{n_σ + n}   (2)
µ | σ², φ, ν, y ∼ N(m∗, σ²V∗)   (3)
where
V∗ := (V_µ⁻¹ + 1′R∗⁻¹1)⁻¹
m∗ := V∗ (m_µ V_µ⁻¹ + 1′R∗⁻¹y)
S²∗ := (m²_µ V_µ⁻¹ + n_σ S²_σ + y′R∗⁻¹y − m²∗ V∗⁻¹) / (n_σ + n)
See handout for proof (the proof is not examinable).

  79. Monte Carlo Algorithm
◮ Sum the right-hand side of (1) over the finite number of possible values of (φ, ν), and divide by this sum to obtain the posterior p(φ, ν | y) for each combination of (φ, ν).
◮ Simulate directly from the full posterior by the following Monte Carlo algorithm (sketched in code below):
  ◮ simulate φ^(i) and ν^(i) from p(φ, ν | y)
  ◮ simulate from the posterior for σ² | φ^(i), ν^(i) using (2)
  ◮ simulate from the posterior for µ | σ²^(i), φ^(i), ν^(i) using (3)
  ◮ i ← i + 1; repeat.
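Given the discrete posterior table for (φ, ν), the sampler is direct. In this sketch, post (a data frame with columns phi, nu and probability p), the helper functions Sstar2, mstar and Vstar (computing S²∗, m∗ and V∗ as in (1)–(3)), and the quantities n_sigma and n are all assumed to exist; they are placeholders, not library routines.

# Direct Monte Carlo from the posterior: (phi, nu) discrete, then sigma^2, mu
n_draw <- 1000
idx <- sample(nrow(post), n_draw, replace = TRUE, prob = post$p)
out <- t(sapply(idx, function(i) {
  S2 <- Sstar2(post$phi[i], post$nu[i])                    # S*^2 for this (phi, nu)
  s2 <- (n_sigma + n) * S2 / rchisq(1, df = n_sigma + n)   # eq. (2)
  m  <- mstar(post$phi[i], post$nu[i])                     # m* for this (phi, nu)
  V  <- Vstar(post$phi[i], post$nu[i])                     # V* for this (phi, nu)
  mu <- rnorm(1, mean = m, sd = sqrt(s2 * V))              # eq. (3)
  c(phi = post$phi[i], nu = post$nu[i], sigma2 = s2, mu = mu)
}))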

  80. A Case Study: Swiss rainfall, 100 data
[Figure: posterior densities and profile log-likelihoods for σ² and φ]

  81. Joint posterior
[Figure: samples and contours of the joint posterior for (φ, σ²)]

  82. Extensions
Add covariates: with µ = Fβ — but how does this affect the posterior?
Vary all parameters: κ, λ, ψ_R, ψ_A are fixed. In principle these could be included in the analysis, with discrete priors.
Improper priors: certain simple improper conjugate priors for µ (flat) and σ² (reciprocal) are often chosen and still lead to proper posteriors (subject to reparameterisation to ν, not τ):
◮ π_µ(µ) ∝ 1 ('V_µ → ∞')
◮ π_σ(σ²) ∝ 1/σ² ('n_σ → 0'). This is the commonly used Jeffreys prior.
◮ but see the note in the section on GLSMs.

  83. ML v Bayes
Bayes:
◮ allows parameter uncertainty to carry over to predictions
◮ less damage caused by the inclusion of poorly identified parameters
◮ more exact parameter confidence intervals (ML is asymptotic)
◮ can incorporate prior information
◮ Bayes is necessary for non-Gaussian responses (more on that later)
ML:
◮ not affected by priors
◮ computationally simpler

  84. PART V: SPATIAL PREDICTION FOR GAUSSIAN MODELS
1. Stochastic Process Prediction
2. Prediction under the Gaussian Model
3. What does Kriging Actually do to the Data
4. Prediction of Linear Functionals
5. Plug-in Prediction
6. Model Validation and Comparison

  85. Stochastic Process Prediction
General results for prediction.
Goal: predict the realised value of a (scalar) r.v. T, using data y, a realisation of a (vector) r.v. Y.
◮ a predictor of T is any function of Y, T̂ = t(Y)
◮ the mean square error (MSE) of T̂ is
MSE(T̂) = E[(T − T̂)²]
(expectation over both T and Y)
◮ the MMSE predictor minimises the MSE

  86. Theorem
The minimum mean square error predictor of T is T̂ = E[T | Y], at which point
E[(T − T̂)²] = E_Y[Var[T | Y]]
(the prediction variance is an estimate of the MSE). See handout for proof.
Also, directly from the second tower property,
Var[T] = E_Y[Var[T | Y]] + Var_Y[E[T | Y]]
Hence E[(T − T̂)²] ≤ Var[T], with equality if T and Y are independent random variables.

  87. Comments
◮ We call T̂ the least squares predictor for T, and Var[T | Y] its prediction variance
◮ Var[T] − Var[T | Y] measures the contribution of the data (exploiting dependence between T and Y)
◮ point prediction and prediction variance are summaries
◮ the complete answer is the distribution [T | Y]
◮ not transformation invariant: T̂ being the best predictor for T does NOT necessarily imply that g(T̂) is the best predictor for g(T).

  88. Prediction Under The Gaussian Model
◮ assume all the parameters β, σ², τ², φ, κ are known
◮ assume that the target for prediction is T = S(x′)
◮ T̂ = E[T | Y], Var[T | Y] and [T | Y] can be easily derived from a standard result.
Under the Gaussian model Y(x_i) = µ_i + S(x_i) + ε_i,
(T, Y)′ ∼ N( (0, µ)′, σ² [1, r′; r, R + ν²I] )
where µ = Fβ and r is a vector with elements r_i = ρ_κ(||x′ − x_i||; φ) : i = 1, ..., n.
Again define R∗ = R + ν²I.

  89. Conditional Distribution
Using background results on partitioning the MVN, with Z_1 = T and Z_2 = Y, we find that the minimum mean square error predictor for T = S(x′) is
T̂ = σ²r′(σ²R∗)⁻¹(y − µ) = r′(R∗)⁻¹(y − µ)   (4)
with prediction variance
Var[T | Y] = σ² − σ²r′(σ²R∗)⁻¹σ²r = σ²(1 − r′(R∗)⁻¹r)   (5)
A direct implementation is sketched below.
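Equations (4) and (5) translate directly into R. A sketch for predicting S at a single new location x0, with exponential correlation and all parameters assumed known (the constant-mean case, µ_i = µ):

# Prediction at x0: T-hat = r' Rstar^{-1} (y - mu), Var = sigma2 (1 - r' Rstar^{-1} r)
krige_one <- function(x0, x, y, mu, sigma2, tau2, phi) {
  u     <- as.matrix(dist(rbind(x0, x)))          # distances among x0 and the data
  r     <- exp(-u[1, -1] / phi)                   # correlations of S(x0) with data
  Rstar <- exp(-u[-1, -1] / phi) + (tau2 / sigma2) * diag(length(y))  # R + nu^2 I
  w     <- solve(Rstar, r)                        # Rstar^{-1} r
  list(pred = sum(w * (y - mu)),                  # eq. (4)
       var  = sigma2 * (1 - sum(w * r)))          # eq. (5)
}
# krige_one(c(0.5, 0.5), x, y, mu = 20.13, sigma2 = 105.06, tau2 = 6.92, phi = 35.79)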

  90. Example: Swiss rainfall data
◮ Locations shown as points with size proportional to the value of the observed rainfall.
◮ 467 locations in Switzerland
◮ daily rainfall measurements on 8th of May 1986
◮ κ̂ = 1, µ̂ = 20.13, σ̂² = 105.06, φ̂ = 35.79, τ̂² = 6.92
[Figure: data locations; axes N-S (km) and E-W (km)]
