
Imputing missing values in satellite data: From parametric to non-parametric approaches

Imputing missing values in satellite data: From parametric to non-parametric approaches. @ReinhardFurrer, I-Math/ICS, UZH. NZZ.ch. GI-Forum, ifgi, WWU Münster, 2018/11/13. Joint work with Florian Gerber, Emilio Porcu, Francois Bachoc and others.


  1. Imputing missing values in satellite data: From parametric to non-parametric approaches. @ReinhardFurrer, I-Math/ICS, UZH. NZZ.ch. GI-Forum, ifgi, WWU Münster, 2018/11/13

  2. Joint work
     - Florian Gerber
     - Emilio Porcu
     - Francois Bachoc

     and with contributions of several others

  3. Outlook
     - Motivation
     - Parametric models and their issues
     - A particular non-parametric approach
     - Outcome of a biodiversity exercise

     → Questions and «Fast-Forward» appreciated! Slides at http://www.math.uzh.ch/furrer/slides/

  4. Global Change and Biodiversity, www.gcb.uzh.ch
     - University Research Priority Program, start in 2013
     - ≈ 50 members in 2 Faculties and 5 Institutes/Departments

  5. Global Change and Biodiversity, www.gcb.uzh.ch

  6. Linking MODIS with plot based data. Source: 10.1073/pnas.1703928114. “... we show that primary productivity, its temporal stability, and the decadal trend of a prolonged growing season strongly increase with biodiversity across heterogeneous landscapes, which is consistent over vast environmental, climatic, and altitudinal gradients. ...”

  7. Linking MODIS with plot based data. Now analysis in the Arctic, not Switzerland:
     - Species abundance plot scale measurements from the International Tundra Experiment (ITEX)
     - NDVI satellite images and ASTER elevation data

     Source: F. Gerber

  8. Arctic NDVI data. MODIS NDVI data (satellite product MOD13A1, $\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}}$)
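The NDVI formula on this slide can be illustrated with a minimal sketch. The reflectance values below are hypothetical, and returning NaN for a zero denominator is an assumption made here for illustration, not something the MOD13A1 product specifies.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    total = nir + red
    if total == 0:
        # Undefined when both bands are zero; treated as missing (assumption).
        return math.nan
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for three pixels.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.05, 0.20)]
values = [ndvi(nir, red) for nir, red in pixels]
print([round(v, 3) for v in values])
```

Dense vegetation reflects strongly in the near infrared and absorbs red light, so it gives values close to 1, while bare ground or water gives values near or below 0.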

  9. Visual example. Source: Gerber et al (2018), TGRS

  10. Visual example. Source: Gerber et al (2018), TGRS

  11. Spatial statistics: prediction
      Observations: $y(s_1), \dots, y(s_n)$
      [Figure: observations $y(s_1), y(s_2), \dots, y(s_i), \dots, y(s_n)$ at locations $s_1, s_2, \dots, s_i, \dots, s_n$]
      Model:
      - $Y(s)$ = signal + noise
      - $Y(s)$ = trend + stochastic part + noise
      - $Y(s) = x^T(s)\beta + \alpha(s) + Z(s) + \varepsilon(s)$

  12. Spatial statistics: prediction. Predict the quantity of interest at an arbitrary location.
      [Figure: signal and noise at locations $s_1, s_2, \dots, s_i, \dots, s_n$]
      Why?
      - Fill-in missing data
      - Force data onto a regular grid
      - Smooth out the measurement error

      How?
      - By eye
      - Linear interpolation
      - The correct way ...

  13. Spatial statistics: prediction. Describing the covariance structure.
      [Figure: locations $s_1, s_2, \dots, s_i, \dots, s_n$ and a covariance function against distance (lag $h$, 0.0 to 0.8), with $C(\mathrm{dist}(s_1, s_2))$ and $C(\mathrm{dist}(s_1, s_n))$ marked]
      Covariance matrix $\Sigma$ contains elements $C\big(\mathrm{dist}(s_i, s_j)\big)$.

  14. Spatial statistics: prediction. Predict $Z(s_0)$ given $y(s_1), \dots, y(s_n)$.
      [Figure: prediction location $s_0$, marked "?", among the observation locations $s_1, s_2, \dots, s_i, \dots, s_n$]
      Minimize the mean squared prediction error (over all linear unbiased predictors) → Best Linear Unbiased Predictor:
      $$\mathrm{BLUP} = \mathrm{Cov}\big(Z(s_{\text{predict}}), Y(s_{\text{obs}})\big)\, \mathrm{Var}\big(Y(s_{\text{obs}})\big)^{-1}\, \text{obs}, \qquad \hat{Z}(s_0) = c^T \Sigma^{-1} y$$
      (one spatial process, no trend, known covariance structure; otherwise almost the same)
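The predictor $c^T \Sigma^{-1} y$ can be sketched in a few lines for the simplest case named on the slide (one spatial process, no trend, known covariance). The exponential covariance, its parameters, the 1-D locations, the observations and the absence of measurement noise are all assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def exp_cov(h: float, sill: float = 1.0, range_: float = 0.3) -> float:
    """Exponential covariance C(h) = sill * exp(-h / range); an assumed model."""
    return sill * math.exp(-h / range_)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a x = b via Cholesky: forward, then backward substitution."""
    n = len(a)
    low = cholesky(a)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def simple_kriging(sites, y, s0, cov=exp_cov):
    """BLUP c^T Sigma^{-1} y for a zero-mean process observed without noise."""
    sigma = [[cov(abs(si - sj)) for sj in sites] for si in sites]
    c = [cov(abs(si - s0)) for si in sites]
    weights = solve_spd(sigma, c)  # Sigma^{-1} c; Sigma is symmetric
    return sum(w * yi for w, yi in zip(weights, y))


sites = [0.1, 0.4, 0.5, 0.9]  # hypothetical 1-D locations
y = [0.8, -0.2, 0.1, 0.5]     # hypothetical zero-mean observations
print(simple_kriging(sites, y, 0.6))
```

Without a noise term the predictor reproduces the data at observed locations, and far away from all observations it falls back to the mean, here zero.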

  15. Outlook
      - Motivation
      - Parametric models and their issues
      - A particular non-parametric approach
      - Outcome of a biodiversity exercise

  16. Issues of basic, classical kriging
      $$\mathrm{Cov}(\text{pred}, \text{obs}) \cdot \mathrm{Var}(\text{obs})^{-1} \cdot \text{obs} = c^T \Sigma^{-1} y$$
      - “Simple” spatial interpolation ... on paper or in class!
      - BUT:
        1. Complex mean structure
        2. Unknown parameters
        3. Large spatial fields
        4. Non-stationary covariances
        5. Space-time data on the sphere

  17. Issues of basic, classical kriging
      1. Complex mean structure
      2. Unknown parameters
      3. Large spatial fields
      4. Non-stationary covariances
      5. Space-time data on the sphere

      - Parametric structure typically ok
      - Non-parametric structure often creates “model clash”

  18. Issues of basic, classical kriging
      1. Complex mean structure
      2. Unknown parameters
      3. Large spatial fields
      4. Non-stationary covariances
      5. Space-time data on the sphere

      - (method of moment estimation)
      - Likelihood approaches → Cholesky factorizations
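Why likelihood approaches lead to Cholesky factorizations can be made explicit with the standard Gaussian log-likelihood. The sketch assumes a zero-mean process with covariance matrix $\Sigma(\theta)$; the parameter vector $\theta$ and the factor $L$ are notation introduced here, not taken from the slides.

```latex
\ell(\theta) = -\frac{n}{2}\log(2\pi)
               - \frac{1}{2}\log\det\Sigma(\theta)
               - \frac{1}{2}\, y^T \Sigma(\theta)^{-1} y ,
\qquad
\Sigma(\theta) = L L^T
\;\Longrightarrow\;
\log\det\Sigma(\theta) = 2\sum_{i=1}^{n} \log L_{ii},
\quad
y^T \Sigma(\theta)^{-1} y = \lVert L^{-1} y \rVert^2 .
```

Each evaluation of $\ell(\theta)$ needs one factorization, which costs on the order of $n^3$ operations for a dense $n \times n$ matrix, so the cost adds up quickly inside an optimizer.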

  19. Issues of basic, classical kriging
      1. Complex mean structure
      2. Unknown parameters
      3. Large spatial fields
      4. Non-stationary covariances
      5. Space-time data on the sphere

      - Many R packages do perform kriging ... many black boxes ... to tailored situations

      See Heaton et al. arXiv:1710.05013/JABES forthcoming. Computational limits are quickly attained!

  20. Methods for large spatial datasets
      - Sparse Covariance methods:
        - Covariance Tapering (Furrer)
        - Spatial Partitioning (Heaton)
      - Sparse Precision methods:
        - Lattice Kriging (Nychka)
        - Multiresolution Approximations (Katzfuss)
        - Stochastic Partial Differential Equations (Lindgren)
        - Periodic Embedding (Guinness)
        - Nearest Neighbor Processes (Datta)
      - Low rank approximation:
        - Fixed Rank Kriging (Zammit-Mangion)
        - Predictive Processes (Finley)
      - Algorithmic approaches:
        - Gapfill (Gerber)
        - Local Approximate Gaussian Processes (Gramacy)
        - Metakriging (Guhaniyogi)

  21. Spatial modeling
      Lattice model (GMRF):
      - $\mathrm{E}(Z_i \mid z_{-i}) = \beta \sum_{j \text{ neighbor of } i} z_j$
      - $\mathrm{Var}(Z_i \mid z_{-i}) = \tau^2$
      - Gaussianity and regularity conditions: $\Sigma = \tau^2 (I - B)^{-1}$

      Geostatistical model (GRF):
      - [Figure: locations $s_1, s_2, \dots, s_i, \dots, s_n$ and a covariance function against distance (lag $h$, 0.0 to 0.8), with $C(\mathrm{dist}(s_1, s_2))$ and $C(\mathrm{dist}(s_1, s_n))$ marked]
      - Covariance matrix: $\Sigma$
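The lattice model implies a sparse precision matrix: from $\Sigma = \tau^2 (I - B)^{-1}$ it follows that $\Sigma^{-1} = (I - B)/\tau^2$, where $B$ carries $\beta$ in the entries of neighboring sites. The sketch below assumes a chain graph (each site neighbors the previous and the next one) and hypothetical parameter values; `chain_precision` is an illustrative name.

```python
def chain_precision(n: int, beta: float, tau2: float):
    """Precision Q = (I - B) / tau2 on a chain graph, B[i][j] = beta for neighbors."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = 1.0 / tau2
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                q[i][j] = -beta / tau2
    return q


# Hypothetical values; |beta| < 0.5 keeps Q diagonally dominant on a chain.
n = 6
q = chain_precision(n, beta=0.4, tau2=2.0)
nonzero = sum(1 for row in q for v in row if v != 0.0)
print(f"nonzero entries of Q: {nonzero} of {n * n}")
```

The number of nonzero entries grows linearly with the number of sites, whereas the covariance matrix of the same model is dense.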

  22. Spatial modeling. Geostatistical model (GRF) and lattice model (GMRF).
      [Figure: matrix images labeled $\Sigma^{-1}$, $\Sigma$, $\Sigma_{\text{app}}$, $\Sigma$]

  23. Sparseness. Using sparse covariance functions for greater computational efficiency. Sparseness is guaranteed when the covariance function has a compact support.
      - a compact support is (artificially) imposed → tapering

      [Figure: two panels showing Matérn ($\nu = 1.5$), Wendland and Matérn * Wendland covariance functions against distance (lag $h$, 0 to 40)]
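Tapering can be sketched as the product of a Matérn covariance with a compactly supported Wendland function. The range parameters, the specific Wendland form $(1 - h/\theta)_+^4 (1 + 4h/\theta)$ and the equispaced 1-D sites are assumptions for illustration; the slide does not state which values were used in its figure.

```python
import math


def matern15(h: float, range_: float = 10.0) -> float:
    """Matern correlation with smoothness nu = 1.5 (range value is assumed)."""
    a = math.sqrt(3.0) * h / range_
    return (1.0 + a) * math.exp(-a)


def wendland1(h: float, taper_range: float = 15.0) -> float:
    """Wendland taper (1 - h/theta)_+^4 (1 + 4 h/theta); zero beyond theta."""
    r = h / taper_range
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)


def tapered(h: float) -> float:
    """Tapered covariance: direct product Matern * Wendland."""
    return matern15(h) * wendland1(h)


# 40 hypothetical equispaced sites on a line with unit spacing.
sites = list(range(40))
sigma_tap = [[tapered(abs(si - sj)) for sj in sites] for si in sites]
nonzero = sum(1 for row in sigma_tap for v in row if v != 0.0)
print(f"nonzero entries: {nonzero} of {len(sites) ** 2}")
```

All pairs of sites at least `taper_range` apart get an exact zero, which is what sparse matrix algebra can exploit.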

  24. Sparseness: prediction/estimation
      - Univariate setting: proofs based on infill asymptotics and “misspecified” covariances. Conditions on the tail behaviour of the spectrum of the (tapered) covariance.
        Furrer, Genton, Nychka (2006) JCGS; Kaufman, Schervish, Nychka (2008) JMVA; Stein (2013) JCGS; Bevilacqua et al (2018?) AoS
      - Multivariate setting: proofs based on domain increasing framework. Weak conditions on the taper.
        Furrer, Du, Bachoc (2016) JMVA

  25. Software. Software to exploit the sparse structure: spam64 for R
      - an R package for sparse matrix algebra
      - storage economical and fast
      - versatile, intuitive and simple

      See Furrer et al. (2006) JCGS; Furrer, Sain (2010) JSS
      - R objects have at most $2^{31}$ elements (almost)
      - R does not ‘have’ 64-bit integers: stored as doubles
      - 64-bit exploitation consists of type conversions between front-end R and pre-compiled code

      Gerber, Mösinger, Furrer (2017) CaGeo; Gerber, Mösinger, Furrer (2018) SoftwareX

  26. Arctic NDVI data. MODIS NDVI data (satellite product MOD13A1, $\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}}$)

  27. Kriging is smoothing

  28. Interpolation using gapfill

  29. Interpolation using gapfill
      [Figure: NDVI images (color scale 0.2 to 0.8) arranged by day of the year (145, 161, 177, 193) and year (2004 to 2007)]

  30. gapfill: ranking of the images
      [Figure: NDVI images (color scale 0.2 to 0.8) arranged by day of the year (161, 177, 193) and year (2004 to 2007), ranked from low to high, r = 1, ..., 12]

  31. gapfill: quantile regression

      | Date | 193 doy 2004 | 177 doy 2006 | 177 doy 2005 | 193 doy 2006 | 193 doy 2005 |
      |---|---|---|---|---|---|
      | Score | 0.65 | 0.71 | 0.77 | 0.88 | 0.91 |
      | Rank | 8 | 9 | 10 | 11 | 12 |
      | $\hat{q}$ | 0.64 | 0.12 | 0.77 | NA | NA |
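The idea behind the $\hat{q}$ row can be illustrated with a strongly simplified sketch. It is not the implementation in the gapfill R package: it assumes that the quantile level of the target pixel is estimated by a plain average over the images where that pixel is observed, and it replaces the quantile regression step by an empirical quantile of the target image. All function names and data are hypothetical.

```python
import math


def empirical_level(values, x):
    """Fraction of observed values that are <= x (an empirical CDF level)."""
    obs = [v for v in values if not math.isnan(v)]
    return sum(v <= x for v in obs) / len(obs)


def empirical_quantile(values, q):
    """q-quantile of the observed values, interpolating order statistics."""
    obs = sorted(v for v in values if not math.isnan(v))
    pos = q * (len(obs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(obs) - 1)
    return obs[lo] + (pos - lo) * (obs[hi] - obs[lo])


def fill_pixel(target_image, other_images, pixel):
    """Predict target_image[pixel] from the levels the pixel takes elsewhere."""
    levels = [
        empirical_level(img, img[pixel])
        for img in other_images
        if not math.isnan(img[pixel])  # images missing at the pixel give NA
    ]
    if not levels:
        return math.nan
    q_hat = sum(levels) / len(levels)  # plain average: a simplifying assumption
    return empirical_quantile(target_image, q_hat)


nan = math.nan
# Hypothetical flattened neighborhoods of 5 pixels; pixel 2 is to be filled.
target = [0.20, 0.40, nan, 0.60, 0.80]
others = [
    [0.10, 0.30, 0.50, 0.70, 0.90],
    [0.30, 0.20, 0.40, 0.50, 0.60],
    [0.25, 0.35, nan, 0.55, 0.65],
]
print(fill_pixel(target, others, pixel=2))
```

Because only ranks within each image enter the estimate, the sketch needs no distributional assumption on the NDVI values.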

  32. gapfill: prediction uncertainties
      [Figure: two panels arranged by day of the year (145, 161, 177, 193) and year (2004 to 2007): data and predictions (NDVI, color scale 0.2 to 0.8) and uncertainties (interval length, color scale 0.2 to 0.8)]

  33. gapfill: location

  34. gapfill: comparison. RMSE × $10^3$

  35. gapfill: uncertainties
      - (l) Uncertainty contribution from the indicated four steps of the gapfill procedure.
      - (m) Average width of the 90% prediction intervals (40% missing values).
      - (r) Average interval widths and coverage rate per day of the year.
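The two summaries named in the captions, average interval width and coverage rate, can be computed as in the following sketch. The interval bounds and held-out values are hypothetical; `interval_summary` is an illustrative helper, not part of the gapfill package.

```python
def interval_summary(lower, upper, truth):
    """Return (average interval width, coverage rate) for paired bounds."""
    widths = [u - l for l, u in zip(lower, upper)]
    hits = [l <= t <= u for l, u, t in zip(lower, upper, truth)]
    return sum(widths) / len(widths), sum(hits) / len(hits)


# Hypothetical 90% prediction intervals and held-out NDVI values.
lower = [0.10, 0.35, 0.50, 0.20]
upper = [0.30, 0.55, 0.80, 0.40]
truth = [0.25, 0.60, 0.65, 0.22]
avg_width, coverage = interval_summary(lower, upper, truth)
print(f"average width: {avg_width:.3f}, coverage: {coverage:.2f}")
```

For well calibrated 90% intervals the coverage rate on held-out values should be close to 0.9; narrower intervals are only an improvement if coverage is maintained.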

  36. Summary

      | Implementation | spam64 | gapfill |
      |---|---|---|
      | Intuition | statistical | conceptual |
      | Model | frequentist based | distribution free |
      | Uncertainties | formal | resampling type |
      | Practicality | play ground | competitive |

  37. Outlook
      - Motivation
      - Parametric models and their issues
      - A particular non-parametric approach
      - Outcome of a biodiversity exercise

  38. Biodiversity hypotheses
      - H1: Plant productivity (quantified through NDVI) is positively correlated with plot scale biodiversity
      - H2: Landscape variability (quantified through NDVI and slope) is positively correlated with plot scale biodiversity
      - H3: Slope induces a drainage effect and increases plot scale biodiversity

  39. Data
      - Species abundance plot scale measurements from the International Tundra Experiment (ITEX) → Shannon biodiversity index on site and plot scale
      - Landsat NDVI satellite images and ASTER elevation data → characterization of the landscape heterogeneity

      Source: F. Gerber
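The Shannon biodiversity index mentioned on this slide is commonly defined as $H = -\sum_i p_i \ln p_i$, with $p_i$ the relative abundance of species $i$. The sketch below assumes this standard definition with the natural logarithm; the abundance counts are hypothetical and not taken from the ITEX data.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over species with positive abundance."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


# Hypothetical species counts for two plots.
even_plot = [10, 10, 10, 10]   # four equally abundant species
uneven_plot = [37, 1, 1, 1]    # one dominant species
print(round(shannon_index(even_plot), 3), round(shannon_index(uneven_plot), 3))
```

For a fixed number of species the index is largest when all species are equally abundant, and it is zero when a plot contains a single species.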
