Predicting missing values in spatio-temporal satellite data Reinhard Furrer, I-Math/ICS, UZH NZZ.ch ISNPS2018, Salerno, 2018/06/14
Joint work ◮ Florian Gerber ◮ Emilio Porcu and with contributions of several others 2
Arctic NDVI NDVI = NIR − R NIR + R 3
Why do we care? Scientifically: ◮ Complex interplay between climate and vegetation/ecosystems ◮ Reflectance measurements as proxy for “greenness” 4
Why do we care? Scientifically: ◮ Complex interplay between climate and vegetation/ecosystems ◮ Reflectance measurements as proxy for “greenness” Statistically: ◮ Large, spatio-temporal datasets with complex structures at low resolution ◮ . . . many missing values Imperfect “products” � 5
Basic, classical kriging Cov(pred , data) · Var(data) − 1 · data = Σ pred , data Σ − 1 data y ◮ “Simple” spatial interpolation . . . . . . on paper or in class! ◮ Flexible models for covariances are required ◮ Computational limits are quickly attained 6
Flexible models Space-time covariance functions on the sphere: ◮ Spectral representations/stochastic expansions ◮ Scale mixture representations ◮ Lagrangian framework See: Jeong, Jun, Genton, StatSci 2017 Porcu, Alegria, F, ISR 2018 7
Computational limits Reduce “problem size” See Heaton et al, arXiv:1710.05013 ◮ enforce sparsity (‘misspecification’, tapering, . . . ) ◮ Use R’s long vector support spam64 (64-bit with tailored packages for type conversions between front-end R and pre-compiled code) See Gerber, M¨ osinger, F, CaGeo 2017; . . . 8
Arctic NDVI data MODIS NDIV data (satellite product MOD13A1) 9
Kriging is smoothing 10
Kriging is smoothing 10
Interpolation using gapfill 11
Interpolation using gapfill Day of the year 145 161 177 193 2004 NDVI 0.8 2005 0.6 0.4 2006 0.2 2007 12
gapfill : ranking of the images Day of the year 161 177 193 2004 NDVI 0.8 ● 2005 0.6 Year 0.4 2006 0.2 2007 low high r = 1 2 3 4 5 6 7 8 9 10 11 12 13
gapfill : quantile regression Date: 193 doy 2004 177 doy 2006 177 doy 2005 193 doy 2006 193 doy 2005 Score: 0.65 0.71 0.77 0.88 0.91 Rank: 8 9 10 11 12 q : 0.64 0.12 0.77 ˆ NA NA 14
gapfill : prediction uncertainties Day of the year Day of the year 145 161 177 193 145 161 177 193 2004 2004 interval NDVI length 0.8 2005 2005 0.8 0.6 0.6 Year Year 0.4 0.4 2006 2006 0.2 0.2 2007 2007 data and predictions uncertainties 15
gapfill : location 16
gapfill : comparison RMSE × 10 3 17
gapfill : uncertainties (l) Uncertainty contribution from the indicated four steps of the gapfill procedure. (m) Average width of the 90% prediction intervals (40% missing values). (r) Average interval widths and coverage rate per day of the year. 18
Summary Implementation: spam64 gapfill Intuition: statistical conceptual Model: frequentist based distribution free Uncertainties: formal resampling type Practicality: play ground competitive 19
Collaboration with: – Emilio Porcu – Alfredo Alegria – Florian Gerber – Kaspar M¨ osinger – former & present ‘Applied Statistics’ team . . . and many more URPP Global Change and Biodiversity 143282, 144973, 175529 20
References (some, alphabetical) Furrer Sain (2010) spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields JSS 36 1–25 Furrer et al (2017) spam: Sparse Matrix algebra . R package version 2.2-0 Furrer et al (2017) spam64: 64-Bit Extension of the SPArse Matrix R Package ’spam’ . R package version 2.2-0 Gerber Moesinger Furrer (2017) Extending R Packages to Support 64-bit Compiled Code: An Illustration with spam64 and GIMMS NDVI3g Data Comput Geosci 104 109-119 Gerber et al (2018) Predicting Missing Values in Spatio-Temporal Remote Sensing Data. IEEE TGRS 56(5) 2841-2853 Gerber Moesinger Furrer (2017) dotCall64: An Efficient Interface to Compiled C/C++ and Fortran Code Supporting Long Vectors arXiv :1702.08188 Heaton et al (2017) A Case Study Competition among Methods for Analyzing Large Spatial Data arXiv :1710.05013 Porcu Alegria Furrer (2018) Modeling Temporally Evolving and Spatially Globally Dependent Data International Statistical Review first published. 21
Appendix 22
Goals and concepts How to efficiently extend an R package for 64-bit use? ◮ EASY on maintainer: We do not want to re write source code! ◮ EASY on user: no loss for classical use (speed, complexity, . . . ) minimal effort for ‘expert’ use Illustrated here with use of .Fortran or .C calls (aka Foreign Function Interface). 23
Implementation dotCall64 ◮ choose 32-bit or 64-bit compiled code accordingly ◮ exploit DUP=FALSE use � dotCall64 R function dotCall64 Compiled code 32-bit compiled code Use Input Preprocess Postprocess Output 64-bit? 64-bit compiled code 24
Implementation spam64 ◮ choose 32-bit or 64-bit compiled code accordingly ◮ exploit DUP=FALSE use ◮ spam64 is ‘sed‘ ed out of spam ◮ .Fortran − → dotCall64::.C64 25
Illustration ◮ residual field from AVHRR NDVI 3g product: y = y 2000 − 2009 − y 1990 − 1999 , n = 769 , 940 observations ◮ Nonstationary Gaussian random field (zero mean) 26
Nonstationary covariance matrix � � Σ ( κ, β , τ ) = κ D ( β ) (1 − τ ) I + τ R D ( β ) ◮ κ > 0: scaling parameter ◮ D ( β ) = diag(exp( X β )): controls strength via covariates ◮ τ ∈ [0 , 1]: “no spatial correlation” vs “spatial correlation” ◮ I : identity matrix ◮ R : stationary correlation matrix: – compactly supported covariance – range 50km, sparsity 0.2% 27
Optimization ◮ Fast is relative . . . optim suboptimal Task Function Time Sparsity 0.2 o Distances 23min 1.4 Gb / h <- nearest.dist(...) oo 0.2 o Covariances 2min 1.4 Gb / T <- cov.Wend(h, ...) oo 1.9 o Cholesky 29min 8.5 Gb / oo (64-bit) chol(T, ...) ◮ optimization strategy: – (iterative) grid search over τ exploit multicore architecture – for given τ use quasi-Newton optimizer to optimize κ, β 28
Nonstationary covariance matrix ◮ with covariates “distance to nearest coast” and “elevation” ◮ diag( � Σ ): ◮ BIC improvement (1%) compared to nonstationary model . . . but not sufficiently flexible ◭ 29
Recommend
More recommend