Spatio-Temporal Smoothing of CO 2 Retrievals Noel Cressie Program in Spatial Statistics and Environmental Statistics Department of Statistics, The Ohio State University Matthias Katzfuß ur Angewandte Mathematik, Universit¨ Institut f¨ at Heidelberg Acknowledgment: AIST grant from NASA’s ESTO (PI: Amy Braverman) – p.1/36
AIRS CO 2 Data Global satellite measurements of CO 2 from the AIRS instrument (data below 60 ◦ S are not available) Challenges of global remote-sensing data: Massiveness Need dimension reduction Sparseness Need to take advantage of spatial and temporal correlations Nonstationarity Need a flexible model Goal : Using AIRS CO 2 data, map CO 2 globally after “de-noising” and “gap-filling”; also map uncertainty – p.2/36
Hierarchical Spatio-Temporal Model Time is discrete Write the hidden spatio-temporal process as Y t ( s ) at time t and location s The true process is observed imperfectly and incompletely Data Model : Z t ( s ) = Y t ( s ) + ε t ( s ) ε t ( · ) ∼ Gau (0 , σ 2 ε,t v ε ( · )) , is measurement error that is assumed uncorrelated across time and space σ 2 ε,t and v ε ( · ) are known Goal: For any s 0 on the globe, estimate Y t ( s 0 ); t ∈ { 1 , . . . , T } , after observing { Z t ( s i,t ) } ; define Z 1: T ≡ ( Z ′ 1 , . . . , Z ′ T ) ′ , where Z t ≡ ( Z ( s 1 ,t ) , . . . , Z ( s n t ,t )) ′ Consider a time series of spatial processes (e.g., Cressie and Wikle, 2011). Using a state space model, we smooth the data (cf. filtering ) – p.3/36
Hierarchical Spatio-Temporal Model, ctd. Process Model : Do not use a transport model. Take a secular approach and use a dynamical statistical model; let the data speak. Model the spatio-temporal variability statistically: Y t ( s ) = x t ( s ) ′ β t + S t ( s ) ′ η t + ξ t ( s ) The dimension of η t is r < < n t (dimension reduction; e.g., n t = 12515 , r = 380 ). – p.4/36
Hierarchical Spatio-Temporal Model, ctd. In vector notation , Z t = Y t + ε t Y t = X t β t + S t η t + ξ t Σ t ≡ var ( Z t ) = S t K t S ′ t + D t , where K t ≡ var ( η t ) D t ≡ σ 2 ξ,t V ξ,t + σ 2 ε,t V ε,t , is a diagonal n t × n t matrix Temporal dynamics on r -dimensional space η t = H η t − 1 + δ t U = var ( δ t ) HK t +1 H ′ + U K t = The part of the model for Y t given by S t η t + ξ t , is called a Spatio-Temporal Random Effects (STRE) model Goal: Estimate Y t from Z 1: T – p.5/36
Fixed Rank Smoothing (FRS) For the moment, assume parameters θ are known An extension of the Kalman smoother gives, for t ∈ { 1 , . . . , T } , η t | T ≡ E ( η t | Z 1: T ) , P t | T ≡ var ( η t | Z 1: T ) , P t,t − 1 | T ≡ cov ( η t , η t − 1 | Z 1: T ) ξ t | T ( s 0 ) ≡ E ( ξ t ( s 0 ) | Z 1: T ) , c t | T ( s 0 ) ≡ var ( ξ t ( s 0 ) | Z 1: T ) , R t | T ( s 0 ) ≡ cov ( η t , ξ t ( s 0 ) | Z 1: T ) Then, FRS yields (Cressie, Shi, & Kang, 2010) the estimate of Y t ( s 0 ) . For t = 1 , . . . , T : Y t ( s 0 ) = x t ( s 0 ) ′ β t + S ( s 0 ) ′ η t | T + ξ t | T ( s 0 ) , ˆ and the mean squared prediction error (MSPE), E ( ˆ Y t ( s 0 ) − Y t ( s 0 )) 2 , can also be calculated. Our measure of uncertainty is ( MSPE ) 1 / 2 Rapid computation : Need to invert the very large n t × n t covariance matrix Σ t ; only inversion of r × r matrices and n t × n t diagonal matrices are required. Use the Sherman-Morrison-Woodbury identity , over and over: ( I + PKP ′ ) − 1 = I − P ( K − 1 + P ′ P ) − 1 P ′ – p.6/36
Parameters The parameters, θ , consist of { β t } , { σ 2 ξ,t } , and the elements of K 0 , H , and U In FRS, θ was assumed known. In this presentation, we take an empirical-Bayes approach and “plug in” estimates of the components of θ . Ideally, estimate θ by maximum likelihood estimation (MLE) ; see Katzfuss and Cressie (2011a) A fully Bayes approach has also been developed (Katzfuss and Cressie, 2011b) – p.7/36
Empirical Bayes: Estimate the Parameters Define innovations , α t ≡ Z t − X t β t − S t E ( η t | Z 1: t − 1 ) ; t = 1 , . . . , T ind. ∼ Gau ( 0 , Σ α t ) , α t where Σ α t ≡ S t var ( η t | Z 1: t − 1 ) S ′ t + D t ; t = 1 , . . . , T Likelihood (Shumway & Stoffer, 2006): − 2 log L ( θ ) ≡ − 2 f ( α 1 , . . . , α T | θ ) T T X X α t ( θ ) ′ Σ α t ( θ ) − 1 α t ( θ ) = log | Σ α t ( θ ) | + t =1 t =1 Finding MLEs analytically is intractable , even for T=1 (Katzfuss and Cressie, 2009) – p.8/36
EM Algorithm: Review Suppose we observe X obs ∼ f ( x obs | θ ) , and we are interested in finding the MLE of θ If maximizing L ( θ ) ≡ f ( x obs | θ ) is hard, but there are missing data x mis such that maximizing the complete likelihood , L c ( θ ) ≡ f ( x obs , x mis | θ ) , is easy, we can use the EM algorithm to find the MLE of θ The EM Algorithm (Dempster, Laird, and Rubin, 1977): Starting with θ [0] = θ 0 , repeat the following two steps until convergence: E-Step: Calculate Q ( θ ; θ [ l ] ) ≡ E θ [ l ] { log f ( x obs , X mis | θ ) | x obs } M-Step: Calculate θ [ l +1] = arg max Q ( θ ; θ [ l ] ) θ – p.9/36
The EM Algorithm in this Context (Katzfuss & Cressie, 2011a) Reminder: Z t = X t β t + S t η t + ξ t + ε t Missing (mis) “data” : { η t : t = 0 , 1 , . . . , T } and { ξ t : t = 1 , . . . , T } − 2 log L c ( θ ) = − 2 log f ( η 1: T , Z 1: T , ξ 1: T | θ ) = log | K 0 | + tr ( K − 1 η 0 η ′ 0 ) 0 T T 1 X n t log σ 2 X tr ( V − 1 ξ,t ξ t ξ ′ + ξ,t + t ) σ 2 ξ,t t =1 t =1 T T X X tr { U − 1 ( η t − H η t − 1 )( η t − H η t − 1 ) ′ } + log | U | + t =1 t =1 T 1 tr { V − 1 X ε,t ( Z t − X t β t − S t η t − ξ t )( Z t − X t β t − S t η t − ξ t ) ′ } + σ 2 ε,t t =1 E-Step : FRS provides conditional expectations and covariances of η t and ξ t at θ [ l ] , given Z 1: T M-Step : Taking partial derivatives of Q ( θ ; θ [ l ] ) is easy due to terms being additive – p.10/36
Our EM Algorithm Choose initial value θ [0] in the parameter space Θ While || θ [ l +1] − θ [ l ] || > δ : 1. Run FRS with θ [ l ] to obtain η [ l ] t | T , P [ l ] t | T , P [ l ] t,t − 1 | T , ξ [ l ] t | T , and C [ l ] t | T ≡ diag { c t | T ( s 1 ) , . . . , c t | T ( s n ) } 2. Calculate θ [ l +1] as β [ l +1] ε,t [ Z t − S t η [ l ] t | T − ξ [ l ] t V − 1 t V − 1 = ( X ′ ε,t X t ) − 1 X ′ t | T ] t [ l +1] = tr ( V − 1 ξ,t [ C [ l ] t | T + ξ [ l ] t | T ξ [ l ] σ 2 ′ ]) /n t ξ,t t | T K [ l +1] = P [ l ] t | T + η [ l ] t | T η [ l ] ′ t t | T L [ l +1] = P [ l ] t,t − 1 | T + η [ l ] t | T η [ l ] ′ t t − 1 | T H [ l +1] = ( P T t =1 L [ l +1] t =0 K [ l +1] )( P T − 1 ) − 1 t t U [ l +1] = ( P T t =1 K [ l +1] t =1 L [ l +1] − H [ l +1] P T ′ ) / ( T − 1) t t 3. Repeat – p.11/36
Properties of the EM Estimators, ˆ θ EM The first θ [ l ] that meets the convergence criterion is called ˆ θ EM If θ [0] ∈ Θ , then θ [ l ] ∈ Θ , for all l = 1 , 2 , . . . Because Q ( θ ; θ [ l ] ) is continuous in both arguments, ˆ θ EM is generally a solution to the likelihood equations (Wu, 1983) For T=1, this algorithm is equivalent to the SRE (spatial-only) EM algorithm (Katzfuss and Cressie, 2009) used in Fixed Rank Kriging (FRK) The computational cost of the algorithm is linear in each n t – p.12/36
Summary: EM-FRS Run EM on the observed data to obtain EM parameter estimates, ˆ θ EM Do FRS with EM estimates plugged in to obtain ˆ Y t ( s 0 ) and its corresponding root mean squared prediction error at each location s 0 ; for example, EM + S ( s 0 ) ′ η EM Y t ( s 0 ) EM ≡ x t ( s 0 ) ′ ˆ ˆ t | T + ξ t | T ( s 0 ) EM β – p.13/36
AIRS CO 2 Data Measurements are taken by the Atmospheric Infrared Sounder ( AIRS ) instrument, on NASA’s Aqua satellite Mid-tropospheric CO 2 at roughly 1.30pm local time on May 1-16, 2003. Data are considered daily ( t = 1 , . . . , 16 ) Global coverage: Between − 60 ◦ and 90 ◦ latitude. (Retrievals are not available below 60 ◦ S) n 1 = 12515 , n 2 = 12959 , n 3 = 12971 , n 4 = 12495 , n 5 = 11862 , n 6 = 12317 , n 7 = 12577 , n 8 = 12527 , n 9 = 12651 , n 10 = 12252 , n 11 = 12681 , n 12 = 12076 , n 13 = 12245 , n 14 = 12447 , n 15 = 12372 , n 16 = 12532 We map our estimates onto a (hexagonal) grid of size m t = 57 , 065 for each day t The measurement unit is parts per million (ppm) – p.14/36
AIRS CO 2 Data, ctd Mid-tropospheric CO 2 on May 1-4 (for example), 2003, as measured by AIRS (gridded) – p.15/36
Assumptions and Specifications Level 2 data were moved onto a fine regular hexagonal grid of approximately the resolution of the AIRS footprint. On some occasions, more than one Level 2 datum was in a grid cell: The datum Z t ( s i ) was obtained by averaging all Level 2 data in grid cell i on day t . Variances of measurement error and fine-scale variation: σ 2 ε ≡ σ 2 ε, 1 = . . . = σ 2 ε, 16 = 5 . 6 (estimated from the variogram) σ 2 ξ ≡ σ 2 ξ, 1 = . . . = σ 2 ξ, 16 v ε,t ( s i ) is equal to the inverse of the number of Level 2 data that were averaged over the i -th (hexagonal) grid cell on day t Basis functions: r = 380 bisquare functions at three spatial resolutions, the same for all t = 1 , . . . , 16 . The core basis function is the bisquare function: b ( u ) = (1 − � u − s � 2 ) 2 I ( � u − s � ≤ 1) Trend: x ( s ) = (1 , lat ( s )) ′ Make maps on a hexagonal grid of size m t = 57 , 065 for each day t = 1 , . . . , 16 – p.16/36
Recommend
More recommend