Lecture 13
Gaussian Process Models - Part 2
Colin Rundel
03/01/2017

EDA and GPs

Variogram

From the spatial modeling literature the typical approach is to examine an empirical variogram. First we'll look at the theoretical variogram before looking at the connection to the covariance.

Variogram:

$$2\gamma(t_i, t_j) = \mathrm{Var}(Y(t_i) - Y(t_j)) = E\left(\left[(Y(t_i) - \mu(t_i)) - (Y(t_j) - \mu(t_j))\right]^2\right)$$

where $\gamma(t_i, t_j)$ is called the semivariogram.

If the process has constant mean (e.g. $\mu(t_i) = \mu(t_j)$ for all $i$ and $j$) then we can simplify to

$$2\gamma(t_i, t_j) = E\left(\left[Y(t_i) - Y(t_j)\right]^2\right)$$

Some Properties of the theoretical Variogram / Semivariogram

• both are non-negative: $\gamma(t_i, t_j) \geq 0$
• both are 0 at distance 0: $\gamma(t_i, t_i) = 0$
• both are symmetric: $\gamma(t_i, t_j) = \gamma(t_j, t_i)$
• there is no dependence if $2\gamma(t_i, t_j) = \mathrm{Var}(Y(t_i)) + \mathrm{Var}(Y(t_j))$ for all $i \neq j$
• if the process is not stationary: $2\gamma(t_i, t_j) = \mathrm{Var}(Y(t_i)) + \mathrm{Var}(Y(t_j)) - 2\,\mathrm{Cov}(Y(t_i), Y(t_j))$
• if the process is stationary: $2\gamma(t_i, t_j) = 2\,\mathrm{Var}(Y(t_i)) - 2\,\mathrm{Cov}(Y(t_i), Y(t_j))$
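
The last two properties follow from expanding the variance of a difference. A brief sketch, assuming that "stationary" here implies a constant variance, $\mathrm{Var}(Y(t_i)) = \mathrm{Var}(Y(t_j))$:

```latex
\begin{aligned}
2\gamma(t_i, t_j)
  &= \mathrm{Var}\big(Y(t_i) - Y(t_j)\big) \\
  &= \mathrm{Var}(Y(t_i)) + \mathrm{Var}(Y(t_j)) - 2\,\mathrm{Cov}(Y(t_i), Y(t_j)) \\
  &= 2\,\mathrm{Var}(Y(t_i)) - 2\,\mathrm{Cov}(Y(t_i), Y(t_j))
     \quad \text{(constant variance)}
\end{aligned}
```

Setting the covariance to zero in the second line gives the "no dependence" case, and dividing the last line by 2 gives the semivariogram as the variance minus the covariance.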

Empirical Semivariogram

We will assume that our process of interest is stationary, in which case we will parameterize the semivariogram in terms of $h = |t_i - t_j|$.

Empirical Semivariogram:

$$\hat{\gamma}(h) = \frac{1}{2\,N(h)} \sum_{|t_i - t_j| \in (h - \epsilon,\, h + \epsilon)} \left(Y(t_i) - Y(t_j)\right)^2$$

where $N(h)$ is the number of pairs whose separation falls in $(h - \epsilon, h + \epsilon)$.

Practically, for any data set with $n$ observations there are $\binom{n}{2} + n$ possible data pairs to examine. Each individually is not very informative, so we aggregate into bins and calculate the empirical semivariogram for each bin.
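
The binning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `empirical_semivariogram` is hypothetical, bins are half-open intervals of width `bin_width` reported at their midpoint (so the midpoint plays the role of h and half the width plays the role of ϵ), and only the distinct pairs are used, since the n zero-distance self-pairs contribute nothing to the sum.

```python
from itertools import combinations


def empirical_semivariogram(t, y, bin_width):
    """Binned empirical semivariogram for a 1-D process (illustrative sketch).

    Returns a list of (h, gamma_hat, n_pairs) tuples, one per non-empty bin,
    where h is the bin midpoint and n_pairs is N(h).
    """
    sums = {}    # bin index -> sum of squared differences
    counts = {}  # bin index -> number of pairs N(h)
    for (ti, yi), (tj, yj) in combinations(zip(t, y), 2):
        b = int(abs(ti - tj) // bin_width)  # which bin this pair falls into
        sums[b] = sums.get(b, 0.0) + (yi - yj) ** 2
        counts[b] = counts.get(b, 0) + 1
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
            for b in sorted(sums)]


# Toy data (made up for illustration, not from the lecture).
result = empirical_semivariogram([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 7.0], 1.0)
for h, gamma_hat, n_pairs in result:
    print(h, gamma_hat, n_pairs)
```

Returning N(h) alongside each estimate makes it easy to see which bins rest on only a handful of pairs.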

Connection to Covariance

Covariance vs Semivariogram - Exponential

[Figure: panels "exp cov" and "exp semivar"; y against distance d (0 to 1.5), one curve per value of l in 1, 1.7, 2.3, 3, 3.7, 4.3, 5, 5.7, 6.3, 7]

Covariance vs Semivariogram - Square Exponential

[Figure: panels "sq exp cov" and "sq exp semivar"; y against distance d (0 to 1.5), one curve per value of l in 1, 1.7, 2.3, 3, 3.7, 4.3, 5, 5.7, 6.3, 7]

From last time

[Figure: y against t, for t from 0 to 1]

Empirical semivariogram - no bins / cloud

[Figure: gamma against h, for h from 0 to 1]

Empirical semivariogram (binned)

[Figure: gamma against h (0 to 1); panels binwidth=0.05, binwidth=0.075, binwidth=0.1, binwidth=0.15]

Empirical semivariogram (binned + n)

[Figure: gamma against h (0 to 1); panels binwidth=0.05, binwidth=0.075, binwidth=0.1, binwidth=0.15; legend for n (5 to 25)]

Theoretical vs empirical semivariogram

After fitting the model last time we came up with a posterior median of $\sigma^2 = 1.89$ and $l = 5.86$ for a square exponential covariance.

$$\mathrm{Cov}(h) = \sigma^2 \exp\left(-(h\,l)^2\right)$$

$$\gamma(h) = \sigma^2 - \sigma^2 \exp\left(-(h\,l)^2\right) = 1.89 - 1.89 \exp\left(-(5.86\,h)^2\right)$$

[Figure: gamma against h (0 to 1); panels binwidth=0.05 and binwidth=0.1]
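
To get a feel for the theoretical curve, the two formulas above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the default parameter values are the posterior medians quoted on the slide.

```python
import math

SIGMA2 = 1.89  # posterior median of sigma^2 quoted on the slide
L = 5.86       # posterior median of l quoted on the slide


def sq_exp_cov(h, sigma2=SIGMA2, l=L):
    """Square exponential covariance: sigma^2 * exp(-(h * l)^2)."""
    return sigma2 * math.exp(-(h * l) ** 2)


def sq_exp_semivariogram(h, sigma2=SIGMA2, l=L):
    """Semivariogram implied by the covariance: sigma^2 - Cov(h)."""
    return sigma2 - sq_exp_cov(h, sigma2, l)


for h in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"h={h:.2f}  cov={sq_exp_cov(h):.3f}  gamma={sq_exp_semivariogram(h):.3f}")
```

The semivariogram starts at 0 for h = 0 and levels off at σ² once the covariance has decayed to essentially zero.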

Variogram features

PM2.5 Example

FRN Data

Measured PM2.5 data from an EPA monitoring station in Columbia, NJ.

[Figure: pm25 against date, Jan 2007 to Jan 2008]

FRN Data

site       latitude  longitude  pm25  date        day
230031011  46.682    -68.016     8.9  2007-01-03    3
230031011  46.682    -68.016    10.4  2007-01-06    6
230031011  46.682    -68.016     9.7  2007-01-15   15
230031011  46.682    -68.016     7.5  2007-01-18   18
230031011  46.682    -68.016     4.6  2007-01-21   21
230031011  46.682    -68.016     9.5  2007-01-24   24
230031011  46.682    -68.016     9.0  2007-01-27   27
230031011  46.682    -68.016    16.2  2007-01-30   30
230031011  46.682    -68.016     9.1  2007-02-05   36
230031011  46.682    -68.016    19.9  2007-02-11   42
230031011  46.682    -68.016    11.5  2007-02-14   45
230031011  46.682    -68.016     6.5  2007-02-17   48
230031011  46.682    -68.016    14.7  2007-02-23   54
230031011  46.682    -68.016    14.1  2007-02-26   57
230031011  46.682    -68.016    13.3  2007-03-01   60
230031011  46.682    -68.016     8.6  2007-03-04   63
230031011  46.682    -68.016     9.0  2007-03-07   66
230031011  46.682    -68.016    14.0  2007-03-10   69
230031011  46.682    -68.016     8.6  2007-03-13   72
230031011  46.682    -68.016    10.3  2007-03-16   75

Mean Model

##
## Call:
## lm(formula = pm25 ~ day + I(day^2), data = pm25)
##
## Coefficients:
## (Intercept)          day     I(day^2)
##  12.9644351   -0.0724639    0.0001751

[Figure: pm25 against day, for day from 0 to 300]
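
As a small worked example, the fitted quadratic mean can be evaluated by hand to see what detrending does to a single observation. This is a sketch, not the lecture's code: the function names are hypothetical, the coefficients are copied from the lm() output above, and the day 3 observation (pm25 = 8.9) is taken from the FRN data table.

```python
# Coefficients copied from the lm() output: (Intercept), day, I(day^2).
B0, B1, B2 = 12.9644351, -0.0724639, 0.0001751


def mean_pm25(day):
    """Fitted quadratic mean: B0 + B1 * day + B2 * day^2."""
    return B0 + B1 * day + B2 * day ** 2


def residual(day, pm25):
    """Detrended value: observation minus the fitted mean."""
    return pm25 - mean_pm25(day)


# Day at which the fitted parabola reaches its minimum (vertex of the quadratic).
turning_day = -B1 / (2 * B2)

print(f"fitted mean on day 3: {mean_pm25(3):.3f}")
print(f"residual for pm25 = 8.9 on day 3: {residual(3, 8.9):.3f}")
print(f"fitted mean is lowest near day {turning_day:.0f}")
```

The residuals computed this way are what the empirical variogram is then calculated on, since the variogram formulas assume a constant mean.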

Detrended Residuals

[Figure: "Residuals"; resid against day, for day from 0 to 300]

Empirical Variogram

[Figure: gamma against h (0 to 300); panels binwidth=3 and binwidth=6; legend for n (50 to 200)]

Empirical Variogram

[Figure: gamma against h (0 to 150); panels binwidth=6 and binwidth=9]

Model

What does the model we are trying to fit actually look like?

$$y(d) = \mu(d) + w(d) + w$$

where

$$\mu(d) = \beta_0 + \beta_1\, d + \beta_2\, d^2$$
$$w(d) \sim \mathcal{GP}(0, \Sigma)$$
$$w \sim \mathcal{N}(0, \sigma^2_w)$$

JAGS Model

## model{
##   y ~ dmnorm(mu, inverse(Sigma))
##
##   for (i in 1:N) {
##     mu[i] <- beta[1]+ beta[2] * x[i] + beta[3] * x[i]^2
##   }
##
##   for (i in 1:(N-1)) {
##     for (j in (i+1):N) {
##       Sigma[i,j] <- sigma2 * exp(- pow(l*d[i,j],2))
##       Sigma[j,i] <- Sigma[i,j]
##     }
##   }
##
##   for (k in 1:N) {
##     Sigma[k,k] <- sigma2 + sigma2_w
##   }
##
##   for (i in 1:3) {
##     beta[i] ~ dt(0, 2.5, 1)
##   }
##
##   sigma2_w ~ dnorm(10, 1/25) T(0,)
##   sigma2 ~ dnorm(10, 1/25) T(0,)
##   l ~ dt(0, 2.5, 1) T(0,)
## }
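
The covariance matrix that the JAGS model assembles can be mirrored in plain Python to check its structure. This is a sketch under assumptions: `build_sigma` is a hypothetical helper, the distance matrix `d` passed to JAGS is assumed to be the absolute difference in days, and the parameter values in the usage lines are made up for illustration rather than taken from the posterior.

```python
import math


def build_sigma(x, sigma2, sigma2_w, l):
    """Covariance matrix with the same structure as the JAGS model.

    Off-diagonal: sigma2 * exp(-(l * d_ij)^2), with d_ij = |x_i - x_j| (assumed).
    Diagonal:     sigma2 + sigma2_w.
    """
    n = len(x)
    Sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Sigma[i][i] = sigma2 + sigma2_w
        for j in range(i + 1, n):
            d_ij = abs(x[i] - x[j])
            Sigma[i][j] = sigma2 * math.exp(-(l * d_ij) ** 2)
            Sigma[j][i] = Sigma[i][j]  # fill the lower triangle by symmetry
    return Sigma


# Illustrative values only: three days from the data and made-up parameters.
Sigma = build_sigma([3, 6, 15], sigma2=2.0, sigma2_w=1.0, l=0.1)
for row in Sigma:
    print([round(v, 4) for v in row])
```

The extra `sigma2_w` on the diagonal corresponds to the independent noise term w in the model, which affects each observation's variance but not the covariance between different days.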

Posterior - Betas

[Figure: trace and density plots for beta[1], beta[2] and beta[3]; iterations 15000 to 40000, N = 715 for each density]

Posterior - Covariance Parameters

[Figure: trace and density plots for l, sigma2 and sigma2_w; iterations 15000 to 40000, N = 715 for each density]