Specification of Landmarks and Forecasting Water Temperature – Water Management in the River Wupper G¨ oran Kauermann Thomas Mestekemper Center for Statistics University Bielefeld University Bielefeld 14. August 2008
The River Wupper 14. August 2008 1
The River Wupper 14. August 2008 2
The River Wupper and its Power Plants 14. August 2008 3
The EU Water Framework Directive Commits European Union member states to achieve good qualitative and quantitative status of water bodies until 2015. Good surface water status means both, good ecological and chemical status. The first refers to the quality and functioning of the aquatic eco- system. For the Wupper this implies: Reduce electric power production = “Too warm upstream water” ⇒ or even shut down power plant Definition of “Warm Water” depends on the fish life and reproduction cycle and the given threshold may vary over the year. 14. August 2008 4
Outline of Talk • Forecasting (upstream) Water Temperature • Specification of Landmarks (Threshold, dependent fish spawning cycle) • Discussion 14. August 2008 5
Literature on Water Temperature Forecasting Hydrological Literature: • Seasonal and daily variations of water temperature are significantly important for aquatic resources. (Caissie et al., 2005, Hydrological Proces- ses ) • Two model classes: physical (thermo-dynamic) and stochastic (stati- stical) models. (Webb et al., 2008, Hydrological Processes ) Statistical Literature: • Functional component models or dynamic factor models. (Cornillon et al., 2008, CSDA ; Stock & Watson, 2006, Handbook of Economic Forecasting ) • Functional Time Series. (Ferraty & Vieu, 2006, Nonparametric FDA , Springer- Verlag) 14. August 2008 6
Smooth Cyclic Estimation Let index t = ( y, d ) denote time with year y , day in year d and w t and a t be a 24-dimensional vectors of the hourly water and air temperature, respectively, which decompose to = µ w ( d ) + ¯ = µ a ( d ) + ¯ w t w t , a t a yd . ↑ ↑ yearly trend yearly trend Functions µ w ( d ) and µ a ( d ) are fitted with “wrapped” B-splines, i. e. lim µ w ( d ) = lim µ w ( d ) . d → 365+ � d → 1 − � 14. August 2008 7
Average Temperatures µ w and µ a 14. August 2008 8
Functional Principal Components Decomposition w t shall be decomposed to a dynamic factor model, that is, we reduce ¯ dimensions by extracting k suitable factors (done by PCA): w t = f t Λ T ¯ w + � w,t where Λ w is a 24 × k dimensional loading matrix, f t a k dimensional factor and � w,t a white noise residual. Accordingly for the air temperature we extract h suitable factors: a t = g t Λ T ¯ a + � a,t 14. August 2008 9
Fitted Principal Components water temperature air temperature 0.3 0.2 1 0.2 0.1 2 0.1 2 0.0 factor loading factor loading 0.0 −0.1 −0.1 1 −0.2 −0.2 −0.3 5 10 15 20 5 10 15 20 hour hour 14. August 2008 10
The Dynamic Factor Model Using the backshift operator ∆ a,b f t = ( f t − a , . . . , f t − b ) . We assume an autoregressive model for the factor f t : f t = (∆ 1 ,p f t ) β f + (∆ 0 ,q g t ) β g + � f,t . This implies that f t depends on: • water temperature factors of the p previous days • air temperature factors of the q previous days • the current day air temperature factors. Note: In a forecasting setting the last point is only available as meteorological forecast. 14. August 2008 11
Estimation of the factors f t and g t We want to compare three different approaches to estimate the factors. 1) We start with a quite simple Least Squares estimation method where the facor loadins are taken as � f t = ¯ g t = ¯ w t Λ w and a t Λ a � Pro: The remaining parameters β f and β g can easily be found using least squares regression. Con: The resulting estimates are not Maximum Likelihood-based. We need to incorporate our stochastic models in the estimation method. 14. August 2008 12
Estimation of the factors f t and g t (continued) 2) In a Maximum Likelihood approach we assume that the residuals in the former mentioned models follow normal distributions: � � � � 0 , diag( σ 2 0 , diag( σ 2 w ) f ) � w,t ∼ N and � f,t ∼ N . 3) We incorporate a stochastic autoregressive model for the air tempe- rature, as well, in a Full Maximum Likelihood estimation method: q g t )˜ g t = (∆ 1 , ˜ β g + � g,t � � � � 0 , diag( σ 2 0 , diag( σ 2 asuming � a,t ∼ N a ) and � g,t ∼ N g ) . w ) and ˜ θ = ( θ, ˜ σ 2 σ 2 The unknown parameters θ = ( β f , β g , σ 2 f , σ 2 β g , ˜ g , ˜ a ) are now estimated using an EM-algorithm. 14. August 2008 13
Model Selection (in progress) In order to select the best performig model we divide our dataset in a training and a forecasting sample. To measure the model quality one could, for example, make use of the Mean Squared Prediction Error defined by: n � MSPE = 1 w t ) T . ( w t − � w t )( w t − � n t = i We have to select: • k and h ; the optimal number of factors for water and air temperature, respectively, • p and q ; the optimal number of time lags for water and air tempera- ture, respectively, (we treat ˜ q = 2 as fixed) • the optimal estimation method. 14. August 2008 14
Demonstration Warm spring days over Whitsun 2008 17 10.5.2008 11.5. 12.5. 16 13.5. 14.5. 15.5 Hour 15 14 13 5 10 15 20 Temperature 14. August 2008 15
Demonstration Real vs. Forecasted Temperature Root Mean Square Error 0.40 16 0.35 15 Temperature RMSE 0.30 14 13 0.25 real forecasted 12 0.20 0 10 20 30 40 50 60 70 5 10 15 20 Hour Hour 14. August 2008 16
Multiple Day Forecast Multiple day forecasts show discontinuities. Solution: To achieve a continuous m day forecast we divide our time axis into time intervals of length m , i. e. w m t = w ˜ t = ( w yd 1 , . . . , w yd 24 , w y ( d +1)1 , . . . w y ( d + m )24 ) The above models are re-fitted in analogy to the 24h case. 14. August 2008 17
Demonstration Real vs. forecasted Temperature Root Mean Square Error 0.6 16 0.5 15 Temperature RMSE 14 0.4 13 0.3 real forecasted 12 0.2 0 10 20 30 40 50 60 70 0 10 20 30 40 50 60 70 Hour Hour 14. August 2008 18
Comparison to other modelling approaches We compared our Least Squares model to three approaches to model the daily maximum temperature presented in Cassie et al. (1998, Can. J. Civ. Eng. ) ) β 1 + � 1 w max a max 1. ¯ = (∆ 0 , 2 ¯ t resulted in an RMSE of 1.295 ◦ C. t t ) β 2 + K ¯ w max w max a max 2. ¯ = (∆ 1 , 2 ¯ resulted in an RMSE of 2.439 ◦ C. t t t ζ 0 1 w max a max 3. ¯ = 1 − δ 1 B ¯ + 1 − φ 1 B n t resulted in an RMSE of 1.018 ◦ C. t t For p = 2 , q = 1 , k = h = 3 and ˜ q = 2 our Least Squares Model yielded an RMSE of 0.42 ◦ C. 14. August 2008 19
Finding Seasonal Pattern Besides forecasting is the specification of seasonal pattern an important issue, since: • Water temperature has to stay below ecologically justified thresholds to preserve the fish populations. • Threshold values depend on season, or more precisely on reproduc- tion cycle of fish. • Seasons can vary like an early spring or late summer. • What is the “reference year”? 14. August 2008 20
Literature in ’Warping’ and ’Landmark Specification’ • Landmark specification in growth curves. (Kneip & Gasser, 1992, Annals of Statistics ; Gasser & Kneip, 1995, JASA ) • Automatic Warping (or self-modelling). (Ramsay & Li, 1998, JRSS B ; Ger- vini & Gasser, 2004, JRSS B ) • We need an “online” warping, as data arrives over time. 14. August 2008 21
Structure of Water temperature temperature curves 60 2007 2006 50 2006 2005 2005 40 temperature (Celsius) 2004 2004 30 2003 2003 2002 20 2002 2001 10 0 7 8 9 10 11 12 1 2 3 4 5 6 time (months) 14. August 2008 22
Different modelling for landmark registration Let t = ( y, d, h ) where h is the hour in day d . water: w t = w ydh = ¯ + w yd x ydh ↑ ↑ daily avg. temp. residual ↓ ↓ a t = a ydh = ¯ + air: a yd z ydh A principal component analysis is run on the residuals x ydh and z ydh after substracting the mean daily temperature course: x ydh = µ x ( d ) + ¯ and z ydh = µ z ( d ) + ¯ x ydh z ydh . 14. August 2008 23
Seasonal Pattern in PCA coefficients K x � x ydh = µ x ( h ) + f yd,k λ x,k ( h ) k =1 principal components 3 0.4 0.2 2 1 0.0 1 3 −0.4 2 5 10 15 20 first score 5 0 −5 −10 2002 2003 2004 2005 2006 2007 second score 2 1 0 −2 −4 2002 2003 2004 2005 2006 2007 third score 4 2 0 −2 2002 2003 2004 2005 2006 2007 14. August 2008 24
Landmark based on First PCA Score We check, whether H 0 : E ( f yd, 1 ) ≤ 0 is rejected. first principal component scores 10 253 95 267 80 283 90 236 93 237 123 279 96 5 score 0 −5 −10 2002 2003 2004 2005 2006 2007 time p−value taking into account 15 consecutive days 253 95 267 80 283 90 236 93 237 123 279 96 0.95 p−value 0.05 2002 2003 2004 2005 2006 2007 time average daily temperature 253 95 267 80 283 90 236 93 237 123 279 96 20 temperature 15 10 5 2002 2003 2004 2005 2006 2007 time 14. August 2008 25
Recommend
More recommend