Smoothing Applications for Irregular Time Series with Measurement Errors


  1. Smoothing Applications for Irregular Time Series with Measurement Errors
     - Jonathan Rathjens, Eva Becker, Arthur Kolbe, Katharina Olthoff, Michael Wilhelm, Katja Ickstadt, and Jürgen Hölzer
     - 2 December 2016

  2. Outline
     - 1 Introduction
     - 2 Data
     - 3 Model Development: Regression, Kernel Smoothing, Categorization
     - 4 Results: Regression, Kernel Smoothing
     - 5 Conclusions and Outlook

  3. Epidemiological Background
     - PFASs
       - per- and polyfluoroalkyl substances
       - lead substances: perfluorooctanoic acid (PFOA) and perfluorooctane sulphonic acid (PFOS)
       - ubiquitous in industrial and household products
       - persistent; accumulate in organisms
       - → internal exposure of general population
     - Importance of Drinking Water
       - food as most important source of human exposure to PFASs
       - contaminated drinking water predominant
       - → surrogate marker for internal exposure

  4. Occasion
     - Contamination in North Rhine-Westphalia (NRW)
       - drinking water rivers Ruhr and Möhne affected by PFASs-polluted fertilizer prior to summer 2006
       - (very) high concentrations measured at water supply stations downstream
       - motivated human biomonitoring studies

  5. Long-Term Objectives
     - modelling state-wide PFASs exposure, both temporal and spatial
     - find regions and time periods of increased exposure
     - use as explanatory variable for spatio-temporal NRW birth data
     - explore/infer possible dependencies
     - Current
       - PFASs measurements from water supply stations and areas: irregularly sampled, non-stationary time series
       - find realistic estimates of the mean and variance functions
       - regression on time
       - interpolation, smoothing
       - extrapolation

  6. Section 2: Data

  7. Drinking Water PFASs Data
     - provided by NRW state environmental agency LANUV
     - drinking water samples from water supply stations and network
     - Spatial Structure
       - stations (data from ca. 250 of 650 stations)
       - water supply areas (data from ca. 200 of 450 areas)
       - complex assignment
       - rivers
     - Temporal Structure
       - irregular time series (ca. one value per month or less)
       - from summer 2006; partially ongoing

  8. Maximum PFOA 2006–2014 [ng/l]

  9. Characteristics
     - irregular sampling
     - non-stationary
     - different patterns
     - need for extrapolation
     - values < LoQ (varying)
     - extremely high values
     - (possible) change points

  10. Measurement Errors
     - additional variability
     - depending on scale
     - due to serial dilution in chemical analysis
     - coefficient of variation ($c_v$) ca. 20%

  11. Section 3: Model Development

  12. Time-Dependent Regressions
     - Observations and Goal
       - measurements $(x_i)_{i=1,\dots,n}$, positive
       - at times $t_1 \le \dots \le t_n$, unequally spaced
       - estimate process $X = f(t)$ (or distribution / posterior predictive) for arbitrary time $t$
     - Regression
       - simple regression, e.g. $\ln(x) = b_0 + b_1 t + \epsilon$, too restrictive
       - segmented according to change points (if known)
       - P-splines
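
The simple regression $\ln(x) = b_0 + b_1 t + \epsilon$ named on this slide can be illustrated with a minimal sketch. It assumes ordinary least squares on the log scale; the function names, sampling times and concentrations are hypothetical and not taken from the presentation.

```python
import math

def fit_log_linear(times, values):
    """Ordinary least squares for ln(x) = b0 + b1 * t on unequally spaced times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (t, x) pairs of equal length")
    if any(x <= 0 for x in values):
        raise ValueError("measurements must be positive to take logarithms")
    y = [math.log(x) for x in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all sampling times are identical")
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1

def predict(b0, b1, t):
    """Back-transformed fitted value exp(b0 + b1 * t) at an arbitrary time t."""
    return math.exp(b0 + b1 * t)

# Hypothetical data: irregular sampling times (days) and concentrations (ng/l)
# generated from an exact exponential decline, so the fit recovers it.
times = [0, 20, 75, 90, 160, 300]
values = [500.0 * math.exp(-0.01 * t) for t in times]
b0, b1 = fit_log_linear(times, values)
```

Irregular spacing needs no special handling here because the times enter only as a regressor. A single straight line on the log scale cannot follow change points, which is the restriction the slide points out.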

  13. Conjugate Γ-Γ-Model
     - $X_t \mid \beta_t \sim \Gamma(\alpha, \beta_t)$ for arbitrary $t$
     - fixed $\alpha$ representing measurement error: $c_v = \frac{\sqrt{\mathrm{Var}(X)}}{E(X)} = \frac{\sqrt{\alpha/\beta^2}}{\alpha/\beta} = \frac{1}{\sqrt{\alpha}}$
     - e.g. $c_v = 0.2 \Rightarrow \alpha = 25$
     - prior $\beta_t \sim \Gamma(\theta_t, \eta_t)$
     - alternative to log-normal model (no transformation)
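
The link between the coefficient of variation and the shape parameter can be checked numerically. This is a small sketch with hypothetical function names, assuming the shape-rate parametrization of the gamma distribution (mean $\alpha/\beta$, variance $\alpha/\beta^2$) that the slide's formula implies.

```python
import math

def alpha_from_cv(cv):
    """Gamma shape alpha implied by a coefficient of variation: cv = 1 / sqrt(alpha)."""
    if cv <= 0:
        raise ValueError("cv must be positive")
    return 1.0 / cv ** 2

def gamma_cv(alpha, beta):
    """cv of Gamma(shape=alpha, rate=beta): sqrt(alpha / beta^2) / (alpha / beta)."""
    return math.sqrt(alpha / beta ** 2) / (alpha / beta)

alpha = alpha_from_cv(0.2)  # approximately 25, as on the slide
```

Because the rate $\beta$ cancels, `gamma_cv` returns the same value for any positive rate, which is why a fixed $\alpha$ can stand for a scale-dependent measurement error.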

  14. "Weighted Posterior"
     - $\theta_t \to \theta_t + \alpha\, n \sum_{i=1}^{n} w_i$
     - $\eta_t \to \eta_t + n \sum_{i=1}^{n} x_i w_i$
     - Weights
       - $w_i \in [0, 1]$ with $w := \sum_{i=1}^{n} w_i \in [0, 1]$
       - from kernel, e.g. Gaussian: $w_i = f_{N(t,\delta^2)}(t_i)$
       - $w \searrow 0$: no informative data, retain prior
       - $w \nearrow 1$: data "almost everywhere", usual Γ-Γ-update with $\tilde{x}_i := n w_i x_i$
       - smoothing parameter $\delta$ depending on $t$-scale/resolution
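
The weighted update on this slide can be sketched as follows. The function names and the example data are hypothetical. Two points are assumptions rather than statements from the slides: the prior $\Gamma(\theta_t, \eta_t)$ is read as shape-rate, and the Gaussian density values are rescaled whenever their sum would exceed 1, since the slide requires the total weight to lie in $[0, 1]$ but does not say how this is enforced.

```python
import math

def gaussian_kernel_weights(t, times, delta):
    """Density of N(t, delta^2) evaluated at each sampling time t_i."""
    norm = 1.0 / (delta * math.sqrt(2.0 * math.pi))
    w = [norm * math.exp(-0.5 * ((ti - t) / delta) ** 2) for ti in times]
    total = sum(w)
    # Assumption (not from the slides): rescale if the total weight exceeds 1,
    # so that every w_i and their sum stay within [0, 1].
    if total > 1.0:
        w = [wi / total for wi in w]
    return w

def weighted_posterior(theta, eta, alpha, values, weights):
    """Kernel-weighted Gamma-Gamma update of the prior Gamma(theta, eta)."""
    n = len(values)
    theta_post = theta + alpha * n * sum(weights)
    eta_post = eta + n * sum(x * w for x, w in zip(values, weights))
    return theta_post, eta_post

def predictive_mean(theta, eta, alpha):
    """E[X] = alpha * E[1 / beta] = alpha * eta / (theta - 1), valid for theta > 1."""
    if theta <= 1:
        raise ValueError("the mean exists only for theta > 1")
    return alpha * eta / (theta - 1.0)

# Hypothetical series: sampling times in days, concentrations in ng/l.
times = [0.0, 35.0, 61.0, 130.0, 148.0]
values = [420.0, 380.0, 300.0, 150.0, 140.0]
weights = gaussian_kernel_weights(t=60.0, times=times, delta=30.0)
theta_t, eta_t = weighted_posterior(2.0, 1.0, 25.0, values, weights)
```

With equal weights $w_i = 1/n$ the update reduces to the usual conjugate result $\theta + n\alpha$ and $\eta + \sum_i x_i$, and with all weights at zero the prior is returned unchanged, matching the two limiting cases on the slide.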

  15. Prior Choices
     - empirical Bayes from whole sample
     - vague, e.g. with prior predictive ∈ [0, 1000]
     - informative: high values prior to 2006 (for 2 to 4 years), afterwards decrease
     - sequential: adjacent time's posterior

  16. Categorical Data
     - simplify the $x_i$ to ordinal categories such as "not detected", "low", "increased", "very high"
     - natural incorporation of values < LoQ
     - magnitude less likely to be "false" than value with measurement error
     - useful as an epidemiological predictor
     - Future Approaches
       - → model risk/rate of, e.g., increased or very high values for a period of time
       - simple case: binary regression
       - (cumulative) probit or logit P-splines
       - use known change points
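
A minimal sketch of the categorization step follows. The category labels come from the slide; the cut-off values `low_max` and `increased_max` are hypothetical placeholders chosen only for illustration, as the presentation does not state thresholds.

```python
CATEGORIES = ("not detected", "low", "increased", "very high")

def categorize(value, loq, low_max=100.0, increased_max=300.0):
    """Map a concentration (ng/l) to an ordinal category.

    low_max and increased_max are hypothetical cut-offs, not values from the slides.
    A missing value (None) or a value below the limit of quantification (loq)
    is treated as "not detected".
    """
    if value is None or value < loq:
        return "not detected"
    if value <= low_max:
        return "low"
    if value <= increased_max:
        return "increased"
    return "very high"

def ordinal_code(category):
    """Integer rank of a category (0 = "not detected", 3 = "very high")."""
    return CATEGORIES.index(category)

# Hypothetical measurements with a limit of quantification of 10 ng/l.
labels = [categorize(x, loq=10.0) for x in [None, 4.0, 55.0, 180.0, 640.0]]
```

Values below the LoQ fall into the lowest category without imputation, which is the "natural incorporation" the slide refers to. A measurement with 20% relative error changes category only when it lies close to a cut-off.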

  17. Section 4: Results

  18. Simple Regression

  19. Spline Regression

  20. Empirical Bayes Prior

  21. Vague Prior with Large δ

  22. Informative Prior

  23. Sequential Prior with Uniform Kernel

  24. Section 5: Conclusions and Outlook

  25. Findings
     - sufficient (local) fit to present data
     - no useful extrapolation without specific prior knowledge
     - difficult global modelling
     - individual solution for each series preferable
     - Evaluation
       - non-parametric solutions to incorporate changes and nonlinear trends
       - parametric regression useful if change points known
       - kernel smoother able to include local prior information, if available
       - sequential prior weighting down important maxima
       - too little variance for data $\searrow 0$

  26. Model Enhancement
     - estimate Γ-parameter $\alpha$ (globally, restricted)
     - distinguish error- and process-related variability
     - tune smoothing parameter $\delta$
       - respect "data density" in time
       - locally adapted
       - appropriate for chosen prediction interval (daily, monthly, ...)
     - asymmetric weighting, e.g. $f(t - a) < f(t + a)$
     - several series
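
One possible reading of the asymmetric weighting $f(t - a) < f(t + a)$ is a two-sided Gaussian kernel with a narrower bandwidth before the target time than after it. The sketch below is an illustration of that idea only; the construction, the function name and the bandwidth values are assumptions, since the slide does not specify how the asymmetry would be implemented.

```python
import math

def asymmetric_weight(t, t_i, delta_before, delta_after):
    """Unnormalised two-sided Gaussian kernel value for observation time t_i.

    Observations before the target time t use bandwidth delta_before,
    observations at or after t use delta_after (both hypothetical parameters).
    """
    delta = delta_before if t_i < t else delta_after
    return math.exp(-0.5 * ((t_i - t) / delta) ** 2)

# With delta_before < delta_after, an observation a days before t receives
# less weight than one a days after t.
w_before = asymmetric_weight(t=0.0, t_i=-5.0, delta_before=10.0, delta_after=30.0)
w_after = asymmetric_weight(t=0.0, t_i=5.0, delta_before=10.0, delta_after=30.0)
```

These raw kernel values lie in $(0, 1]$ and would still need to be scaled so that their sum stays within $[0, 1]$ before being used in the weighted posterior.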

  27. Supply Stations and Areas
     - one station may supply several areas
     - one area may be supplied by several stations
     - different periods observed
     - measurements from stations and network: supply from the Ruhr

  28. Network Samples
     - important for
       - estimation of areas' contamination
       - verification of single station models
     - unknown water source
     - possibly mixture of stations' waters, e.g.: $X(t) = \pi_1 X_1(t) + \pi_2 X_2(t) + \pi_3 X_3(t)$
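
The mixture $X(t) = \pi_1 X_1(t) + \pi_2 X_2(t) + \pi_3 X_3(t)$ can be written as a short function. The sketch assumes that the $\pi_k$ are non-negative mixing proportions summing to 1, which the slide does not state explicitly; the function name and the numbers are hypothetical.

```python
def mixture_concentration(proportions, concentrations, tol=1e-9):
    """Concentration of a network sample as a convex mixture of station values.

    proportions: mixing shares pi_k (assumed non-negative and summing to 1).
    concentrations: station concentrations X_k(t) at the same time t.
    """
    if len(proportions) != len(concentrations):
        raise ValueError("need one proportion per station")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > tol:
        raise ValueError("proportions must be non-negative and sum to 1")
    return sum(p * x for p, x in zip(proportions, concentrations))

# Hypothetical example: three stations feeding one network sample (ng/l).
x_network = mixture_concentration([0.5, 0.3, 0.2], [100.0, 20.0, 0.0])
```

In practice the proportions are unknown, as the slide notes, so they would have to be estimated or bounded rather than supplied.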

  29. Spatial Dependence
     - water from contaminated rivers, esp. the Ruhr
     - other spatial processes (e.g., for groundwater)
     - modelling $X = f(t, s)$
       - two-dimensional Ruhr models (river × time)
       - discrete space: supply areas (use, e.g., GMRF)

  30. Modelling Internal Exposure
     - extremely slow decrease after exposure
     - very weak effects of random fluctuations
     - What is important?
       - find times and values of exposure peaks (short-term)
       - model the sum of subsequent exposures correctly (no loss by averaging, long-term)
       - background exposure (equilibrium, long-term)

  31. Thanks to ...
     - the NRW state environmental agency LANUV for providing water data
     - Stiftung Mercator for funding our work
     - all co-workers and participants
