
Chapter 7-1: Sequential Data. Jilles Vreeken. IRDM '15/16, 24 Nov 2015. Revision 1, November 26th: definition of smoothing clarified.


  1. Chapter 7-1: Sequential Data. Jilles Vreeken. Revision 1, November 26th: definition of smoothing clarified. IRDM '15/16, 24 Nov 2015

  2. IRDM Chapter 7, overview. Time Series: 1. Basic Ideas, 2. Prediction, 3. Motif Discovery. Discrete Sequences: 4. Basic Ideas, 5. Pattern Discovery, 6. Hidden Markov Models. You'll find this covered in Aggarwal Ch. 3.4, 14, 15.

  3. IRDM Chapter 7, today. Time Series: 1. Basic Ideas, 2. Prediction, 3. Motif Discovery. Discrete Sequences: 4. Basic Ideas, 5. Pattern Discovery, 6. Hidden Markov Models. You'll find this covered in Aggarwal Ch. 3.4, 14, 15.

  4. Chapter 7.1: Basic Ideas. Aggarwal Ch. 14.1-14.2

  5. Temperature Data. Temp (°C): 28.2, 25.4, 30.5, 15.7, 33.4, 29.4, 28.6, 16.1, 28.5, 27.9, 15.5, 31.4

  6. Temperature Data. [Chart: Daily Temperature]
     Time      Temp (°C)
     June-15   28.2
     June-16   25.4
     June-17   30.5
     June-18   15.7
     June-19   33.4
     June-20   29.4
     June-22   28.6
     June-23   16.1
     June-24   28.5
     June-25   27.9
     June-26   15.5
     June-27   31.4

  8. Temperature Data. [Chart: Daily Temperature]
     Time      Temp (°C)
     June-15   28.2
     June-16   25.4
     June-17   30.5
     Sept-18   15.7
     June-19   33.4
     June-20   29.4
     June-22   28.6
     Sept-23   16.1
     Sept-24   28.5
     June-25   27.9
     Sept-26   15.5
     June-27   31.4

  9. Applications: Health Monitoring, Stock Analysis, Weather Forecasting, Social Network Analysis.

  10. Definition. A time series of length $n$ consists of $n$ tuples $(t_1, X_1), (t_2, X_2), \ldots, (t_n, X_n)$ where, for a tuple $(t_i, X_i)$, $t_i$ is the time stamp and $X_i$ is the data at time $t_i$, and we have a total order on the time stamps, $t_1 < t_2 < \cdots < t_n$. Length: may be either finite or infinite. Time stamps: may be contiguous; in practice integers are easier. Data: when talking about time series, usually numeric, continuous real-valued; may be univariate (one attribute) or multivariate (multiple attributes).

  11. Probabilistic Model of Time Series. Consider the data $X_i$ at time $t_i$ as a random variable; the actual data we observe at $t_i$ is a realization of $X_i$. Some probabilistic properties can be stable over time, e.g. the mean $\mu_i$ of $X_i$ does not change (much), or the covariance between pairs $(X_i, X_{i+h})$ is (almost) the same as between $(X_1, X_{1+h})$, i.e., the autocovariance of $X_i$ does not change (much). A time series is stationary if the process behind it does not change: $\mu_t = \mu_s = \mu$ for all $t, s$, and $C_{XX}(t, s) = C_{XX}(s - t) = C_{XX}(\tau)$, where $\tau = |s - t|$ is the amount of time by which the signal is shifted. Stationary time series are easy to model and predict; most real-world time series, however, are anything but stationary. (Recall: if $X_i$ has mean $\mu_i = E[X_i]$, then $C_{XX}(t, s) = \mathrm{cov}(X_t, X_s) = E[X_t X_s] - \mu_t \mu_s$.)
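
The autocovariance at a given lag can be estimated from a single observed series once stationarity is assumed. The following is a minimal sketch, not taken from the slides: the function name `autocovariance` and the choice of the biased $1/n$ estimator are assumptions, and the temperature readings are treated as if they were taken at contiguous integer time stamps (the table has no entry for June-21).

```python
def autocovariance(x, lag):
    """Sample autocovariance of x at the given lag (biased 1/n estimator).

    Assumes the series is stationary, so a single mean describes all X_i.
    """
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n


# Temperatures from the running example, treated as contiguous readings.
temps = [28.2, 25.4, 30.5, 15.7, 33.4, 29.4, 28.6, 16.1, 28.5, 27.9, 15.5, 31.4]

c0 = autocovariance(temps, 0)  # lag 0 is the sample variance
c1 = autocovariance(temps, 1)  # covariance between consecutive readings
print(c0, c1)
```

With this estimator the lag-0 value is the sample variance, and no other lag can exceed it in absolute value.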

  12. Stationarity of Time Series. [Charts: Daily Temperature; Monthly Temperature]

  13. Seasonality & trend. [Chart: Monthly Temperature, 2011 to 2013]

  14. Formulation. Classically, we assume a time series $X$ is composed of $X_i = \mathit{seasonality}_i + \mathit{trend}_i + \mathit{noise}_i$, where $\mathit{noise}_i$ is stationary. To make $X$ stationary, we simply have to remove seasonality and trend.

  16. Seasonality. Seasonality is essentially periodicity: seasonality is a periodic function of time with period $d$, i.e. $\mathit{seasonality}_i = \mathit{seasonality}_{i-d}$. How to find the seasonality function? 1. By fitting a sine or cosine function: difficult, as the signal may also be sine'ish. 2. By differencing: given $X_i = \mathit{seasonality}_i + \mathit{trend}_i + \mathit{noise}_i$ and $X_{i-d} = \mathit{seasonality}_{i-d} + \mathit{trend}_{i-d} + \mathit{noise}_{i-d}$, we take $X'_i = X_i - X_{i-d}$.
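
Because $\mathit{seasonality}_i = \mathit{seasonality}_{i-d}$, the seasonal terms cancel in $X_i - X_{i-d}$, leaving only differences of trend and noise. A minimal sketch of seasonal differencing follows; the function name `seasonal_difference` and the toy series (a period-4 pattern plus a linear trend, no noise) are hypothetical and chosen only for illustration.

```python
def seasonal_difference(x, d):
    """Return X'_i = X_i - X_{i-d} for i = d, ..., len(x) - 1."""
    if not 0 < d < len(x):
        raise ValueError("period d must satisfy 0 < d < len(x)")
    return [x[i] - x[i - d] for i in range(d, len(x))]


# Hypothetical toy series: a period-4 seasonal pattern plus a linear trend.
pattern = [0.0, 5.0, 0.0, -5.0]
series = [pattern[i % 4] + 2.0 * i for i in range(12)]

# The seasonal pattern cancels; what remains is the trend difference 2.0 * 4.
print(seasonal_difference(series, d=4))
```

On this toy series every differenced value equals 8.0: the periodic component is gone and the linear trend has collapsed to a constant offset.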

  17. $X'_i = X_i - X_{i-d}$, where $d = 12$. [Chart: Monthly Temperature, 2011 to 2013]

  18. Example: Removing Seasonality. [Chart: Monthly Temperature] This is the time series we obtained by removing seasonality.

  19. Trend. Trend is a polynomial function of time (assumption). How to find the trend function? 1. By fitting functions: difficult to do; up to what order, and when to stop? 2. By differencing: $X'_i = X_i - X_{i-1}$, then $X''_i = X'_i - X'_{i-1}$; usually 2 times is enough.
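
Each round of first-order differencing lowers the degree of a polynomial trend by one, which is why a small number of rounds usually suffices. The sketch below illustrates this on a noise-free quadratic; the function name `difference` and the test series are assumptions made for the example.

```python
def difference(x, order=1):
    """Apply first-order differencing X'_i = X_i - X_{i-1} `order` times."""
    for _ in range(order):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x


# Hypothetical series with a purely quadratic trend: 0, 1, 4, 9, ...
quadratic = [i * i for i in range(8)]

print(difference(quadratic, order=1))  # a linear trend remains
print(difference(quadratic, order=2))  # constant: the trend is removed
```

Note that each round also shortens the series by one value.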

  20. Example: Removing Trend. [Chart: Monthly Temperature] This is the time series we obtained by removing seasonality.

  21. Example: Removing Trend. $X'_i = X_i - X_{i-1}$. [Chart: Monthly Temperature] This is the time series we obtained by removing seasonality and trend.

  22. Example: Removing Trend. $X'_i = X_i - X_{i-1}$. [Chart: Monthly Temperature] The left-over fluctuations are either noise or non-trivial patterns.

  24. Pre-processing. We can infer missing values by interpolation: $X_k = X_i + \frac{t_k - t_i}{t_j - t_i} \times (X_j - X_i)$, where $t_i < t_k < t_j$. Example, temperature on June-22:
      t   Time      Temp (°C)
      1   June-19   33.4
      2   June-20   29.4
      4   June-22   (missing)
      5   June-23   16.1
      $X_4 = X_2 + \frac{t_4 - t_2}{t_5 - t_2} \times (X_5 - X_2) = 29.4 + \frac{4 - 2}{5 - 2} \times (16.1 - 29.4) = 20.5$
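
The interpolation formula translates directly into a small helper. This is a minimal sketch; the function name `interpolate` and its argument order are assumptions, and the call reproduces the June-22 example from the slide.

```python
def interpolate(t_i, x_i, t_j, x_j, t_k):
    """Linearly interpolate the value at t_k from (t_i, x_i) and (t_j, x_j)."""
    if not t_i < t_k < t_j:
        raise ValueError("requires t_i < t_k < t_j")
    return x_i + (t_k - t_i) / (t_j - t_i) * (x_j - x_i)


# June-20 is t = 2 (29.4 °C), June-23 is t = 5 (16.1 °C); June-22 is t = 4.
june_22 = interpolate(2, 29.4, 5, 16.1, 4)
print(round(june_22, 1))  # 20.5
```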

  25. Smoothing. We can remove noise by smoothing. Standard options include averaging, $X'_i = \mathrm{avg}(X_{i-w}, \ldots, X_i)$, where the window length $w$ is a user-specified parameter. We can give more weight to recent values by exponential smoothing, $X'_i = (1 - \alpha)^i \cdot X'_0 + \alpha \sum_{j=1}^{i} X_j \cdot (1 - \alpha)^{i-j}$, where the user chooses the decay factor $\alpha$. (Updated on Nov 26th: we now average explicitly over past values.)
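
Both smoothers can be sketched in a few lines. The code below is an illustration under stated assumptions, not the slides' implementation: the function names are hypothetical, the moving average truncates its window at the start of the series, and exponential smoothing uses the recursive form $X'_i = (1 - \alpha) X'_{i-1} + \alpha X_i$, which unrolls to the closed form given on the slide. Taking the first observation as the default for $X'_0$ is also an assumption.

```python
def moving_average(x, w):
    """X'_i = avg(X_{i-w}, ..., X_i); the window is truncated at the start."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - w): i + 1]
        out.append(sum(window) / len(window))
    return out


def exponential_smoothing(x, alpha, initial=None):
    """Return [X'_1, ..., X'_n] for observations x = [X_1, ..., X_n].

    Uses X'_i = (1 - alpha) * X'_{i-1} + alpha * X_i, starting from X'_0.
    If no initial value is given, X'_0 defaults to the first observation
    (an assumption made for this sketch).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not x:
        return []
    smoothed = x[0] if initial is None else initial
    out = []
    for value in x:
        smoothed = (1 - alpha) * smoothed + alpha * value
        out.append(smoothed)
    return out


temps = [28.2, 25.4, 30.5, 15.7, 33.4, 29.4]
print(moving_average(temps, w=2))
print(exponential_smoothing(temps, alpha=0.5))
```

A larger $\alpha$ makes the smoothed series follow recent observations more closely; a smaller one gives more weight to the accumulated history.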

  26. Chapter 7.2: Forecasting. Aggarwal Ch. 14.3

  27. Principle of Forecasting. If we wish to make predictions, then clearly we must assume that something is stable over time.

  28. Autoregressive (AR) model. Future values depend on past values + random noise (assumption: the time series depends on autocorrelation). Which past values? The $w$ immediately previous values. What relation between past and future? A linear combination. What kind of noise? Gaussian.

  29. AR, formally. The future value is a linear combination of past values + white noise: $X_t = \sum_{i=1}^{w} a_i \cdot X_{t-i} + c + \epsilon_t$, where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. The sum is the linear combination of past values; $c + \epsilon_t$ is noise with shifted mean.
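
Since the noise term has mean zero, the one-step forecast of an AR model is the linear combination of the last $w$ values plus the constant $c$. The sketch below shows that forecast and, as an assumption not stated on these slides, estimates the parameters of the simplest case ($w = 1$) by ordinary least squares. The function names `ar_forecast` and `fit_ar1` and the noise-free toy series are hypothetical.

```python
def ar_forecast(history, coeffs, c=0.0):
    """One-step forecast: sum_i a_i * X_{t-i} + c (the noise has mean zero).

    coeffs[0] is a_1 and multiplies the most recent value in history.
    """
    if len(history) < len(coeffs):
        raise ValueError("need at least len(coeffs) past values")
    return c + sum(a * history[-i] for i, a in enumerate(coeffs, start=1))


def fit_ar1(x):
    """Least-squares estimate of (a_1, c) for X_t = a_1 * X_{t-1} + c + eps_t."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("series is constant; a_1 is not identifiable")
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    a1 = cov / var_prev
    return a1, mean_curr - a1 * mean_prev


# Hypothetical noise-free series generated by X_t = 0.5 * X_{t-1} + 2.
series = [10.0, 7.0, 5.5, 4.75, 4.375]

a1, c = fit_ar1(series)
print(a1, c)
print(ar_forecast(series, [a1], c))
```

On real data the residuals of such a fit would give an estimate of $\sigma^2$; higher orders $w$ need a multivariate least-squares solve, which this sketch leaves out.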
