

  1. PATTERN DISCOVERY IN TIME SERIES: A SURVEY. DENIS KHRYASHCHEV, GRADUATE CENTER, CUNY, OCTOBER 2018

  2. MOTIVATION. Datasets often represent processes that take place over long periods of time. Their outputs are measured at regular time intervals, creating discrete time series. For example, consider CitiBike demand and Fisher river temperature data. [Figures: CitiBike ridership; Fisher river mean daily temperature.] Data sources: 1. https://s3.amazonaws.com/tripdata/index.html; 2. https://datamarket.com/data/set/235d/mean-daily-temperature-fisher-river-near-dallas-jan-01-1988-to-dec-31-1991

  3. MOTIVATION. COMPLEXITY. Complexity quantifies the internal structure of the underlying process. EEG data can be classified [1] into interictal, preictal, and seizure states based on their complexity. [Figure: EEG voltage traces (µV) for the interictal, preictal, and seizure states.] [1] Petrosian, Arthur. "Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns." Proceedings of the Eighth IEEE Symposium on Computer-Based Medical Systems. IEEE, 1995.

  4. MOTIVATION. PERIODICITY. Natural phenomena like solar activity and the Earth's rotation and revolution drive periodic human activity on a large scale. E.g., New York City's human mobility is highly periodic, with clear ridership peaks from 6 AM to 10 AM and from 3 PM to 7 PM. Image source: http://web.mta.info/mta/news/books/docs/Ridership_Trends_FINAL_Jul2018.pdf

  5. MOTIVATION. PREDICTABILITY. Predictability estimates the expected accuracy of forecasting a given time series. Often there is a trade-off between the desired accuracy and the computation time [2]. [2] Zhao, Kai, et al. "Predicting taxi demand at high spatial resolution: approaching the limit of predictability." 2016 IEEE International Conference on Big Data. IEEE, 2016.

  6. MOTIVATION. CLUSTERING. The task of grouping time series that are similar in some respect often arises in the domains of transportation, finance, medicine, and others. Time-sensitive modifications of standard techniques are applied, e.g. k-means over autocorrelation functions, as sketched below. [Figure: autocorrelation functions of time series clustered together.] Image source: Denis Khryashchev's summer internship at Simulmedia (Jun – Aug 2018).
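
A minimal sketch of this clustering scheme, assuming NumPy and scikit-learn are available; the helper names and the lag and cluster counts are illustrative choices, not values from the slides:

```python
# Cluster time series by the shape of their autocorrelation functions:
# each series is mapped to its ACF at lags 1..n_lags, and k-means is
# run on those ACF vectors instead of on the raw values.
import numpy as np
from sklearn.cluster import KMeans

def acf(x, n_lags=20):
    """Sample autocorrelation of a 1-D series at lags 1..n_lags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, n_lags + 1)])

def cluster_by_acf(series, n_lags=20, n_clusters=3):
    """series: iterable of 1-D arrays, each longer than n_lags."""
    features = np.vstack([acf(s, n_lags) for s in series])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
```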

  7. MOTIVATION. FORECASTING. Perhaps the most well-known and widely applied task related to time series is forecasting. Understanding time series periodicity, complexity, and predictability helps in selecting better predictors and optimizing their parameters. E.g., knowing that a series has periodicity P = 5, one can forecast by averaging values taken at lag 5, as sketched below. Video source: Denis Khryashchev's summer internship at Simulmedia (Jun – Aug 2018).
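
A minimal sketch of that lag-averaging forecast, assuming the periodicity P = 5 from the example is known in advance; the function name and horizon parameter are illustrative:

```python
# Forecast each future step as the average of all observed values that
# fall at the same phase of the known period (here P = 5).
import numpy as np

def seasonal_average_forecast(x, period=5, horizon=1):
    x = np.asarray(x, dtype=float)
    forecasts = []
    for h in range(1, horizon + 1):
        phase = (len(x) + h - 1) % period  # phase of the step being forecast
        forecasts.append(x[phase::period].mean())
    return np.array(forecasts)
```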

  8. NOTATION. Throughout the presentation we will consider time series of real values and will use the following notation: $X = \{X_1, \dots, X_N\} = \{X_t\}_{t=1}^{N}$, $X_t \in \mathbb{R}$. Not to be confused with set notation, $\{\cdot\}$ is used to denote sequences. A subsequence of the series $X$ that starts at period $i$ and ends at period $j$ is written as $X_i^j = \{X_i, \dots, X_j\}$, $i \le j$.

  9. ORGANIZATION OF THE PRESENTATION

  10. 1. KOLMOGOROV COMPLEXITY. For a time series $X$ we define the Kolmogorov complexity as the length of the shortest description of its sequence of values, ordered in time, in some fixed universal description language: $K(X) = |d(X)|$, where $K$ is the Kolmogorov complexity and $d(X)$ is the shortest description of the time series $X$. Smaller values of $K(X)$ indicate lower complexity.

  11. 1. KOLMOGOROV COMPLEXITY. EXAMPLE. Given two time series $X = \{0, 1, 0, 1, 0, 1, 0, 1, 0, 1\}$ and $Y = \{1, 0, 0, 1, 1, 1, 0, 0, 1, 0\}$, and selecting Python as our description language, we have the shortest descriptions $d_P(X)$ = "{0,1}*5" and $d_P(Y)$ = "{1,0,0,1,1,1,0,0,1,0}", quantifying a smaller "Pythonic" complexity for $X$ compared to $Y$: $K_P(X) = |d_P(X)| = 7$, $K_P(Y) = |d_P(Y)| = 21$.
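
The example can be checked directly, with one caveat: the braces above would denote sets in actual Python, so this sketch substitutes list literals of the same character length:

```python
# Two candidate "shortest descriptions" as Python expressions; their
# string lengths reproduce the complexities 7 and 21 from the slide.
x_desc = "[0,1]*5"                  # evaluates to [0, 1, 0, 1, ...]
y_desc = "[1,0,0,1,1,1,0,0,1,0]"    # no shorter expression is apparent

assert eval(x_desc) == [0, 1] * 5
print(len(x_desc), len(y_desc))     # 7 21
```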

  12. 1. KOLMOGOROV COMPLEXITY. LIMITATIONS. However, as proven by Kolmogorov in [3], and by Chaitin, Arslanov, and Calude in [4], the complexity $K$ is not a computable function in general. 3. Kolmogorov, Andrei N. "On tables of random numbers." Sankhyā: The Indian Journal of Statistics, Series A (1963): 369–376. 4. Chaitin, G. J., A. Arslanov, and C. Calude. "Program-size complexity computes the halting problem." Department of Computer Science, The University of Auckland, New Zealand, Tech. Rep., 1995.

  13. 1. LEMPEL-ZIV COMPLEXITY. Lempel and Ziv [5] proposed a combinatorial approximation of the complexity of finite sequences based on their production history. For a time series $X$ it is $H(X) = X_1^{h_1} \cup X_{h_1+1}^{h_2} \cup \dots \cup X_{h_{m-1}+1}^{N}$. For the series $X = \{0,0,0,1,1,0,1,0,0,1,0,0,0,1,0,1\}$ one of the production histories is $H(X) = \{0\} \cup \{0,0,1\} \cup \{1,0\} \cup \{1,0,0\} \cup \{1,0,0,0\} \cup \{1,0,1\}$. The overall complexity is the size of the shortest possible production history, $c(X) = \min_{H(X)} |H(X)|$, computable as sketched below. Disadvantage: the actual values $X_t$ are treated as symbols, e.g. $c(X = \{1, 2, 1, 5, 1, 2\}) = c(Y = \{8, 0.5, 8, 0.1, 8, 0.5\})$. 5. Lempel, Abraham, and Jacob Ziv. "On the complexity of finite sequences." IEEE Transactions on Information Theory 22.1 (1976): 75–81.
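
A minimal sketch of computing $c(X)$ by the standard left-to-right parsing (in the spirit of Kaspar and Schuster's scheme for the 1976 Lempel-Ziv complexity); the phrase count it returns matches the six-component history above:

```python
# Count the phrases of an exhaustive Lempel-Ziv production history:
# starting at position i, grow the candidate phrase seq[i:i+k] while a
# copy of it starts at some earlier position j < i (overlap allowed),
# then cut the phrase and continue after it.
def lz_complexity(seq):
    seq, n = list(seq), len(seq)
    phrases, i = 0, 0
    while i < n:
        k = 1
        while i + k <= n and any(seq[j:j + k] == seq[i:i + k] for j in range(i)):
            k += 1
        phrases += 1   # seq[i:i+k] is a new phrase of the history
        i += k
    return phrases

print(lz_complexity([0,0,0,1,1,0,1,0,0,1,0,0,0,1,0,1]))                  # 6
print(lz_complexity([1,2,1,5,1,2]) == lz_complexity([8,.5,8,.1,8,.5]))   # True
```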

  14. 2. ENTROPY. Shannon and Weaver introduced entropy [6] as a measure of the information transmitted by a signal in a communication channel: $H(X) = -\mathbb{E}[\log_2 P(X)]$. Rényi [7] generalized the definition for an ordinary discrete finite distribution of $X$, $\mathcal{P} = \{p_1, \dots, p_M\}$, $\sum_i p_i = 1$, to the entropy of order $\alpha$ ($\alpha \to 1$ recovers Shannon entropy): $H_\alpha(X) = H_\alpha(\mathcal{P}) = \frac{1}{1-\alpha} \log_2 \sum_i p_i^\alpha$. Disadvantage: neither definition takes the order of the values $X_t$ into account, e.g. $H(X = \{1, 2, 3, 1, 2, 3\}) = H(Y = \{1, 3, 2, 2, 3, 1\})$. 6. Cover, Thomas M., and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012. 7. Rényi, Alfréd. On Measures of Entropy and Information. Hungarian Academy of Sciences, Budapest, Hungary, 1961.
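
A minimal sketch of both entropies computed from the empirical distribution of values, which makes the order-blindness explicit:

```python
# Shannon and Renyi entropies of the empirical value distribution.
# Reordering a series leaves the distribution, and hence both
# entropies, unchanged.
import math
from collections import Counter

def empirical_probs(x):
    n = len(x)
    return [c / n for c in Counter(x).values()]

def shannon_entropy(x):
    return -sum(p * math.log2(p) for p in empirical_probs(x))

def renyi_entropy(x, alpha):
    # alpha != 1; alpha -> 1 recovers the Shannon entropy
    return math.log2(sum(p ** alpha for p in empirical_probs(x))) / (1 - alpha)

print(shannon_entropy([1, 2, 3, 1, 2, 3]) == shannon_entropy([1, 3, 2, 2, 3, 1]))  # True
```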

  15. 2. KOLMOGOROV ENTROPY. Entropy is often used as an approximation of complexity. Among the most well-known approximations [8] of the complexity is the Kolmogorov entropy, defined as $K = -\lim_{\tau \to 0} \lim_{\varepsilon \to 0} \lim_{d \to \infty} \frac{1}{d\tau} \sum_{i_1, \dots, i_d} p(i_1, \dots, i_d) \ln p(i_1, \dots, i_d)$. It describes the complexity of a dynamic system with $F$ degrees of freedom: the $F$-dimensional phase space is partitioned into boxes of size $\varepsilon^F$, $\tau$ stands for the time interval between observations, and $p(i_1, \dots, i_d)$ is the joint probability that the $F$-dimensional point representing the values $X_{t=i\tau}$ lies in box $i_i$ at each step $i = 1, \dots, d$. Disadvantage: the approximation is computable for analytically defined models; however, it is hard to calculate given only the resulting series. 8. Grassberger, Peter, and Itamar Procaccia. "Estimation of the Kolmogorov entropy from a chaotic signal." Physical Review A 28.4 (1983): 2591.

  16. 2. ENTROPY WITH TEMPORAL COMPONENT. Another definition [6] of entropy takes into account the temporal order of the values $X_t$: $H_t(X) = -\sum_{i=1}^{N} \sum_{j=1}^{N} P(X_i^j) \log_2 P(X_i^j)$, where $P(X_i^j)$ is the probability of the subsequence $X_i^j$; computing $H_t(X)$ is $O(2^N)$ complex. The Lempel-Ziv estimator [9] approximates $H_t(X)$ with rapid convergence: $H_{Lz}(X) = \left( \frac{1}{N} \sum_t \frac{s_t'}{\ln N} \right)^{-1}$, where $s_t'$ is the length of the shortest subsequence starting at period $t$ that is observed for the first time; see the sketch below. Disadvantage: the values $X_t$ are treated as symbols, e.g. $H_{Lz}(X = \{1, 2, 1, 5\}) = H_{Lz}(Y = \{2, 9, 2, 3\})$. 9. Kontoyiannis, Ioannis, et al. "Nonparametric entropy estimation for stationary processes and random fields, with applications to English text." IEEE Transactions on Information Theory 44.3 (1998): 1319–1327.
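
A minimal sketch of that estimator, treating values as symbols; the quadratic scan is a naive version of the match-length search, kept simple for clarity:

```python
# Lempel-Ziv entropy rate estimate: s_t is the length of the shortest
# subsequence starting at t that never appeared before position t, and
# the estimate is ((1/N) * sum_t s_t / ln N)^(-1) = N * ln N / sum_t s_t.
import math

def shortest_new_length(seq, t):
    k = 1
    while t + k <= len(seq) and any(seq[j:j + k] == seq[t:t + k] for j in range(t)):
        k += 1
    return k

def lz_entropy_estimate(seq):
    n = len(seq)
    total = sum(shortest_new_length(seq, t) for t in range(n))
    return (n * math.log(n)) / total

print(lz_entropy_estimate([1, 2, 1, 5]) == lz_entropy_estimate([2, 9, 2, 3]))  # True
```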

  17. 2. PERMUTATION ENTROPY. Bandt and Pompe [10] proposed the permutation entropy of order $n$, $H_n = -\sum_\pi p_\pi \log_2 p_\pi$, where $p_\pi = \frac{\#\{t \mid 0 \le t \le T-n,\ \mathrm{type}(X_{t+1}, \dots, X_{t+n}) = \pi\}}{T-n+1}$ is the frequency of permutations of type $\pi$. E.g., for $X = \{4, 7, 9, 10, 6, 11, 3\}$ and $n = 3$ we have $\mathrm{type}(4, 7, 9) = \mathrm{type}(7, 9, 10) = \pi_{012}$ ($X_t < X_{t+1} < X_{t+2}$), $\mathrm{type}(9, 10, 6) = \mathrm{type}(6, 11, 3) = \pi_{201}$ ($X_{t+2} < X_t < X_{t+1}$), and $\mathrm{type}(10, 6, 11) = \pi_{102}$ ($X_{t+1} < X_t < X_{t+2}$). The entropy becomes $H_3 = -2 \cdot \frac{2}{5} \log_2 \frac{2}{5} - \frac{1}{5} \log_2 \frac{1}{5} \approx 1.52$. Disadvantage: the definition requires $X_t \ne X_{t+1}$ (no ties) and has a complexity of $O(n!)$. 10. Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural complexity measure for time series." Physical Review Letters 88.17 (2002): 174102.
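
A minimal sketch of permutation entropy: each window of $n$ consecutive values is mapped to the permutation that sorts it, and the entropy of the permutation frequencies is computed (log base 2, matching the worked example; ties are assumed absent):

```python
# Permutation entropy of order n: count permutation types over all
# windows of n consecutive values, then take the Shannon entropy of
# their frequencies.
import math
from collections import Counter

def permutation_entropy(x, n):
    patterns = Counter(
        tuple(sorted(range(n), key=lambda k: x[t + k]))  # e.g. (2, 0, 1) ~ pi_201
        for t in range(len(x) - n + 1)
    )
    total = sum(patterns.values())
    return -sum((c / total) * math.log2(c / total) for c in patterns.values())

print(round(permutation_entropy([4, 7, 9, 10, 6, 11, 3], 3), 2))  # 1.52
```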
