
Introduction to Time Series Data and Analysis, Simon Taylor (PowerPoint PPT Presentation)



  1. Introduction to Time Series Data and Analysis Simon Taylor Department of Mathematics and Statistics 20 April 2016

  2. Contents
     • What is Time Series Data?
     • Analysis Tools: Trace Plot, Auto-Correlation Function, Spectrum
     • Time Series Models: Moving Average, Auto-Regressive
     • Further Topics

  3. What is Time Series Data? A time series is a set of observations made sequentially through time. Examples:
     • Changes in execution time, RAM or bandwidth usage.
     • The number of times a piece of software has run in consecutive periods of time.
     • Financial, geophysical, marketing, demographic data, etc.
     The objectives of time series analysis are:
     • Description: How does the data vary over time?
     • Explanation: What causes the observed variation?
     • Prediction: What are the future values of the series?
     • Control: Aim to improve control over the process.

  4. Common Questions
     Q: How important is preserving data order?
     A: Very! Changing the data order breaks the dependence between measurements.
     Q: How frequently do I need to take measurements?
     A: It depends:
     • Too sparse, and you risk missing the dependence structure.
     • Too frequent, and you are swamped with noise.
     Figure: Sampling frequency (the same signal sampled at 1 Hz, 10 Hz and 100 Hz).
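
A small sketch of this sampling trade-off (the 1 Hz sinusoid, the rates and the noise level are illustrative choices of mine, not the figure's actual signal):

```python
import numpy as np

# Illustrative only: a noisy 1 Hz sinusoid observed at three sampling rates.
rng = np.random.default_rng(0)

def sample_signal(rate_hz, duration_s=5.0, noise_sd=0.2):
    """Sample a noisy 1 Hz sinusoid at `rate_hz` samples per second."""
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    return t, np.sin(2.0 * np.pi * t) + rng.normal(0.0, noise_sd, t.size)

for rate in (1, 10, 100):
    t, x = sample_signal(rate)
    # At 1 Hz the sinusoid is sampled exactly once per period, so the
    # oscillation is invisible; at 100 Hz adjacent differences are mostly noise.
    print(f"{rate:>3} Hz: {x.size} samples, sd of differences = {np.diff(x).std():.2f}")
```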

  5. Why is time series important in benchmarking?
     Q: Can I use simple summary statistics?
     A: You can, but they only describe overall properties.
     Q: Can't I just interpolate between data points?
     A: Signals are often subject to uncontrollable random noise. Error from interpolation may be large if the noise is large.
     Figure: Three time series (X, Y and Z) with $\bar{x} = 0$ and $s^2 = 1$.
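
A small sketch of why summary statistics are not enough: three series standardised to the same mean and variance but with very different temporal dependence (the construction is my own, not the figure's):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000

# Independent noise, a strongly autocorrelated AR(1) series,
# and a seasonal signal plus noise.
white = rng.normal(size=T)
ar = np.zeros(T)
for t in range(1, T):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
seasonal = np.sin(2.0 * np.pi * np.arange(T) / 50.0) + 0.3 * rng.normal(size=T)

for name, x in [("white noise", white), ("AR(1)", ar), ("seasonal", seasonal)]:
    z = (x - x.mean()) / x.std()     # standardise: mean 0, variance 1
    lag1 = np.mean(z[:-1] * z[1:])   # approximate lag-1 autocorrelation
    print(f"{name:>11}: mean={z.mean():.2f}, var={z.var():.2f}, lag-1 acf={lag1:.2f}")
```

The mean and variance are identical by construction, yet the lag-1 autocorrelation cleanly separates the three series.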

  6. Analysis Tools – Trace Plot A trace plot is a graph of the measurements against time. Easy to visually identify key features: • Trends – Long-term trend in the mean level. • Seasonality – Regular peaks & falls in the measurements. • Outliers – Unusual measurements that are inconsistent with the rest of the data. • Discontinuities – Abrupt change to the underlying process.
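
As an illustration (the simulated trend-plus-seasonal series below is my own, not from the slides), a minimal matplotlib sketch of a trace plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated series with a linear trend, a seasonal cycle and noise.
rng = np.random.default_rng(2)
t = np.arange(200)
x = 0.02 * t + np.sin(2.0 * np.pi * t / 25.0) + rng.normal(0.0, 0.5, t.size)

plt.plot(t, x)
plt.xlabel("Time")
plt.ylabel("Measurement")
plt.title("Trace plot")
plt.show()
```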

  7. Analysis Tools – Auto-correlation function
     Correlation measures the linear dependence between two data sets. Auto-correlation measures the correlation between all data pairs at lag $k$ apart:
     $$ r_k = \frac{\sum_{t=1}^{T-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{(T-1)\,s^2}, $$
     where $\bar{x}$ and $s^2$ are the sample mean and variance.
     Figure: Lag 5 ACF calculation.
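
A minimal NumPy sketch of this $r_k$ formula (the white-noise test data are illustrative):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k as defined above, for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    T = x.size
    dev = x - x.mean()
    s2 = x.var(ddof=1)               # sample variance, so (T-1)*s2 = sum(dev^2)
    return np.array([dev[: T - k] @ dev[k:] / ((T - 1) * s2)
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
x = rng.normal(size=500)
print(acf(x, 5).round(3))   # r_0 = 1 exactly; near 0 elsewhere for white noise
```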

  8. Analysis Tools – Spectrum
     The spectrum describes how the power in a time series varies across frequencies:
     $$ I(\omega) = \frac{1}{\pi T} \left| \sum_{t=1}^{T} x_t e^{i 2\pi t \omega} \right|^2, \quad \omega \in (0, 1/2]. $$
     It identifies prominent seasonal and cyclic variation.
     Figure: Fourier decomposition and spectrum of time series $X_t$ (components at frequencies 1/8, 1/7, 1/6, 1/5 and 1/4).
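
A minimal NumPy sketch of this periodogram (np.fft.rfft uses the opposite sign convention in the exponent, but the squared magnitude is identical; the test signal is illustrative):

```python
import numpy as np

def periodogram(x):
    """I(omega) = |sum_t x_t e^{i 2 pi t omega}|^2 / (pi T) on the Fourier grid."""
    x = np.asarray(x, dtype=float)
    T = x.size
    dft = np.fft.rfft(x)                     # DFT at frequencies omega = j / T
    freqs = np.fft.rfftfreq(T)
    I = np.abs(dft) ** 2 / (np.pi * T)
    keep = (freqs > 0) & (freqs <= 0.5)      # restrict to omega in (0, 1/2]
    return freqs[keep], I[keep]

rng = np.random.default_rng(4)
t = np.arange(512)
x = np.sin(2.0 * np.pi * t / 8.0) + rng.normal(0.0, 0.5, t.size)
freqs, I = periodogram(x)
print("peak at frequency", freqs[np.argmax(I)])   # expect 1/8 = 0.125
```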

  9. Time series models
     Let $X_{1:T} = \{X_1, \ldots, X_T\}$ denote a sequence of $T$ measurements.
     A time series is stationary if the distribution of any pair of subsets separated by lag $k$, $X_{1:t}$ and $X_{1+k:t+k}$, is the same.
     A time series is weakly stationary if the first two moments are constant over time:
     $$ \mathbb{E}[X_t] = \mu \quad \text{and} \quad \mathrm{Cov}(X_t, X_{t+k}) = \gamma(k). $$
     Gaussian White Noise Process, GWNP: the time series $\{Z_t\}$ follows a Gaussian white noise process if, independently,
     $$ Z_t \sim N(0, \sigma^2), \quad t = 1, \ldots, T. $$
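
A quick simulation check of the GWNP definition (sample size and seed are arbitrary choices):

```python
import numpy as np

# Z_t ~ N(0, sigma^2) drawn independently: the canonical weakly
# (indeed strictly) stationary series.
rng = np.random.default_rng(5)
sigma = 1.0
z = rng.normal(0.0, sigma, size=5000)

zbar = z.mean()
print(f"mean ~ 0:     {zbar:.3f}")
print(f"var  ~ {sigma**2}:   {z.var():.3f}")
print(f"gamma(1) ~ 0: {np.mean((z[:-1] - zbar) * (z[1:] - zbar)):.3f}")
```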

  10. Gaussian White Noise Process Figure: Gaussian white noise process.

  11. MA($q$) process
      Moving Average Process of Order $q$, MA($q$): the process $\{X_t\}$ is a moving average process of order $q$ if
      $$ X_t = \beta_0 Z_t + \beta_1 Z_{t-1} + \cdots + \beta_q Z_{t-q}, $$
      where $\{Z_t\}$ is a GWNP and $\beta_0, \ldots, \beta_q$ are constants ($\beta_0 = 1$).
      Expectation: $\mathbb{E}[X_t] = 0$.
      Auto-covariance:
      $$ \mathrm{Cov}(X_t, X_{t+k}) = \begin{cases} \sigma^2 \sum_{i=0}^{q-|k|} \beta_i \beta_{i+|k|}, & |k| = 0, \ldots, q; \\ 0, & \text{otherwise}. \end{cases} $$
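
A minimal sketch that simulates an MA($q$) process and checks the auto-covariance formula above (the parameters and seed are illustrative):

```python
import numpy as np

def simulate_ma(betas, T, sigma=1.0, seed=6):
    """Simulate X_t = Z_t + beta_1 Z_{t-1} + ... + beta_q Z_{t-q} (beta_0 = 1)."""
    rng = np.random.default_rng(seed)
    b = np.concatenate(([1.0], betas))               # prepend beta_0 = 1
    z = rng.normal(0.0, sigma, size=T + len(betas))
    return np.convolve(z, b, mode="valid")           # exactly T values

def ma_autocov(betas, k, sigma=1.0):
    """Theoretical Cov(X_t, X_{t+k}) from the formula on the slide."""
    b = np.concatenate(([1.0], betas))
    q, k = len(betas), abs(k)
    if k > q:
        return 0.0
    return sigma**2 * sum(b[i] * b[i + k] for i in range(q - k + 1))

x = simulate_ma([0.9], T=20000)
print(f"empirical lag-1 autocov:   {np.mean((x[:-1] - x.mean()) * (x[1:] - x.mean())):.3f}")
print(f"theoretical lag-1 autocov: {ma_autocov([0.9], 1):.3f}")   # beta_0 * beta_1 = 0.9
```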

  12. MA($q$) process
      Figure: Left: MA(1), $\beta_1 = 0.9$. Right: MA(2), $(\beta_1, \beta_2) = (-0.4, 0.9)$.

  13. AR($p$) process
      Autoregressive Process of Order $p$, AR($p$): the process $\{Y_t\}$ is an autoregressive process of order $p$ if
      $$ Y_t = \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p} + Z_t, $$
      where $\{Z_t\}$ is a GWNP and $\alpha_1, \ldots, \alpha_p$ are constants.
      Expectation: $\mathbb{E}[Y_t] = 0$.
      Auto-covariance for AR(1):
      $$ \mathrm{Cov}(Y_t, Y_{t+k}) = \sigma^2 \frac{\alpha_1^{|k|}}{1 - \alpha_1^2}, \quad \text{provided } |\alpha_1| < 1. $$
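
A minimal AR(1) simulation, checking the stationary variance $\sigma^2 / (1 - \alpha_1^2)$ implied by the auto-covariance formula at $k = 0$ (parameters and seed are illustrative):

```python
import numpy as np

def simulate_ar1(alpha, T, sigma=1.0, seed=7):
    """Simulate Y_t = alpha * Y_{t-1} + Z_t, started from Y_0 = 0."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = alpha * y[t - 1] + z[t]
    return y

alpha = 0.9
y = simulate_ar1(alpha, T=50000)
print(f"empirical variance:   {y.var():.2f}")
print(f"theoretical variance: {1.0 / (1.0 - alpha**2):.2f}")   # sigma^2 / (1 - alpha_1^2)
```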

  14. AR($p$) process
      Figure: Left: AR(1), $\alpha_1 = 0.9$. Right: AR(2), $(\alpha_1, \alpha_2) = (0.8, -0.64)$.

  15. Non-stationary processes
      Figure: Examples of non-stationary processes. Panels: Random Walk; Non-stationary AR(2); AR(1) with a mean change; Concatenated Haar MA.
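
A quick simulation showing why the random walk in the figure is non-stationary: its variance grows linearly in $t$, violating weak stationarity (the number and length of walks are arbitrary):

```python
import numpy as np

# 2000 independent random walks X_t = X_{t-1} + Z_t.
rng = np.random.default_rng(8)
walks = np.cumsum(rng.normal(size=(2000, 1000)), axis=1)

# Var(X_t) = t * sigma^2 grows without bound, so no constant gamma(0) exists.
for t in (10, 100, 1000):
    print(f"Var(X_{t}) ~ {walks[:, t - 1].var():.1f}   (theory: {t})")
```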

  16. On-going Research in Time Series
      Active topics include ARMA/ARIMA, GARCH and stochastic volatility, state-space models and Kalman filtering, long memory and fractional integration, changepoints and structural breaks, unit roots and cointegration, bootstrap methods, and nonparametric and Bayesian inference, among many others.
      Figure: Keyword cloud from the Journal of Time Series Analysis, 2002–2015. Red: Models, Navy: Properties, Grey: Inference & Methods.

  17. Further Reading
      • Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (2008) Time Series Analysis: Forecasting and Control. 4th ed., John Wiley & Sons.
      • Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 6th ed., CRC Press.
      • Signal Processing Toolbox, MATLAB (http://uk.mathworks.com/products/signal/)
      • Statsmodels, Python (http://statsmodels.sourceforge.net/)

  18. Simon Taylor Department of Mathematics and Statistics 20 April 2016
