Time Series Analysis and Mining with R
Yanchang Zhao
RDataMining.com
http://www.rdatamining.com/
18 July 2011
Outline
1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
R
- a free software environment for statistical computing and graphics
- runs on Windows, Linux and MacOS
- widely used in academia and research, as well as industrial applications
- over 3,000 packages
- CRAN Task View: Time Series Analysis
  http://cran.r-project.org/web/views/TimeSeries.html
Time Series Data in R
- class ts represents data that has been sampled at equispaced points in time
- frequency is the number of observations per unit of time, e.g.
  frequency=7: daily data with a weekly cycle (the unit of time is one week)
  frequency=12: a monthly series
  frequency=4: a quarterly series
Time Series Data in R
> a <- ts(1:20, frequency=12, start=c(2011,3))
> print(a)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011           1   2   3   4   5   6   7   8   9  10
2012  11  12  13  14  15  16  17  18  19  20
> str(a)
 Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8
> attributes(a)
$tsp
[1] 2011.167 2012.750   12.000

$class
[1] "ts"
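A short sketch, using only base R (stats) functions, of how frequency and start determine the time index of a ts object. The daily series d is a made-up illustration; a is the monthly series from the example above.

```r
# Daily data with a weekly cycle: frequency = 7 observations per unit of time (one week)
d <- ts(1:14, frequency = 7, start = c(1, 1))
frequency(d)
cycle(d)        # position of each observation within its week: 1..7, then 1..7 again

# The monthly series from the example above
a <- ts(1:20, frequency = 12, start = c(2011, 3))
tsp(a)          # start (2011 + 2/12, i.e. March 2011), end, frequency
time(a)[1:3]    # time index of the first three observations
# Extract a sub-period by calendar position: January to June 2012
window(a, start = c(2012, 1), end = c(2012, 6))
```

Because the time index is stored once in the tsp attribute, functions such as window(), cycle() and time() can work with calendar positions instead of raw vector indices.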
What is Time Series Decomposition
To decompose a time series into components:
- Trend component: long-term trend
- Seasonal component: seasonal variation
- Cyclical component: repeated but non-periodic fluctuations
- Irregular component: the residuals
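For reference, the two standard ways of combining the components can be written as below, with Y_t the observed value, T_t the trend (in classical decomposition the cyclical component is usually absorbed into the trend), S_t the seasonal component and I_t the irregular component. R's decompose() supports both forms through its type argument ("additive" or "multiplicative").

```latex
\begin{aligned}
\text{additive:} \quad & Y_t = T_t + S_t + I_t \\
\text{multiplicative:} \quad & Y_t = T_t \times S_t \times I_t
\end{aligned}
```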
Data AirPassengers
AirPassengers: the classic Box & Jenkins airline data, i.e. monthly totals of international airline passengers, 1949 to 1960. It has 144 (= 12 × 12) values.
> plot(AirPassengers)
[Figure: AirPassengers plotted against Time, 1949 to 1960; y-axis from about 100 to 600]
Decomposition
> # AirPassengers is already a monthly ts; re-wrapping it with ts() resets
> # the start to 1, so the time axis runs from 1 instead of 1949
> apts <- ts(AirPassengers, frequency = 12)
> f <- decompose(apts)
> # seasonal figures
> plot(f$figure, type="b")
[Figure: the 12 seasonal figures f$figure plotted against Index 1 to 12; y-axis from about -40 to 60]
Decomposition
> plot(f)
[Figure: "Decomposition of additive time series", four panels (observed, trend, seasonal, random) plotted against Time]
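To show what decompose() does internally in the additive case, here is a step-by-step sketch in base R, assuming the classical method: a centred 2x12 moving average for the trend, then per-month averages of the detrended series as the seasonal figures. It uses AirPassengers directly (so the calendar start is kept) and checks the result against decompose().

```r
# Classical additive decomposition of a monthly series, step by step (base R only)
x <- AirPassengers
f <- frequency(x)                                  # 12 observations per year

# 1. Trend: centred 2x12 moving average, weights 1/24, 1/12, ..., 1/12, 1/24
w <- c(0.5, rep(1, f - 1), 0.5) / f
trend <- stats::filter(x, w, sides = 2)            # NA for the first and last 6 months

# 2. Remove the trend, then average the detrended values for each month of the year
detrended <- as.numeric(x - trend)
month <- as.integer(cycle(x))
figure <- tapply(detrended, month, mean, na.rm = TRUE)
figure <- as.numeric(figure - mean(figure))        # centre so the 12 figures sum to zero

# 3. Irregular component: what remains after removing trend and seasonal effect
random <- detrended - figure[month]

# Compare with the built-in function
d <- decompose(x)
all.equal(figure, d$figure)
all.equal(random, as.numeric(d$random))
```

Since the seasonal swings in AirPassengers grow with the level of the series, decompose(AirPassengers, type = "multiplicative"), which divides instead of subtracting, is a natural variant to try.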
Time Series Forecasting
- To forecast future events based on known past data
- E.g., to predict the opening price of a stock based on its past performance
- Popular models
  - Autoregressive moving average (ARMA)
  - Autoregressive integrated moving average (ARIMA)
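For reference, an ARMA(p, q) model for a zero-mean series X_t with white-noise errors ε_t can be written as below, following the sign convention documented for R's arima(). An ARIMA(p, d, q) model applies the same equation to the d-th difference of X_t. The last line writes out, in backshift notation (B X_t = X_{t-1}), the seasonal model ARIMA(1,0,0)(2,1,0) with period 12 that is fitted in the next example; no mean term appears because arima() drops it for differenced models.

```latex
\begin{aligned}
\text{ARMA}(p, q):\quad & X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{ARIMA}(p, d, q):\quad & \text{the same equation applied to } \nabla^d X_t, \qquad \nabla X_t = X_t - X_{t-1} \\
\text{ARIMA}(1,0,0)(2,1,0)_{12}:\quad & (1 - \phi_1 B)\,(1 - \Phi_1 B^{12} - \Phi_2 B^{24})\,(1 - B^{12})\, X_t = \varepsilon_t
\end{aligned}
```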
Forecasting
> # build an ARIMA model
> fit <- arima(AirPassengers, order=c(1,0,0),
+              seasonal=list(order=c(2,1,0), period=12))
> fore <- predict(fit, n.ahead=24)
> # approximate error bounds at 95% confidence level (prediction ± 2 standard errors)
> U <- fore$pred + 2*fore$se
> L <- fore$pred - 2*fore$se
> ts.plot(AirPassengers, fore$pred, U, L,
+         col=c(1,2,4,4), lty=c(1,1,2,2))
> legend("topleft", col=c(1,2,4), lty=c(1,1,2),
+        legend=c("Actual", "Forecast",
+                 "Error Bounds (95% Confidence)"))
Forecasting
[Figure: AirPassengers with the 24-month forecast and its error bounds, legend "Actual", "Forecast", "Error Bounds (95% Confidence)"; x-axis Time from 1949 to 1962; y-axis from about 100 to 700]
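One way to check how well such a model forecasts is to hold out the most recent data. The sketch below refits the same seasonal ARIMA model on data up to December 1959, forecasts the 12 months of 1960, and compares the forecasts with the held-out values. The hold-out length, the use of the exact normal quantile qnorm(0.975) instead of the rounded factor 2, and the MAPE summary are choices made for this illustration, not part of the original example.

```r
# Hold out 1960, refit the seasonal ARIMA model on 1949-1959, and forecast 12 months
train   <- window(AirPassengers, end = c(1959, 12))
holdout <- window(AirPassengers, start = c(1960, 1))

fit <- arima(train, order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12))
fore <- predict(fit, n.ahead = length(holdout))

# 95% bounds using the exact normal quantile instead of the rounded factor 2
z <- qnorm(0.975)
U <- fore$pred + z * fore$se
L <- fore$pred - z * fore$se

mape <- mean(abs(holdout - fore$pred) / holdout) * 100  # mean absolute percentage error
coverage <- mean(holdout >= L & holdout <= U)           # share of months inside the bounds
c(MAPE = mape, coverage = coverage)
```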
Time Series Clustering
- To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar
- Measure of distance/dissimilarity
  - Euclidean distance
  - Manhattan distance
  - Maximum norm
  - Hamming distance
  - The angle between two vectors (inner product)
  - Dynamic Time Warping (DTW) distance
  - ...
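Several of the measures above are available directly through base R's dist(), which computes distances between the rows of a matrix. The sketch below uses three made-up series; angle() and hamming() are hypothetical helpers written for this illustration, not functions from an R package.

```r
# Three short example series (made-up values, for illustration only), one per row
s <- rbind(
  s1 = c(1, 2, 3, 4, 5),
  s2 = c(1, 2, 3, 4, 7),
  s3 = c(5, 4, 3, 2, 1)
)

dist(s, method = "euclidean")   # sqrt(sum((x - y)^2))
dist(s, method = "manhattan")   # sum(abs(x - y))
dist(s, method = "maximum")     # max(abs(x - y)), i.e. the maximum norm of x - y

# Hypothetical helper: angle between two series viewed as vectors (via the inner product)
angle <- function(x, y) {
  cosine <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
  acos(min(1, max(-1, cosine)))  # clamp against rounding just outside [-1, 1]
}
angle(s["s1", ], s["s3", ])      # in radians; 0 means the two vectors point the same way

# Hypothetical helper: Hamming distance, the number of positions where two
# equal-length (typically symbolic) sequences differ
hamming <- function(x, y) sum(x != y)
hamming(c("a", "b", "c"), c("a", "c", "c"))
```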
Dynamic Time Warping (DTW)
DTW finds the optimal alignment between two time series.
> library(dtw)
> idx <- seq(0, 2*pi, length.out=100)
> a <- sin(idx) + runif(100)/10
> b <- cos(idx)
> align <- dtw(a, b, step.pattern=asymmetricP1, keep.internals=TRUE)
> dtwPlotTwoWay(align)
[Figure: two-way alignment plot of the query a against b; y-axis "Query value" from -1.0 to 1.0; x-axis Index from 0 to 100]
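To make the idea behind DTW concrete, here is a minimal dynamic-programming sketch in base R. dtw_distance() is a hypothetical helper, not part of the dtw package, and it uses the plain symmetric recurrence (each step advances one or both series, with no slope constraint), which corresponds to the dtw package's symmetric1 pattern rather than the asymmetricP1 pattern used above. Since the lock-step diagonal path is one of the allowed alignments, the DTW distance of two equal-length series is never larger than their Manhattan distance.

```r
# Minimal DTW distance by dynamic programming (base R only).
# Each step may advance x, y, or both; the local cost is |x[i] - y[j]|.
dtw_distance <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # D[i + 1, j + 1] holds the cheapest cost of aligning x[1..i] with y[1..j];
  # the extra first row and column act as the boundary condition
  D <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      D[i + 1, j + 1] <- cost + min(D[i, j],      # diagonal: advance both series
                                    D[i, j + 1],  # advance x only (y[j] is reused)
                                    D[i + 1, j])  # advance y only (x[i] is reused)
    }
  }
  D[n + 1, m + 1]
}

# Two sine waves that differ only by a phase shift
idx <- seq(0, 2 * pi, length.out = 100)
a <- sin(idx)
b <- sin(idx - 0.5)
dtw_distance(a, b)      # warping absorbs much of the shift
sum(abs(a - b))         # lock-step (Manhattan) distance, for comparison
```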
Synthetic Control Chart Time Series
- The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999).
- Each control chart is a time series with 60 values.
- Six classes:
  1-100 Normal
  101-200 Cyclic
  201-300 Increasing trend
  301-400 Decreasing trend
  401-500 Upward shift
  501-600 Downward shift
- http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html
Synthetic Control Chart Time Series
> # read data into R
> # sep="": the separator is white space, i.e., one
> # or more spaces, tabs, newlines or carriage returns
> sc <- read.table("synthetic_control.data",
+                  header=FALSE, sep="")
> # show one sample from each class
> idx <- c(1,101,201,301,401,501)
> sample1 <- t(sc[idx,])
> plot.ts(sample1, main="")
Six Classes
[Figure: one sample series from each class, in panels labelled 1, 101, 201, 301, 401 and 501, plotted against Time from 0 to 60]
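Once the series are stored one per row, as in sc, they can be clustered with base R's dist() (Euclidean distance by default) and hclust(). The sketch below is a hedged illustration that uses a small simulated stand-in with three clearly different shapes so that it runs on its own; the simulated data, average linkage and k = 3 are assumptions made for this example. With the real data, the same calls would apply to the rows of sc, and reference labels could be built as rep(1:6, each = 100) from the row ranges listed earlier.

```r
# Hierarchical clustering of time series stored one per row.
# Simulated stand-in data (hypothetical), so that the sketch runs on its own.
set.seed(1)
n_per <- 10                       # series per simulated class
len <- 60                         # values per series, as in the control chart data
t_idx <- 1:len
sim <- rbind(
  t(replicate(n_per, 30 + rnorm(len, sd = 2))),                # flat, like "Normal"
  t(replicate(n_per, 30 + 0.5 * t_idx + rnorm(len, sd = 2))),  # increasing trend
  t(replicate(n_per, 30 - 0.5 * t_idx + rnorm(len, sd = 2)))   # decreasing trend
)
labels <- rep(c("normal", "increasing", "decreasing"), each = n_per)

hc <- hclust(dist(sim), method = "average")   # Euclidean distance, average linkage
groups <- cutree(hc, k = 3)                   # cut the dendrogram into 3 clusters
table(labels, groups)                         # cross-tabulate clusters against classes
```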