
Automatic trend extraction and forecasting for a family of time series
Theodore Alexandrov, Nina Golyandina (theo@pdmi.ras.ru, nina@ng1174.spb.edu)
St. Petersburg State University, Russia
International Symposium on Forecasting, 12 June 2006


  1. Automatic trend extraction and forecasting for a family of time series
     Theodore Alexandrov, Nina Golyandina (theo@pdmi.ras.ru, nina@ng1174.spb.edu)
     St. Petersburg State University, Russia
     International Symposium on Forecasting, 12 June 2006
     Project page: http://www.pdmi.ras.ru/~theo/autossa/

  2. History, Tasks, Advantages

     Origins of the "Caterpillar"-SSA approach
     - Singular System Analysis (Broomhead): dynamical systems, method of delays for the analysis of attractors [mid-1980s]
     - Singular Spectrum Analysis (Vautard, Ghil, Fraedrich): geophysics/meteorology, signal/noise enhancement, signal detection in red noise (Monte Carlo SSA) [1990s]
     - "Caterpillar" (Danilov, Zhigljavsky, Solntsev, Nekrutkin, Golyandina): Principal Component Analysis for time series [late 1990s]

     Tasks
     - Extraction and forecasting of additive components (trends, harmonics, exponentially modulated harmonics)
     - Smoothing (self-adaptive linear filter)
     - Change-point detection
     - Handling of missing observations
     - Multichannel time series

     Advantages
     - Non-parametric and model-free
     - Handles non-stationary time series (the actual constraints on the time series are described later)
     - Suitable for short time series; robust to the noise model, etc.

     More information
     - AutoSSA: http://www.pdmi.ras.ru/~theo/autossa/
     - "Caterpillar"-SSA: http://www.gistatgroup.com/cat/

  3. Goal
     - The "Caterpillar"-SSA method works well in a variety of applications.
     - Trend extraction is one of its strengths, especially when the trend is a time series of finite dimension (a linear combination of exponentials, polynomials and harmonics).
     - Historically, part of the work has been performed manually (visually).

     Trend: a slowly varying deterministic additive component of a time series.
     Our goal is to extract the trend automatically by means of the "Caterpillar"-SSA method.

  4. "Caterpillar"-SSA basic algorithm
     - Decomposes a time series into a sum of additive components: $F_N = F_N^{(1)} + \dots + F_N^{(m)}$
     - Provides information about each component

     Algorithm
     1. Trajectory matrix construction: $F_N = (f_0, \dots, f_{N-1})$, $F_N \to X \in \mathbb{R}^{L \times K}$, $K = N - L + 1$ ($L$ is the window length, a parameter):
        \[
        X = \begin{pmatrix}
        f_0 & f_1 & \dots & f_{N-L} \\
        f_1 & f_2 & \dots & f_{N-L+1} \\
        \vdots & \vdots & \ddots & \vdots \\
        f_{L-1} & f_L & \dots & f_{N-1}
        \end{pmatrix}
        \]
     2. Singular Value Decomposition (SVD): $X = \sum_j X_j$, $X_j = \sqrt{\lambda_j}\, U_j V_j^T$, where $\lambda_j$ is an eigenvalue and $U_j$ an eigenvector of $X X^T$, and $V_j$ is an eigenvector of $X^T X$, $V_j = X^T U_j / \sqrt{\lambda_j}$
     3. Grouping of SVD components: $\{1, \dots, d\} = \bigsqcup_k I_k$, $X^{(k)} = \sum_{j \in I_k} X_j$
     4. Reconstruction by diagonal averaging: $X^{(k)} \to \widetilde{F}_N^{(k)}$, $F_N = \widetilde{F}_N^{(1)} + \dots + \widetilde{F}_N^{(m)}$

     Questions: 1) Does an SVD exist whose components form the sought-for additive component? 2) How should the SVD components be grouped?
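
The four steps on this slide can be sketched in NumPy as follows. This is only a minimal illustration, not the authors' software: the function names `ssa_decompose`, `diagonal_average` and `ssa_reconstruct` and the toy series are assumptions made for the example, and indices are 0-based, unlike the 1-based eigentriple numbers used on the slides.

```python
import numpy as np

def ssa_decompose(series, L):
    """Steps 1-2: build the L x K trajectory matrix and take its SVD."""
    f = np.asarray(series, dtype=float)
    K = f.size - L + 1
    X = np.array([f[i:i + K] for i in range(L)])  # X[i, j] = f[i + j]
    # s[j] = sqrt(lambda_j); columns of U are U_j, rows of Vt are V_j^T.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U, s, Vt

def diagonal_average(Xk):
    """Step 4: average each anti-diagonal i + j = n to obtain a series."""
    L, K = Xk.shape
    flipped = Xk[:, ::-1]  # anti-diagonals become ordinary diagonals
    return np.array([flipped.diagonal(K - 1 - n).mean()
                     for n in range(L + K - 1)])

def ssa_reconstruct(U, s, Vt, group):
    """Steps 3-4: sum the elementary matrices X_j, j in group, then average."""
    idx = np.asarray(list(group), dtype=int)
    Xk = (U[:, idx] * s[idx]) @ Vt[idx, :]
    return diagonal_average(Xk)

# Toy series (an assumption for illustration): linear trend plus a sine.
t = np.arange(100)
series = 0.05 * t + np.sin(2 * np.pi * t / 10)
U, s, Vt = ssa_decompose(series, L=40)
full = ssa_reconstruct(U, s, Vt, range(s.size))   # all components together
first_four = ssa_reconstruct(U, s, Vt, range(4))  # this toy series has rank 4
```

Grouping all components returns the original series, which is a convenient sanity check; which indices to put into a trend group is the question the rest of the deck addresses.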

  5. Example: trend and seasonality extraction
     Data: traffic fatalities, Ontario, monthly, 1960-1974 (Abraham, Ledolter. Stat. Methods for Forecasting, 1983).
     Parameters: $N = 180$, $L = 60$.
     Trend eigentriple group: $I^{(T)} = \{1, 4, 5\}$.
     [Figure: sequences of elements of each of the first nine eigenvectors $U_i$, $1 \le i \le 9$, where $U_i = (u_1^{(i)}, \dots, u_L^{(i)})^T$]

  6. Automation of the choice of eigentriples

     Known attempts at automation
     - Trend and periodicity extraction: R. Vautard, P. Yiou, M. Ghil, 1992 (SSA-MTM Toolkit and kSpectra Toolkit software)
     - Auto-denoising in the case of high SNR: F.J. Alonso, J.M. Castillo, P. Pintado, 2004 (biomechanical kinematic signals)
     - Extraction of generalized cycle components: Izmailov, M. Hai, 2006 (compressors and refrigerators)

     Our approach
     - Methods for extraction and forecasting of additive components:
       • harmonics and exponentially modulated harmonics (based on the ideas of Vautard et al.)
       • trend (slowly varying deterministic component)
     - Criteria for setting the parameters of the methods
     - A technique for verifying the methods on given data

     Remarks on our approach
     - It is based on examination of the singular vectors.
     - The choice of parameters and the verification procedure are designed for a set of time series.

  7. Problem statement: processing of a set of time series
     $\mathcal{F} = \{F\}$ is a data set of time series of length $N$.
     All time series follow the model (a general model, not a parametric one!)
     \[
     F = F^{(T)} + F^{(R)},
     \]
     where $F^{(T)}$ is a trend and $F^{(R)}$ is a residual (deterministic component, noise).
     Problem: extraction and forecast of $F^{(T)}$ for every $F \in \mathcal{F}$.

     We propose
     1. A method for choosing eigentriples
     2. Verification of the method on the data set
     3. Setting of the parameters of the method

     Item 2 requires the time series of $\mathcal{F}$ to be similar. The more similar the time series are, the more reliable the results of the verification.
     This solution inherits its non-parametric nature from the visual "Caterpillar"-SSA method.

  8. Method of identification of trend eigentriples
     Define the periodogram $\Pi_U^L$ of a vector $U = (u_1, \dots, u_L)^T$ as
     \[
     \Pi_U^L(k/L) = \frac{L}{2}
     \begin{cases}
     2 c_0^2, & k = 0, \\
     c_k^2 + s_k^2, & 1 \le k \le \frac{L-1}{2}, \\
     2 c_{L/2}^2, & L \text{ is even and } k = L/2,
     \end{cases}
     \]
     where $c_k$, $s_k$ are the coefficients of the Fourier decomposition of the elements of the vector $U$:
     \[
     u_n = c_0 + \sum_{1 \le k \le \frac{L-1}{2}} \bigl( c_k \cos(2\pi n k/L) + s_k \sin(2\pi n k/L) \bigr) + (-1)^n c_{L/2}
     \]
     (the last term is present only when $L$ is even).
     The periodogram value $\Pi_U(\omega)$ reflects the contribution of the harmonic with frequency $\omega$ to the Fourier decomposition of $u_1, \dots, u_L$.
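
As a rough illustration, the periodogram defined on this slide can be computed from the discrete Fourier transform. The sketch assumes the standard relation $c_k^2 + s_k^2 = 4|\hat{u}_k|^2 / L^2$ for $0 < k < L/2$, where $\hat{u}_k$ is the DFT of $U$; the function name `periodogram` is hypothetical. Whether the elements are indexed from 0 or from 1 changes $c_k$ and $s_k$ individually but not $c_k^2 + s_k^2$.

```python
import numpy as np

def periodogram(u):
    """Return frequencies k/L and periodogram values for k = 0, ..., floor(L/2)."""
    u = np.asarray(u, dtype=float)
    L = u.size
    spec = np.abs(np.fft.rfft(u)) ** 2  # |DFT|^2 at k = 0, ..., floor(L/2)
    pi_u = 2.0 * spec / L               # (L/2) * (c_k^2 + s_k^2), 0 < k < L/2
    pi_u[0] = spec[0] / L               # (L/2) * 2 * c_0^2
    if L % 2 == 0:
        pi_u[-1] = spec[-1] / L         # (L/2) * 2 * c_{L/2}^2
    freqs = np.arange(pi_u.size) / L
    return freqs, pi_u

# A pure cosine with frequency 3/20 puts all of its energy at k = 3.
n = np.arange(20)
u = np.cos(2 * np.pi * 3 * n / 20)
freqs, pi_u = periodogram(u)
```

With this normalization the periodogram values sum to the squared Euclidean norm of the vector, so each value can be read as a share of the vector's energy.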

  9. Method of identification of trend eigentriples
     Trend: a slowly varying deterministic additive component of a time series.
     Slowly varying = harmonics with low frequencies dominate in the Fourier decomposition.
     SVD components corresponding to a trend have slowly varying singular vectors $U_i = (u_1^{(i)}, \dots, u_L^{(i)})^T$, that is, slowly varying sequences of elements $u_1^{(i)}, \dots, u_L^{(i)}$ (a theoretically proved fact).
     The idea of the identification is to find all singular vectors with slowly varying sequences of elements.
     For each $U$ we calculate the contribution of low-frequency harmonics to its Fourier decomposition:
     \[
     C(U) = \frac{\sum_{0 \le \omega \le \omega_0} \Pi_U(\omega)}{\sum_{0 \le \omega \le 0.5} \Pi_U(\omega)}, \qquad \omega \in \{k/L,\ k \in \mathbb{Z}\}.
     \]
     Trend eigentriples: $I^{(T)} = \{ i : C(U_i) \ge C_0 \}$ for the given $C_0$.

     Parameters
     - $\omega_0$ prescribes the low-frequency interval $[0, \omega_0]$; harmonics with frequencies from $[0, \omega_0]$ are considered slowly varying
     - $C_0$ is a threshold, $0 < C_0 < 1$
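
A small sketch of the selection rule, under the same DFT-based periodogram as assumed for the definition above. The names `low_freq_contribution` and `trend_eigentriples` are hypothetical, the returned indices are 0-based, and the three test vectors are synthetic stand-ins for singular vectors rather than the output of a real decomposition.

```python
import numpy as np

def low_freq_contribution(u, omega0):
    """C(U): share of the periodogram of u at frequencies k/L <= omega0."""
    u = np.asarray(u, dtype=float)
    L = u.size
    spec = np.abs(np.fft.rfft(u)) ** 2
    weights = np.full(spec.size, 2.0)  # interior frequencies count twice
    weights[0] = 1.0
    if L % 2 == 0:
        weights[-1] = 1.0              # frequency 0.5 exists only for even L
    pi_u = weights * spec / L
    freqs = np.arange(spec.size) / L
    # Small tolerance so that k/L == omega0 is not lost to rounding.
    return pi_u[freqs <= omega0 + 1e-12].sum() / pi_u.sum()

def trend_eigentriples(U, omega0, C0):
    """Indices i with C(U_i) >= C0; the columns of U are the singular vectors."""
    return [i for i in range(U.shape[1])
            if low_freq_contribution(U[:, i], omega0) >= C0]

# Synthetic "singular vectors" (assumed for the demo), window length L = 40.
n = np.arange(40)
U_demo = np.column_stack([
    np.ones(40) / np.sqrt(40),       # constant: frequency 0
    np.cos(2 * np.pi * 8 * n / 40),  # frequency 0.2, not slowly varying
    np.cos(2 * np.pi * 1 * n / 40),  # frequency 0.025, slowly varying
])
selected = trend_eigentriples(U_demo, omega0=0.05, C0=0.9)
```

Here `selected` contains the indices of the constant and the low-frequency cosine, while the vector oscillating at frequency 0.2 is rejected.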

  10. Trend extraction for all $F \in \mathcal{F}$ (with verification)

      Problem
      1. Necessary conditions for using "Caterpillar"-SSA: finite dimension of the trend, moderate SNR, ...
      2. Are they fulfilled for all $F \in \mathcal{F}$? Does the procedure extract trends with acceptable quality?
      We can verify whether the method handles $\mathcal{F}$ by taking (at random) a test subset $\mathcal{T} \subset \mathcal{F}$.

      Verification
      For every time series $G \in \mathcal{T}$ from the test subset:
      1. Manually (visually) extract the trend $\widetilde{G}^{(T)}$ (we suppose that $\widetilde{G}^{(T)} \cong G^{(T)}$).
      2. Denote by $\widehat{G}^{(T)}(C_0)$ the trend extracted by the procedure with threshold $C_0$.
      3. Calculate
         \[
         C_0^{opt} = \arg\min_{C_0} \bigl\| \widetilde{G}^{(T)} - \widehat{G}^{(T)}(C_0) \bigr\|_{l_2};
         \]
         this threshold extracts the trend that is closest to the manually extracted one.
      Estimate the average quality of the procedure on $\mathcal{F}$:
      \[
      \frac{1}{\sharp \mathcal{T}} \sum_{G \in \mathcal{T}} \bigl\| \widetilde{G}^{(T)} - \widehat{G}^{(T)}(C_0^{opt}) \bigr\|_{l_2}.
      \]
      If it is small enough, we apply the procedure to all $F \in \mathcal{F} \setminus \mathcal{T}$.
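
The verification loop might look roughly like the sketch below. Several things are assumptions made for the example and are not taken from the slides: the helper names (`low_freq_share`, `auto_trend`, `best_threshold`, `verify`), the search over a finite grid of thresholds, and the synthetic test set in which the known exponential trend stands in for a manually extracted one.

```python
import numpy as np

def low_freq_share(u, omega0):
    """C(U): share of the periodogram of u at frequencies k/L <= omega0."""
    L = u.size
    spec = np.abs(np.fft.rfft(u)) ** 2
    spec[1:] *= 2.0      # interior frequencies occur twice in the full DFT
    if L % 2 == 0:
        spec[-1] /= 2.0  # ... except the frequency 0.5 (k = L/2)
    freqs = np.arange(spec.size) / L
    return spec[freqs <= omega0 + 1e-12].sum() / spec.sum()

def auto_trend(series, L, omega0, C0):
    """Trend reconstructed from the eigentriples with C(U_i) >= C0."""
    f = np.asarray(series, dtype=float)
    K = f.size - L + 1
    X = np.array([f[i:i + K] for i in range(L)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    idx = np.array([i for i in range(s.size)
                    if low_freq_share(U[:, i], omega0) >= C0], dtype=int)
    Xt = (U[:, idx] * s[idx]) @ Vt[idx, :]
    flipped = Xt[:, ::-1]  # diagonal averaging via ordinary diagonals
    return np.array([flipped.diagonal(K - 1 - n).mean() for n in range(f.size)])

def best_threshold(series, manual_trend, L, omega0, grid):
    """Grid search for the C0 whose trend is closest (l2) to the manual one."""
    errors = [np.linalg.norm(manual_trend - auto_trend(series, L, omega0, C0))
              for C0 in grid]
    j = int(np.argmin(errors))
    return grid[j], errors[j]

def verify(test_set, L, omega0, grid):
    """Mean l2 error at each series' own optimal threshold, plus the thresholds."""
    results = [best_threshold(g, m, L, omega0, grid) for g, m in test_set]
    return float(np.mean([e for _, e in results])), [c for c, _ in results]

# Synthetic test subset (assumed): exponential trend plus a period-12 sine.
t = np.arange(120)
def make(rate):
    trend = np.exp(rate * t)
    return trend + np.sin(2 * np.pi * t / 12), trend

test_set = [make(r) for r in (0.015, 0.02, 0.025)]
grid = np.linspace(0.1, 0.9, 9)
mean_err, c_opts = verify(test_set, L=60, omega0=1 / 24, grid=grid)
```

Whether `mean_err` is "small enough" is a judgment that depends on the scale of the data; the slides leave that threshold to the analyst.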

  11. Trend extraction for all $F \in \mathcal{F}$ (with verification)

      Size of the test set $\mathcal{T}$
      The size of the test set $\mathcal{T}$ depends on the level of similarity between the series $F$ from $\mathcal{F}$.
      It can be controlled as follows:
      - estimate the width of the sampling confidence interval for $\bigl\| \widetilde{G}^{(T)} - \widehat{G}^{(T)}(C_0) \bigr\|_{l_2}$, $G \in \mathcal{T}$;
      - if it is small enough, then $\mathcal{T}$ is sufficiently large.

  12. Choice of parameters for approximation

      Low-frequency interval boundary $\omega_0$
      Based on our understanding of low frequencies (which harmonics are to be considered slowly varying):
      - Examine the periodogram of $F$.
      - If there is a periodicity with period $T$ (besides a trend), then $\omega_0 < 1/T$.

      Threshold $C_0$
      \[
      C_0^{opt} = \arg\min_{C_0} \bigl\| F^{(T)} - \widehat{F}^{(T)}(C_0) \bigr\| \quad \forall F \in \mathcal{F},
      \]
      but $F^{(T)}$ is unknown.
      We propose:
      \[
      R(C_0) = \frac{C\bigl(F - \widehat{F}^{(T)}(C_0)\bigr)}{C(F)},
      \]
      where $C(F)$ is the contribution of low-frequency harmonics to the Fourier decomposition of $F$, and $\exp(R(C_0))$ has the same behavior as $\bigl\| F^{(T)} - \widehat{F}^{(T)}(C_0) \bigr\|$.
      Because of the similar behavior of $R(C_0)$ and $\bigl\| F^{(T)} - \widehat{F}^{(T)}(C_0) \bigr\|$, we can estimate $C_0^{opt}$ for $F$ from $R(C_0)$.
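
The ratio $R(C_0)$ needs only the series and a trend estimate, so it can be sketched without the true trend. In the example below the trend estimate is passed in directly instead of being produced by the procedure with a given $C_0$, which is a simplification; the names `low_freq_share` and `residual_ratio`, the toy series, and the convention that an all-zero series has zero contribution are assumptions for illustration.

```python
import numpy as np

def low_freq_share(x, omega0):
    """C(x): share of the periodogram of x at frequencies k/N <= omega0."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x)) ** 2
    spec[1:] *= 2.0      # interior frequencies occur twice in the full DFT
    if x.size % 2 == 0:
        spec[-1] /= 2.0  # ... except the frequency 0.5
    total = spec.sum()
    if total == 0.0:
        return 0.0       # assumption: a zero series gets zero contribution
    freqs = np.arange(spec.size) / x.size
    return spec[freqs <= omega0 + 1e-12].sum() / total

def residual_ratio(series, trend_estimate, omega0):
    """R = C(F - trend estimate) / C(F)."""
    series = np.asarray(series, dtype=float)
    residual = series - np.asarray(trend_estimate, dtype=float)
    return low_freq_share(residual, omega0) / low_freq_share(series, omega0)

# Toy series (assumed): linear trend plus a sine with period T = 12,
# with omega0 = 1/24 chosen below 1/T as the slide recommends.
t = np.arange(120)
trend = 0.05 * t
series = trend + np.sin(2 * np.pi * t / 12)
r_nothing = residual_ratio(series, np.zeros_like(series), omega0=1 / 24)
r_exact = residual_ratio(series, trend, omega0=1 / 24)
```

When nothing is removed the residual is the whole series and the ratio equals 1; when the exact trend is removed the residual is the pure sine and the ratio is close to 0. Estimates that leave part of the trend in the residual fall in between.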
