An Exploratory Segmentation Method for Time Series
Christian Derquenne, EDF R&D
Outline
Issues and motivations
The proposed method
Application: a simulated case
Contributions, applications and further research
24th September 2010, COMPSTAT 2010
Issues and motivations
Decomposition of time series → trend, seasonality, volatility and noise, more or less regular depending on the application case
Evolution of electricity consumption over 50 years
⇒ Regular phenomena
⇒ Short-term forecasting model (MAPE < 1.5%)
Evolution of financial series (CAC40, S&P 500, …)
⇒ Trend and seasonality occur less regularly and less frequently
⇒ Volatility and irregularity
⇒ Behavior breaks could characterize the series (peaks, level breaks, trend changes, volatility)
⇒ Modeling the data is very delicate; forecasting these series can be close to a utopian view
Issues and motivations
Interest in detecting behavior breakpoints
→ Building contiguous segments (segmentation)
→ Detecting behavior breakpoints
→ Achieving stationarity of time series with a segmentation model
→ Building symbolic curves to cluster series
→ Modeling multivariate time series
Potential applications
→ Economics, finance, human sequences, meteorology, energy management, etc.
Issues and motivations
Some examples of methods
→ Exploring the segmentation space for the assessment of multiple change-point models [Guédon, Y. (2008)]
→ Inference on models with multiple breakpoints in multivariate time series, notably to select the optimal number of breakpoints [Lavielle, M. et al. (2006)]
→ Sequential change-point detection when the pre- and post-change parameters are unknown [Lai, T.L. et al. (2009)]
Common point of these methods
→ Use of dynamic programming to decrease the computational complexity of exploring the segmentations (total number $= 2^{T-1}$)
→ Complexity is generally in $O(ST^2)$ for time and in $O(ST)$ for the linear clustered space, but also $O(T^2)$ and $O(MT^2)$, where $T$ = length of the series, $S$ = number of segments, $M$ = number of series
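To make the dynamic-programming idea concrete, here is a minimal Python sketch, assuming the simplest of the problems listed next, a change in mean with constant variance, scored by the within-segment sum of squared errors. It is a generic textbook recursion, not the specific algorithm of Guédon, Lavielle et al. or Lai et al., and the function name `optimal_segmentation` is hypothetical. Instead of scoring all $2^{T-1}$ segmentations, it reuses the best cost of every prefix, which gives $O(ST^2)$ time and $O(ST)$ memory.

```python
import numpy as np

def optimal_segmentation(y, S):
    """Split y into S contiguous, non-empty segments that minimise the total
    within-segment sum of squared errors (piecewise-constant mean).

    Dynamic programming: O(S * T^2) time, O(S * T) memory.
    Returns (list of (start, end) pairs with end exclusive, minimal cost)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if not 1 <= S <= T:
        raise ValueError("need 1 <= S <= T")
    # Prefix sums give the SSE of any segment y[i:j] in O(1).
    c1 = np.concatenate(([0.0], np.cumsum(y)))
    c2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def sse(i, j):
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    # cost[s, j] = best cost of splitting the prefix y[:j] into s segments
    cost = np.full((S + 1, T + 1), np.inf)
    last_start = np.zeros((S + 1, T + 1), dtype=int)
    cost[0, 0] = 0.0
    for s in range(1, S + 1):
        for j in range(s, T + 1):
            for i in range(s - 1, j):          # last segment is y[i:j]
                c = cost[s - 1, i] + sse(i, j)
                if c < cost[s, j]:
                    cost[s, j] = c
                    last_start[s, j] = i
    # Backtrack from the full series to recover the segment boundaries.
    bounds, j = [], T
    for s in range(S, 0, -1):
        i = int(last_start[s, j])
        bounds.append((i, j))
        j = i
    return bounds[::-1], float(cost[S, T])

y = [5.0] * 40 + [6.0] * 60
print(optimal_segmentation(y, S=2))
```

The triple loop is what makes the cost quadratic in $T$: every candidate start of the last segment is tried for every prefix length.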
Issues and motivations
Three problems studied by these methods
(i) Change in mean with a constant variance
(ii) Change in variance with a constant mean
(iii) Change in the overall distribution of the time series without a change of level: in dispersion and in the distribution of the errors
The proposed method
→ Detection of increasing or decreasing trends [Perron & al., 2008]
→ Reduction of the computational complexity to $O(KT)$, where $K$ is the smoothing degree, which is generally smaller than $T$
→ Proposal of several segmentation solutions containing segments with increasing or decreasing trend, constant level and different standard deviations
Proposed method
Let $(Y_t)_{t=1,\dots,T}$ be a time series. We suppose that it is decomposed according to a heteroskedastic linear model (or variance components model) [Rao & al., 1988; Searle & al., 1992]:
$$Y_t = \sum_{s=1}^{S} \left[\beta_0^{(s)} + \beta_1^{(s)}\, t + \sigma_s \varepsilon_t\right] \mathbf{1}_{\{t \in \tau_s\}} \qquad (1)$$
where $\beta_0^{(s)}$, $\beta_1^{(s)}$ and $\sigma_s > 0$ are respectively the level, trend and standard-deviation parameters of segment $\tau_s$, and $\varepsilon_t \sim N(0,1)$. With $T_s = \operatorname{card}(\tau_s)$ and $\sum_{s=1}^{S} T_s = T$, there are $3S$ parameters to estimate, plus the number $S$ of segments.
Inference: OLS, ML, REML
→ The three estimators give the same solutions for $\beta_0^{(s)}$ and $\beta_1^{(s)}$
→ ML and REML directly estimate $\sigma_s^2$
→ Only REML provides an unbiased estimator of $\sigma_s^2$
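A minimal Python sketch of the inference step, assuming the segments $\tau_s$ are already known: each segment is fitted by OLS on $(1, t)$, which gives the common estimates of $\beta_0^{(s)}$ and $\beta_1^{(s)}$, and the residual sum of squares is divided by $T_s$ (ML) or by $T_s - 2$ (REML, since two mean parameters are estimated per segment). The function name `fit_segment` and the simulated parameter values are hypothetical and only illustrate model (1).

```python
import numpy as np

def fit_segment(t, y):
    """OLS fit of y = b0 + b1 * t on one segment.

    Returns (b0, b1, s2_ml, s2_reml): the level and trend estimates and the
    ML and REML estimates of the segment variance sigma_s^2."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t])      # design matrix (1, t)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))       # residual sum of squares
    n, p = X.shape                                  # p = 2 mean parameters
    return float(beta[0]), float(beta[1]), rss / n, rss / (n - p)

# Simulate model (1) with S = 2 hypothetical segments and refit each one.
rng = np.random.default_rng(0)
true_params = [  # (first t, last t, level, trend, sigma): illustrative values
    (1, 40, 5.0, 0.00, 0.1),
    (41, 100, 3.0, 0.05, 0.3),
]
for first, last, b0, b1, sigma in true_params:
    t = np.arange(first, last + 1)
    y = b0 + b1 * t + sigma * rng.standard_normal(t.size)
    est = fit_segment(t, y)
    print(f"t={first}..{last}: b0={est[0]:.3f} b1={est[1]:.4f} "
          f"s2_ml={est[2]:.4f} s2_reml={est[3]:.4f} (true {sigma**2:.4f})")
```

The two variance estimates differ only by the factor $T_s/(T_s-2)$, so the ML bias matters mostly for short segments.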
Proposed method
Detailed process: preparing data
Step of smoothing: keep only the "strong" trends
→ Using a moving median:
$$m_j(t) = \operatorname{med}_{t' \in [a_j(t),\, b_j(t)]} \left(y_{t'}\right) \qquad (2)$$
where, for a fixed smoothing degree $j$: $a_j(t) = t$ and $b_j(t) = t + j - 1$, for $t = 1, \dots, T - j + 1$.
Remark: the larger $j$ is, the less the irregularity of the data is taken into account.
A little example: $Y_t \sim N(5;\, 0.01)$ for $t = 1, \dots, 40$ and $Y_t \sim N(6;\, 0.01)$ for $t = 41, \dots, 100$.
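Equation (2) can be sketched in Python with NumPy's `sliding_window_view` (NumPy 1.20 or later), applied to the little example of the smoothing step. Two assumptions: indices are 0-based, so entry `t` of the result is $m_j(t+1)$, and the second parameter of $N(5;\,0.01)$ is read as a variance (standard deviation 0.1). The function name `moving_median` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_median(y, j):
    """Moving median of equation (2): entry t is the median of y[t : t + j],
    i.e. of the window [a_j(t), b_j(t)] = [t, t + j - 1] (0-based).
    Returns T - j + 1 values."""
    y = np.asarray(y, dtype=float)
    if not 1 <= j <= len(y):
        raise ValueError("smoothing degree j must satisfy 1 <= j <= T")
    return np.median(sliding_window_view(y, j), axis=1)

# The little example: level 5 for t = 1..40, level 6 for t = 41..100.
rng = np.random.default_rng(42)
y = np.concatenate([rng.normal(5.0, 0.1, 40), rng.normal(6.0, 0.1, 60)])
m = moving_median(y, j=5)
print(len(m))  # → 96
```

Because a median ignores isolated outliers inside each window, the level jump at $t = 41$ survives smoothing while small fluctuations are flattened.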
Proposed method
Detailed process: preparing data
Differencing step: detect the trends of the smoothed data
→ Using a relative deviation:
$$d_j(t) = \frac{m_j(t) - m_j(t-k)}{m_j(t-k)} \qquad (3)$$
where the lag is $k = j/2$ if $j$ is even and $k = (j+1)/2$ if $j$ is odd, so that $m_j(t)$ is compared with $m_j(t - j/2)$, resp. $m_j(t - (j+1)/2)$.
The lag must be large enough to reveal trend deviations, but not too large, otherwise they could be skipped.
Remark: this is only a visual choice, not a theoretical one.
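A minimal sketch of equation (3), assuming the smoothed values are strictly positive (as for consumption or price series) so that the ratio is defined. With 0-based arrays the result starts at index $k$ of the smoothed series. `relative_deviation` is a hypothetical name; `(j + 1) // 2` reproduces both cases of the lag $k$.

```python
import numpy as np

def relative_deviation(m, j):
    """Equation (3): d_j(t) = (m_j(t) - m_j(t - k)) / m_j(t - k).

    The lag is k = j/2 for even j and (j + 1)/2 for odd j, which is
    (j + 1) // 2 in integer arithmetic. Returns len(m) - k values,
    the first one corresponding to t = k (0-based)."""
    m = np.asarray(m, dtype=float)
    k = (j + 1) // 2
    if not 1 <= k < len(m):
        raise ValueError("lag k must satisfy 1 <= k < len(m)")
    return (m[k:] - m[:-k]) / m[:-k]

print(relative_deviation([1.0, 2.0, 4.0, 8.0], j=2))  # lag 1 → [1. 1. 1.]
```

A relative rather than absolute deviation makes the sign and size of $d_j(t)$ comparable across levels of the series.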
Proposed method
Detailed process: preparing data
Step of counting: number and size of initial segments
$$T_{j,1}^{(0)} = \operatorname{card}\big(\tau_{j,1}^{(0)}\big) = \sum_{t \ge 2} \mathbf{1}_{\{\operatorname{sign}(d_j(t)) = \operatorname{sign}(d_j(t-1))\}} \qquad (4)$$
This yields $S^{(0)}$ segments $\tau_{j,1}^{(0)}, \dots, \tau_{j,s}^{(0)}, \dots, \tau_{j,S^{(0)}}^{(0)}$ with sizes $T_{j,1}^{(0)}, \dots, T_{j,s}^{(0)}, \dots, T_{j,S^{(0)}}^{(0)}$ and $\sum_{s=1}^{S^{(0)}} T_{j,s}^{(0)} = T$.
Justification: (i) the number of values with the same sign is reasonably linked to the smoothing degree; (ii) the smaller the smoothing degree, the shorter the runs of differences with the same sign.
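The counting step amounts to splitting the differences $d_j(t)$ into maximal runs of identical sign. A minimal Python sketch, assuming 0-based indices and treating an exact zero as its own sign (`np.sign` returns 0 for it); `initial_segments` is a hypothetical name. Each returned pair gives the start and the size $T_{j,s}^{(0)}$ of one initial segment $\tau_{j,s}^{(0)}$, and the sizes add up to the length of the differenced series.

```python
import numpy as np

def initial_segments(d):
    """Split the differences d into maximal runs of identical sign.

    Returns a list of (start, size) pairs, one per initial segment;
    the sizes sum to len(d)."""
    signs = np.sign(np.asarray(d, dtype=float))
    if signs.size == 0:
        return []
    segments, start = [], 0
    for t in range(1, signs.size):
        if signs[t] != signs[t - 1]:   # a sign change closes the current run
            segments.append((start, t - start))
            start = t
    segments.append((start, signs.size - start))
    return segments

d = [0.10, 0.20, -0.10, -0.30, -0.20, 0.05]
print(initial_segments(d))  # → [(0, 2), (2, 3), (5, 1)]
```

The single pass over the differences is linear in the series length, in line with the $O(KT)$ target once the smoothing is included.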