Kernels to detect abrupt changes in time series

Presentation by Alain Celisse



  1. Kernels to detect abrupt changes in time series. Alain Celisse: (1) UMR 8524 CNRS, Université Lille 1; (2) Modal INRIA team-project; (3) SSB group, Paris. Joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot. “Computational and statistical trade-offs in learning”, IHES Paris, March 22nd, 2016.

  2. Outline: 1) Motivating examples and framework (kernels); 2) KCP Algorithm and computational complexity; 3) Where are the change-points (D fixed)?; 4) How many change-points?

  3. Change-point detection: 1-D signal (example). [Figure: signal and regression function (Reg. func.) plotted against position t, for t from 0 to 100.]

  4. Detect abrupt changes... General purposes: 1) Detect changes in (features of) the distribution (not only in the mean).

  5. Abrupt changes in high-order moments → detecting changes in the mean is useless.

  6. Detect abrupt changes... General purposes: 1) Detect changes in (features of) the distribution (not only in the mean); 2) Complex data: high-dimensional (measures in $\mathbb{R}^d$, curves, ...) and structured (audio/video streams, graphs, DNA sequences, ...).

  7. Motivating example 1: Structured objects. Description: video sequences from “Le grand échiquier”, a French talk show from the 70s-80s. At each time, one observes an image (high-dimensional). Each image is summarized by a histogram.

  8. Motivating example 2: Structured objects. Networks are observed over time. Goal: detect abrupt changes in some features of the network.

  9. Detect abrupt changes... General purposes: 1) Detect changes in (features of) the distribution (not only in the mean); 2) Complex data: high-dimensional (measures in $\mathbb{R}^d$, curves, ...) and structured (audio/video streams, graphs, DNA sequences, ...); 3) Fusion of heterogeneous data: deal simultaneously with different types of complex data; 4) Efficient algorithm able to handle large data sets (“Big data” challenge).

  10. I. Kernel framework

  11. Kernel and Reproducing Kernel Hilbert Space (RKHS). $X_1, \dots, X_n \in \mathcal{X}$: initial observations. $k(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: reproducing kernel (Aronszajn (1950)). $\mathcal{H}$: RKHS associated with $k(\cdot, \cdot)$ ($\phi : \mathcal{X} \to \mathcal{H}$ s.t. $\phi(x) = k(x, \cdot)$: canonical feature map). Assets: a versatile tool to work with different types of data, including complex data (high-dimensional/structured).

  12. Instances of kernels.
      Gaussian kernel (with $\mathbb{R}^d$-valued data):
      $$k_\delta(x, y) = \exp\left(\frac{-\|x - y\|^2}{\delta}\right), \quad \delta > 0.$$
      $\chi^2$-kernel (with histogram-valued data):
      $$k_I(p, q) = \exp\left(-\sum_{i=1}^{I} \frac{(p_i - q_i)^2}{p_i + q_i}\right).$$
      ...
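
A minimal Python sketch of the two kernels above, assuming NumPy. The function names, the default `delta=1.0`, and the convention that a bin empty in both histograms contributes zero are illustrative assumptions, not taken from the slides. The χ² kernel is written here as exp(−Σ_i (p_i − q_i)²/(p_i + q_i)), with no normalizing constant in front of the sum.

```python
import numpy as np

def gaussian_kernel(x, y, delta=1.0):
    """Gaussian kernel exp(-||x - y||^2 / delta), with delta > 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / delta))

def chi2_kernel(p, q):
    """Chi-square kernel between two histograms p and q of equal length."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    num = (p - q) ** 2
    den = p + q
    # Assumption: a bin that is empty in both histograms contributes 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.exp(-np.sum(terms)))

def gram_matrix(X, kernel):
    """n x n matrix of kernel values k(X_i, X_j), filled using symmetry."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = kernel(X[i], X[j])
    return K
```

Any of these kernels can be swapped for another without changing the rest of the procedure, since everything downstream only needs the values k(X_i, X_j).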

  13. Model: for all $1 \le i \le n$,
      $$Y_i = \phi(X_i) = \mu_i^\star + \varepsilon_i \in \mathcal{H},$$
      where $\mu_i^\star \in \mathcal{H}$ is the mean element of $P_{X_i}$ (distribution of $X_i$) and, for all $i$, $\varepsilon_i := Y_i - \mu_i^\star$, with $\mathbb{E}\,\varepsilon_i = 0$ and $v_i := \mathbb{E}\left[\|\varepsilon_i\|_{\mathcal{H}}^2\right]$.
      Mean element of $P_{X_i}$ ($\mathcal{H}$ separable and $\mathbb{E}[k(X, X)] < +\infty$):
      $$\langle \mu_i^\star, f \rangle_{\mathcal{H}} = \mathbb{E}_{X_i}\left[\langle \phi(X_i), f \rangle_{\mathcal{H}}\right], \quad \forall f \in \mathcal{H}.$$
      With characteristic kernels, $P_{X_i} \neq P_{X_j} \Rightarrow \mu_i^\star \neq \mu_j^\star$.

  14. Estimation rather than identification. Assumption: $\mu^\star = (\mu_1^\star, \dots, \mu_n^\star)' \in \mathcal{H}^n$ is piecewise constant. Fact: with a finite sample, it is impossible to recover change-points in noisy regions. [Figure: signal Y and regression function s, for positions 55 to 100.] Purpose: estimate $\mu^\star$ to recover change-points. Performance measure:
      $$\|\mu^\star - \mu\|^2 := \sum_{i=1}^{n} \|\mu_i^\star - \mu_i\|_{\mathcal{H}}^2.$$

  15. II. Algorithm

  16. Notation. Segmentation with $D$ segments: $\tau = (\tau_0, \dots, \tau_D)$, with $0 = \tau_0 < \tau_1 < \tau_2 < \dots < \tau_D = n$. Quality of a segmentation $\tau$: following Harchaoui and Cappé (2007),
      $$\widehat{R}_n(\tau) = \frac{1}{n} \sum_{i=1}^{n} k(X_i, X_i) - \frac{1}{n} \sum_{\ell=1}^{D} \left[ \frac{1}{\tau_\ell - \tau_{\ell-1}} \sum_{i=\tau_{\ell-1}+1}^{\tau_\ell} \sum_{j=\tau_{\ell-1}+1}^{\tau_\ell} k(X_i, X_j) \right].$$
      Rk: With the linear kernel $k(x, x') = \langle x, x' \rangle$ on $\mathcal{X} = \mathbb{R}^d$, $\widehat{R}_n(\tau)$ reduces to the usual least-squares empirical risk.
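
The empirical risk of a segmentation only needs the Gram matrix, which the following sketch illustrates, assuming NumPy. The name `kernel_risk` and the convention that `tau` is a Python sequence starting at 0 and ending at n are assumptions made for this example.

```python
import numpy as np

def kernel_risk(K, tau):
    """Empirical risk of the segmentation tau = (0, tau_1, ..., n).

    K is the n x n Gram matrix with entries k(X_i, X_j).
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    if tau[0] != 0 or tau[-1] != n:
        raise ValueError("tau must start at 0 and end at n")
    risk = np.trace(K)  # sum of k(X_i, X_i)
    for a, b in zip(tau[:-1], tau[1:]):
        # The segment holds the 1-based indices a+1..b, i.e. the 0-based slice a:b.
        risk -= K[a:b, a:b].sum() / (b - a)
    return risk / n
```

With the linear kernel, `K = X @ X.T`, this value coincides with the mean squared distance of each point to the average of its segment, which is the least-squares remark made on the slide.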

  17. KCP Algorithm. Input: observations $X_1, \dots, X_n \in \mathcal{X}$; kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.

  18. KCP Algorithm. Input: observations $X_1, \dots, X_n \in \mathcal{X}$; kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.
      Step 1: for all $1 \le D \le D_{\max}$, compute
      $$\widehat{\tau}(D) \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n^D} \left\{ \widehat{R}_n(\tau) \right\} \quad \to \text{dynamic programming},$$
      where $\mathcal{T}_n^D = \left\{ (\tau_0, \dots, \tau_D) \in \mathbb{N}^{D+1} \,/\, 0 = \tau_0 < \tau_1 < \tau_2 < \dots < \tau_D = n \right\}$.

  19. KCP Algorithm. Input: observations $X_1, \dots, X_n \in \mathcal{X}$; kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.
      Step 1: for all $1 \le D \le D_{\max}$, compute
      $$\widehat{\tau}(D) \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n^D} \left\{ \widehat{R}_n(\tau) \right\} \quad \to \text{dynamic programming},$$
      where $\mathcal{T}_n^D = \left\{ (\tau_0, \dots, \tau_D) \in \mathbb{N}^{D+1} \,/\, 0 = \tau_0 < \tau_1 < \tau_2 < \dots < \tau_D = n \right\}$.
      Step 2: find
      $$\widehat{D} \in \operatorname{Argmin}_{1 \le D \le D_{\max}} \left\{ \widehat{R}_n(\widehat{\tau}(D)) + \operatorname{pen}(\widehat{\tau}(D)) \right\} \quad \to \text{model selection}.$$
      Output: sequence of change-points $\widehat{\tau} = \widehat{\tau}(\widehat{D})$.
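
Step 2 can be sketched as follows, assuming the risks of the best segmentations from Step 1 are already available. The penalty is not specified at this point of the slides, so it is passed in as a function; treating it as a function of D alone, the name `select_num_segments`, and the linear penalty in the usage lines are all hypothetical choices made for illustration.

```python
def select_num_segments(risks, penalty):
    """Pick the number of segments minimizing risk + penalty.

    risks[D - 1] is the empirical risk of the best segmentation with D segments,
    for D = 1..D_max. penalty is a callable D -> float (hypothetical interface).
    Returns the selected D and the list of penalized criteria.
    """
    crit = [r + penalty(D) for D, r in enumerate(risks, start=1)]
    D_hat = min(range(1, len(risks) + 1), key=lambda D: crit[D - 1])
    return D_hat, crit

# Usage with made-up risks and a purely illustrative linear penalty.
example_risks = [1.0, 0.2, 0.15, 0.14]
D_hat, crit = select_num_segments(example_risks, lambda D: 0.1 * D)
```

The risk always decreases as D grows, so the penalty is what prevents the selection from returning `D_max`.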

  20. Computational complexity (naive approach). Dynamic programming (DP) update rule: for all $2 \le D \le D_{\max}$,
      $$L_{D,n} = \min_{t \le n-1} \left\{ L_{D-1,t} + C_{t,n} \right\},$$
      where $L_{D-1,t}$ is the cost of the best segmentation in $D-1$ segments up to time $t$, and $C_{t,n}$ is the cost of the segment ⟦t, n⟧:
      $$C_{s,t} = \sum_{i=s+1}^{t} k(X_i, X_i) - \frac{1}{t-s} \sum_{i=s+1}^{t} \sum_{j=s+1}^{t} k(X_i, X_j).$$
      Complexity (naive approach): time $O(D_{\max} n^4)$ (computation of $\{C_{s,t}\}_{1 \le s,t \le n}$); space $O(n^2)$ (storage of the cost matrix).
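
A sketch of the naive approach in Python, assuming NumPy and a precomputed Gram matrix `K`. The names `segment_cost` and `naive_dp` are illustrative. Each cost is computed straight from its definition, so filling the cost matrix alone takes on the order of n^4 operations and n^2 memory.

```python
import numpy as np

def segment_cost(K, s, t):
    """Cost C_{s,t} of the segment of 1-based indices s+1..t, from its definition."""
    block = K[s:t, s:t]
    return np.trace(block) - block.sum() / (t - s)

def naive_dp(K, D_max):
    """Return L with L[D, t] = cost of the best segmentation of X_1..X_t into D segments."""
    n = K.shape[0]
    # Full cost matrix: O(n^2) entries, each computed in O(n^2) time.
    C = np.full((n + 1, n + 1), np.inf)
    for s in range(n):
        for t in range(s + 1, n + 1):
            C[s, t] = segment_cost(K, s, t)
    L = np.full((D_max + 1, n + 1), np.inf)
    L[1, 1:] = C[0, 1:]  # one segment: X_1..X_t
    for D in range(2, D_max + 1):
        for t in range(D, n + 1):
            # The last segment is s+1..t; the first s points use D-1 segments.
            L[D, t] = min(L[D - 1, s] + C[s, t] for s in range(D - 1, t))
    return L
```

Dividing `L[D, n]` by n gives the empirical risk of the best segmentation with D segments.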

  21. Computational complexity (improvement). Ideas (with G. Rigaill and G. Marot): never store the cost matrix; update each column $C_{\cdot,t+1}$ from $C_{\cdot,t}$. Pseudo-code:

          for t = 1 to n - 1 do
              Compute the (t + 1)-th column C_{., t+1} from C_{., t}
              for D = 2 to min(t, D_max) do
                  L_{D, t+1} = min_{s <= t} { L_{D-1, s} + C_{s, t+1} }
              end for
          end for

      Computational complexity: space $O(D_{\max} n)$ (only store $C_{\cdot,t} \in \mathbb{R}^n$); time $O(D_{\max} n^2)$ (update rule + DP complexity).
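
One possible way to realize the column update is sketched below in Python, assuming NumPy. It is not claimed to be the authors' implementation: the names `kcp_dp` and `backtrack`, the running block sums `B`, and the back-pointer table `arg` are choices made for this example. The idea is that the double sum over a segment ending at t follows from the one ending at t−1 by adding twice a partial sum of the new kernel column plus the new diagonal term, which costs O(n) per column.

```python
import numpy as np

def kcp_dp(X, kernel, D_max):
    """Dynamic programming with O(D_max * n) memory and O(D_max * n^2) time.

    Returns L (L[D, t] = best cost of X_1..X_t in D segments) and the
    back-pointers arg (arg[D, t] = start s of the last segment s+1..t).
    """
    n = len(X)
    L = np.full((D_max + 1, n + 1), np.inf)
    arg = np.zeros((D_max + 1, n + 1), dtype=int)
    B = np.zeros(n)             # B[s] = sum of k(X_i, X_j) over s < i, j <= t
    diag_cum = np.zeros(n + 1)  # diag_cum[t] = sum of k(X_i, X_i) over i <= t
    for t in range(1, n + 1):
        # New kernel column: col[i] = k(X_{i+1}, X_t) for i = 0..t-1.
        col = np.array([kernel(X[i], X[t - 1]) for i in range(t)], dtype=float)
        diag_cum[t] = diag_cum[t - 1] + col[t - 1]
        # tail[s] = sum of k(X_i, X_t) over s < i < t (zero for s = t-1).
        tail = np.concatenate((np.cumsum(col[:t - 1][::-1])[::-1], [0.0]))
        B[:t] += 2.0 * tail + col[t - 1]
        s = np.arange(t)
        C_col = (diag_cum[t] - diag_cum[s]) - B[:t] / (t - s)  # C_{s,t}, s = 0..t-1
        L[1, t] = C_col[0]
        for D in range(2, min(t, D_max) + 1):
            cand = L[D - 1, D - 1:t] + C_col[D - 1:t]
            j = int(np.argmin(cand))
            L[D, t] = cand[j]
            arg[D, t] = D - 1 + j
    return L, arg

def backtrack(arg, D, n):
    """Recover tau = [0, tau_1, ..., n] for D segments from the back-pointers."""
    tau = [n]
    for d in range(D, 1, -1):
        tau.append(int(arg[d, tau[-1]]))
    tau.append(0)
    return tau[::-1]
```

Only one column of costs is alive at any time, so the n × n cost matrix is never stored; the kernel is still evaluated once per pair of observations.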

  22. Runtime. Open questions: reduce computation time by low-rank matrix approximation; quantify what has been lost by the approximation.
