
  1. Changepoint detection in network measurements. Allen B. Downey

  2. Fundamental problem: Predict the next value in a time series. Applications: • Protocol parameters (timeouts). • Resource selection, scheduling. • User feedback.

  3. Two kinds of prediction: • Single-value prediction. • Predictive distribution: summary stats, intervals, P(error > thresh), E[cost(error)].

  4. If we assume stationarity, life is good: • Accumulate data indefinitely. • Predictive distribution = observed distribution.
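Under the stationarity assumption, an interval prediction is just a pair of empirical quantiles of everything seen so far. A minimal Python sketch (the 5%/95% interval and the function name are illustrative choices, not from the slides):

```python
import numpy as np

def predictive_interval(data, lo=0.05, hi=0.95):
    """Under stationarity, the predictive distribution is just the
    empirical distribution of all accumulated data, so an interval
    prediction is a pair of empirical quantiles."""
    return np.quantile(data, [lo, hi])
```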

  5. Non-stationary models: • Trends + noise. • Level + changepoint + noise.

  6. Network performance: • Some trends (accumulating queue). • Many abrupt changepoints: beginning and end of transfers, routing changes, hardware failure and replacement.

  7. Prediction with known changepoints: • Use data back to the latest changepoint. • Less accurate immediately after a changepoint.

  8. Prediction with probabilistic changepoints. P(i) = probability of a changepoint after point i. Example: • 150 data points. • P(50) = 0.7, P(100) = 0.5. How do we generate a predictive distribution?

  9. Two steps: • Derive P(i+) = probability that i is the last changepoint. • Compute a weighted mix going back to each i. Example: • P(50) = 0.7, P(100) = 0.5. • P(50+) = 0.35, P(100+) = 0.5. • Plus a 0.15 chance of no changepoint.
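The example arithmetic treats the P(i) as independent: P(100+) = P(100) = 0.5, P(50+) = 0.7 · (1 − 0.5) = 0.35, and 0.3 · 0.5 = 0.15 for no changepoint. Under that independence assumption, the first step can be sketched in Python (the function name is mine):

```python
def last_changepoint_probs(p):
    """Given independent changepoint probabilities p = {position: P(i)},
    return ({position: P(i+)}, P(no changepoint)), where P(i+) is the
    probability that position i is the LAST changepoint."""
    p_last = {}
    none_later = 1.0  # prob that no changepoint occurs after this position
    for pos in sorted(p, reverse=True):
        p_last[pos] = p[pos] * none_later
        none_later *= 1.0 - p[pos]
    return p_last, none_later
```

With the slide's example, p = {50: 0.7, 100: 0.5} yields P(100+) = 0.5, P(50+) = 0.35, and 0.15 for no changepoint.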

  10. Predictive distribution = 0.50 · edf(100, 150) ⊕ 0.35 · edf(50, 150) ⊕ 0.15 · edf(0, 150), where edf(a, b) is the empirical distribution of the data from point a to point b.
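One way to realize the weighted mix is by sampling: pick a window with probability equal to its weight, then draw from that window's empirical distribution. A sketch under assumed names (`mixture_predictive` is not from the talk):

```python
import numpy as np

def mixture_predictive(data, weights, starts, n_samples=10000, seed=0):
    """Sample from a weighted mixture of empirical distributions.
    weights[k] is the probability that the last changepoint is at
    starts[k]; the corresponding edf uses data[starts[k]:]."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # choose a window for each sample according to the mixture weights
    choices = rng.choice(len(weights), size=n_samples, p=weights)
    samples = np.empty(n_samples)
    for i, k in enumerate(choices):
        samples[i] = rng.choice(data[starts[k]:])
    return samples
```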

  11. So how do we generate the probabilities P(i+)? Three steps: • Bayes' theorem. • Simple case: we know there is exactly 1 changepoint. • General case: unknown number of changepoints.

  12. Bayes' theorem (diachronic interpretation): P(H|E) = P(E|H) P(H) / P(E). • H is a hypothesis, E is a body of evidence. • P(H|E): posterior. • P(H): prior. • P(E|H) is usually easy to compute. • P(E) is often not.

  13. Unless we have a suite of exclusive hypotheses: P(E) = Σ_{H_i ∈ S} P(E|H_i) P(H_i). In that case life is good.
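Combining Bayes' theorem with this total-probability step is mechanical; a minimal sketch over a finite suite of exclusive hypotheses (function and variable names are mine):

```python
def bayes_update(priors, likelihoods):
    """Posterior over a suite of exclusive, exhaustive hypotheses:
    P(H|E) = P(E|H) P(H) / P(E), with P(E) = sum over the suite."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(joint.values())  # total probability of the evidence
    return {h: q / p_e for h, q in joint.items()}
```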

  14. • If we know there is exactly one changepoint in an interval... • ...then the P(i) are exclusive hypotheses, • and all we need is P(E|i). Which is pretty much a solved problem.

  15. What if the number of changepoints is unknown? • The P(i) are no longer exclusive. • But the P(i+) are. • And we can write a system of equations for the P(i+).

  16. P(i+) = P(i+|⊘) P(⊘) + Σ_{j&lt;i} P(i+|j++) P(j++). • P(j++) is the probability that the second-to-last changepoint is at j. • P(i+|j++) reduces to the simple problem. • P(⊘) is the probability that we have not seen two changepoints. • P(i+|⊘) reduces to the simple problem (plus). Great, so what's P(j++)?

  17. P(i++) = Σ_{k>i} P(i++|k+) P(k+). • P(i++|k+) is just P(i+) computed at time k. • So we can solve for P(i+) in terms of P(i++), • and P(i++) in terms of P(i+). • Calling Dr. Jacobi!
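The "Dr. Jacobi" quip refers to solving the two mutually dependent systems by Jacobi-style fixed-point iteration: compute new values of each set of unknowns from the previous values of the other, then swap both in together and repeat until convergence. The skeleton of that alternation, stripped of the changepoint-specific likelihood terms (which the update functions would have to encapsulate), is a sketch of mine, not code from the talk:

```python
def jacobi_solve(update_a, update_b, a0, b0, tol=1e-9, max_iter=10000):
    """Jacobi-style alternation for two mutually dependent unknowns:
    a = update_a(b) and b = update_b(a). Both new values are computed
    from the previous iterate before either is replaced."""
    a, b = a0, b0
    for _ in range(max_iter):
        a_new, b_new = update_a(b), update_b(a)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b
```

As a toy check, the pair a = 0.5b + 1, b = 0.5a converges to the fixed point (4/3, 2/3).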

  18. Implementation: • Need to keep n²/2 previous values, • and n²/2 summary statistics, • and it takes n² work to do an update. • But we only have to go back two changepoints, • ...so we can keep n small.

  19. [Figure: synthetic time series x[i] with two changepoints, and cumulative probabilities P(i+) and P(i++) vs. time.] • Synthetic data with two changepoints. • µ = −0.5, 0.5, 0.0. • σ = 1.0. • P(⊘) = 0.04.

  20. [Figure: annual flow of the Nile (10⁹ m³), 1880–1960, and cumulative probabilities P33(i+), P66(i+), P99(i+) vs. time.] • The ubiquitous Nile dataset. • Change in 1898. • Estimated probabilities can be mercurial.

  21. [Figure: synthetic data with a variance change, and cumulative probabilities P(i+) and P(i++) vs. index.] • Can also detect a change in variance. • µ = 1, 0, 0. • σ = 1, 1, 0.5. • Estimated P(i+) is good. • Estimated P(i++) is less certain.

  22. • Qualitative behavior seems good. • Quantitative tests: compare to GLR on the online alarm problem; test the predictive distribution with synthetic data, • ...and with real data.

  23. Online alarm problem: • Observe a process in real time. • µ0 and σ known. • τ and µ1 unknown. • Raise an alarm when f(data) > thresh. • Minimize delay. • Minimize false alarm rate.

  24. GLR = generalized likelihood ratio. • Compute a decision function g_k. • E[g_k] = 0 before the changepoint, • ...and increases after. • Alarm when g_k > h. • GLR is optimal when µ1 is known.
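For a Gaussian mean shift with σ known and the post-change mean unknown, the standard GLR decision function maximizes the likelihood ratio over candidate change onsets: g_k = max_j (Σ_{i=j}^{k} (x_i − µ0))² / (2σ²(k − j + 1)). This sketch follows that textbook form and is my reconstruction, not code from the talk:

```python
import numpy as np

def glr_stat(x, mu0, sigma):
    """GLR decision statistic for a mean shift away from mu0, with
    sigma known and the post-change mean unknown. Scans candidate
    change onsets from latest to earliest, reusing a running sum."""
    d = np.asarray(x, dtype=float) - mu0
    best = 0.0
    s = 0.0  # running sum of deviations over the candidate segment
    for n, dj in enumerate(d[::-1], start=1):
        s += dj
        best = max(best, s * s / (2.0 * sigma**2 * n))
    return best
```

The alarm rule is then g_k > h for a threshold h chosen to trade delay against the false alarm rate.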

  25. CPP = changepoint probability. P(changepoint) = Σ_{i=0}^{k} P(i+). • Alarm when P(changepoint) > thresh.
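The CPP rule itself is one line; a sketch (the default threshold value is illustrative, not from the talk):

```python
def cpp_alarm(p_last, thresh=0.95):
    """CPP alarm rule: P(changepoint) = sum of P(i+) over points seen
    so far; raise an alarm when it exceeds thresh."""
    return sum(p_last) > thresh
```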

  26. [Figure: mean delay vs. false alarm probability for GLR and CPP.] • µ = 0, 1. • σ = 1. • τ ~ Exp(0.01). • Goodness = lower mean delay at the same false alarm rate.

  27. [Figure: mean delay vs. σ for GLR and CPP at a 5% false alarm rate.] • Fix false alarm rate = 5%. • Vary σ. • CPP does well with small S/N.

  28. So it works on a simple problem. Future work: • Other changepoint problems (location, tracking, prediction). • Other data distributions (lognormal). • Testing robustness (real data).

  29. Good news: • Very general framework. • Seems to work. • Many possible applications.

  30. Bad news: • Still some wrinkles to iron out. • n² space and time may be fatal. • May be overkill for the original application.

  31. • More at http://allendowney.com/changepoint • Or email downey@allendowney.com
