
  1. Changepoint detection in network measurements. Allen B. Downey

  2. Fundamental problem: Predict the next value in a time series. Applications: • Protocol parameters (timeouts). • Resource selection, scheduling. • User feedback.

  3. Two kinds of prediction: • Single-value prediction. • Predictive distribution: summary stats, intervals, P(error > thresh), E[cost(error)].

  4. If we assume stationarity, life is good: • Accumulate data indefinitely. • Predictive distribution = observed distribution.
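Under the stationarity assumption, an interval prediction is just a pair of empirical quantiles of everything seen so far. A minimal Python sketch (the 5%/95% interval and the function name are illustrative choices, not from the slides):

```python
import numpy as np

def predictive_interval(data, lo=0.05, hi=0.95):
    """Under stationarity, the predictive distribution is just the
    empirical distribution of all accumulated data, so an interval
    prediction is a pair of empirical quantiles."""
    return np.quantile(data, [lo, hi])
```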

  5. Non-stationary models: • Trends + noise. • Level + changepoint + noise.

  6. Network performance: • Some trends (accumulating queue). • Many abrupt changepoints: beginning and end of transfers, routing changes, hardware failure and replacement.

  7. Prediction with known changepoints: • Use data back to the latest changepoint. • Less accurate immediately after a changepoint.

  8. Prediction with probabilistic changepoints. P(i) = probability of a changepoint after point i. Example: • 150 data points. • P(50) = 0.7, P(100) = 0.5. How do we generate a predictive distribution?

  9. Two steps: • Derive P(i+) = probability that i is the last changepoint. • Compute a weighted mix going back to each i. Example: • P(50) = 0.7, P(100) = 0.5. • P(50+) = 0.35, P(100+) = 0.5. • Plus a 0.15 chance of no changepoint.
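The example arithmetic treats the P(i) as independent: P(100+) = P(100) = 0.5, P(50+) = 0.7 · (1 − 0.5) = 0.35, and 0.3 · 0.5 = 0.15 for no changepoint. Under that independence assumption, the first step can be sketched in Python (the function name is mine):

```python
def last_changepoint_probs(p):
    """Given independent changepoint probabilities p = {position: P(i)},
    return ({position: P(i+)}, P(no changepoint)), where P(i+) is the
    probability that position i is the LAST changepoint."""
    p_last = {}
    none_later = 1.0  # prob that no changepoint occurs after this position
    for pos in sorted(p, reverse=True):
        p_last[pos] = p[pos] * none_later
        none_later *= 1.0 - p[pos]
    return p_last, none_later
```

With the slide's example, p = {50: 0.7, 100: 0.5} yields P(100+) = 0.5, P(50+) = 0.35, and 0.15 for no changepoint.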

  10. Predictive distribution = 0.50 · edf(100, 150) ⊕ 0.35 · edf(50, 150) ⊕ 0.15 · edf(0, 150), where edf(a, b) is the empirical distribution of the data from point a to point b.
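One way to realize the weighted mix is by sampling: pick a window with probability equal to its weight, then draw from that window's empirical distribution. A sketch under assumed names (`mixture_predictive` is not from the talk):

```python
import numpy as np

def mixture_predictive(data, weights, starts, n_samples=10000, seed=0):
    """Sample from a weighted mixture of empirical distributions.
    weights[k] is the probability that the last changepoint is at
    starts[k]; the corresponding edf uses data[starts[k]:]."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # choose a window for each sample according to the mixture weights
    choices = rng.choice(len(weights), size=n_samples, p=weights)
    samples = np.empty(n_samples)
    for i, k in enumerate(choices):
        samples[i] = rng.choice(data[starts[k]:])
    return samples
```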

  11. So how do we generate the probabilities P(i+)? Three steps: • Bayes' theorem. • Simple case: we know there is exactly 1 changepoint. • General case: unknown number of changepoints.

  12. Bayes' theorem (diachronic interpretation): P(H|E) = P(E|H) P(H) / P(E). • H is a hypothesis, E is a body of evidence. • P(H|E): posterior. • P(H): prior. • P(E|H) is usually easy to compute. • P(E) is often not.

  13. Unless we have a suite of exclusive hypotheses: P(E) = Σ_{H_i ∈ S} P(E|H_i) P(H_i). In that case life is good.
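Combining Bayes' theorem with this total-probability step is mechanical; a minimal sketch over a finite suite of exclusive hypotheses (function and variable names are mine):

```python
def bayes_update(priors, likelihoods):
    """Posterior over a suite of exclusive, exhaustive hypotheses:
    P(H|E) = P(E|H) P(H) / P(E), with P(E) = sum over the suite."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_e = sum(joint.values())  # total probability of the evidence
    return {h: q / p_e for h, q in joint.items()}
```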

  14. • If we know there is exactly one changepoint in an interval... • ...then the P(i) are exclusive hypotheses, • and all we need is P(E|i). Which is pretty much a solved problem.

  15. What if the number of changepoints is unknown? • The P(i) are no longer exclusive. • But the P(i+) are. • And we can write a system of equations for the P(i+).

  16. P(i+) = P(i+|⊘) P(⊘) + Σ_{j&lt;i} P(i+|j++) P(j++). • P(j++) is the probability that the second-to-last changepoint is at j. • P(i+|j++) reduces to the simple problem. • P(⊘) is the probability that we have not seen two changepoints. • P(i+|⊘) reduces to the simple problem (plus). Great, so what's P(j++)?

  17. P(i++) = Σ_{k>i} P(i++|k+) P(k+). • P(i++|k+) is just P(i+) computed at time k. • So we can solve for P(i+) in terms of P(i++), • and P(i++) in terms of P(i+). • Calling Dr. Jacobi!
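The "Dr. Jacobi" quip refers to solving the two mutually dependent systems by Jacobi-style fixed-point iteration: compute new values of each set of unknowns from the previous values of the other, then swap both in together and repeat until convergence. The skeleton of that alternation, stripped of the changepoint-specific likelihood terms (which the update functions would have to encapsulate), is a sketch of mine, not code from the talk:

```python
def jacobi_solve(update_a, update_b, a0, b0, tol=1e-9, max_iter=10000):
    """Jacobi-style alternation for two mutually dependent unknowns:
    a = update_a(b) and b = update_b(a). Both new values are computed
    from the previous iterate before either is replaced."""
    a, b = a0, b0
    for _ in range(max_iter):
        a_new, b_new = update_a(b), update_b(a)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b
```

As a toy check, the pair a = 0.5b + 1, b = 0.5a converges to the fixed point (4/3, 2/3).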

  18. Implementation: • Need to keep n²/2 previous values, • and n²/2 summary statistics, • and it takes n² work to do an update. • But we only have to go back two changepoints, • ...so we can keep n small.

  19. [Figure: synthetic time series x[i] with two changepoints, and cumulative probabilities P(i+) and P(i++) vs. time.] • Synthetic data with two changepoints. • µ = −0.5, 0.5, 0.0. • σ = 1.0. • P(⊘) = 0.04.

  20. [Figure: annual flow of the Nile (10⁹ m³), 1880–1960, and cumulative probabilities P33(i+), P66(i+), P99(i+) vs. time.] • The ubiquitous Nile dataset. • Change in 1898. • Estimated probabilities can be mercurial.

  21. [Figure: synthetic data with a variance change, and cumulative probabilities P(i+) and P(i++) vs. index.] • Can also detect a change in variance. • µ = 1, 0, 0. • σ = 1, 1, 0.5. • Estimated P(i+) is good. • Estimated P(i++) is less certain.

  22. • Qualitative behavior seems good. • Quantitative tests: compare to GLR on the online alarm problem; test the predictive distribution with synthetic data, • ...and with real data.

  23. Online alarm problem: • Observe a process in real time. • µ0 and σ known. • τ and µ1 unknown. • Raise an alarm when f(data) > thresh. • Minimize delay. • Minimize false alarm rate.

  24. GLR = generalized likelihood ratio. • Compute a decision function g_k. • E[g_k] = 0 before the changepoint, • ...and increases after. • Alarm when g_k > h. • GLR is optimal when µ1 is known.
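For a Gaussian mean shift with σ known and the post-change mean unknown, the standard GLR decision function maximizes the likelihood ratio over candidate change onsets: g_k = max_j (Σ_{i=j}^{k} (x_i − µ0))² / (2σ²(k − j + 1)). This sketch follows that textbook form and is my reconstruction, not code from the talk:

```python
import numpy as np

def glr_stat(x, mu0, sigma):
    """GLR decision statistic for a mean shift away from mu0, with
    sigma known and the post-change mean unknown. Scans candidate
    change onsets from latest to earliest, reusing a running sum."""
    d = np.asarray(x, dtype=float) - mu0
    best = 0.0
    s = 0.0  # running sum of deviations over the candidate segment
    for n, dj in enumerate(d[::-1], start=1):
        s += dj
        best = max(best, s * s / (2.0 * sigma**2 * n))
    return best
```

The alarm rule is then g_k > h for a threshold h chosen to trade delay against the false alarm rate.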

  25. CPP = changepoint probability. P(changepoint) = Σ_{i=0}^{k} P(i+). • Alarm when P(changepoint) > thresh.
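The CPP rule itself is one line; a sketch (the default threshold value is illustrative, not from the talk):

```python
def cpp_alarm(p_last, thresh=0.95):
    """CPP alarm rule: P(changepoint) = sum of P(i+) over points seen
    so far; raise an alarm when it exceeds thresh."""
    return sum(p_last) > thresh
```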

  26. [Figure: mean delay vs. false alarm probability for GLR and CPP.] • µ = 0, 1. • σ = 1. • τ ~ Exp(0.01). • Goodness = lower mean delay at the same false alarm rate.

  27. [Figure: mean delay vs. σ for GLR and CPP at a 5% false alarm rate.] • Fix false alarm rate = 5%. • Vary σ. • CPP does well with small S/N.

  28. So it works on a simple problem. Future work: • Other changepoint problems (location, tracking, prediction). • Other data distributions (lognormal). • Testing robustness (real data).

  29. Good news: • Very general framework. • Seems to work. • Many possible applications.

  30. Bad news: • Still some wrinkles to iron out. • n² space and time may be fatal. • May be overkill for the original application.

  31. • More at http://allendowney.com/changepoint • Or email downey@allendowney.com
