

SLIDE 1

Anomaly Detection CAMCOS 2009 Introduction ADAPT State Space Models SVD Method EM Algorithm Kalman Filter Alarm Results Future Work

Anomaly Detection with Multi-dimensional State Space Models

Maja Derek, Kate Isaacs, Duncan McElfresh, Jennifer Murguia, Vinh Nguyen, David Shao, Caleb Wright, David Zimmermann

San José State University

December 9, 2009

SLIDE 2

Anomaly Detection

◮ We wish to automatically detect anomalies in aeronautical systems.
◮ Anomalies may be broken equipment, failed sensors, or operator mistakes.
◮ Detection is the first step towards diagnosis and repair.

SLIDE 3

Difficulties in Anomaly Detection

◮ These systems are complicated.
◮ They cannot be reasonably "solved."
◮ There are many configurations of the system, both good and bad.

SLIDE 4

Problems with Current Detection Systems

◮ Rely on subjective parameters from a human expert
◮ Require examples of previous faults
◮ Are slow to realize an error
◮ Go too far in reducing the problem

SLIDE 5

ADAPT

Advanced Diagnostics and Prognostics Testbed

◮ A set of testbeds designed by NASA for development, benchmarking, and competition.
◮ The ADAPT Electrical Power System is analogous to electrical systems in aircraft and spacecraft.
◮ We have nominal (healthy) and faulty (sick) time-dependent data from an ADAPT power system.

SLIDE 6

The Goal

Develop a method for building a detector that is:

◮ Accurate - doesn't miss anomalies (false negatives) while not sounding false alarms (false positives)
◮ Responsive - detects anomalies soon after they occur
◮ Self-contained - does not require experience from live experts or examples of previous faults

SLIDE 7

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 8

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 9

The System

[Diagram: Power Supply, Controls, Load Bank]

SLIDE 10

The Discrete "Inputs"

◮ Switches
◮ Circuit breakers

[Diagram: input ut driving the internal state xt]

Discrete inputs directly affect the internal state of the system.

SLIDE 11

The Continuous "Outputs"

◮ Voltage
◮ Current
◮ Temperature
◮ Phase angle
◮ Speed/flow

[Diagram: input ut and internal state xt driving the output yt]

Continuous outputs are affected by the internal state of the system, as well as by the inputs.

SLIDE 12

The Data

Data collected from experiments:

◮ Uniform time length
◮ Different switches flipped at different times
◮ 79 nominal data sets
◮ 154 faulty data sets

SLIDE 13

Nominal Data

◮ 79 data sets collected with no errors
◮ We used these to figure out how the system acts normally

SLIDE 14

Faulty Data

◮ 154 data sets collected with errors injected
◮ We used these to test our alarm detector

Can you detect both faults?

SLIDE 15

Our System

SLIDE 16

Our System

SLIDE 17

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 18

The ADAPT System

◮ Triangles around inputs
◮ Circles around outputs

SLIDE 19

The State Space Model

[Graphical model: inputs u1, u2, u3 (triangles) and outputs y1, y2, y3 (circles) connected to states x1, x2, x3 (blue squares) by red arrows]

◮ ut (triangles) are inputs; yt (circles) are outputs
◮ xt (blue squares) are called state space vectors
◮ Red arrows (which indicate interaction between ut, yt, and xt) are parameters

SLIDE 20

What We Know

[Graphical model with the states hidden: only the inputs u1, u2, u3 and outputs y1, y2, y3 are shown]

◮ We do not know our xt

SLIDE 21

What We Know

[Graphical model with both the states and the parameters hidden]

◮ We do not know our xt
◮ We do not know our parameters

SLIDE 22

The State Space Equations

xt = A xt−1 + B ut + wt
yt = C xt + D ut + vt

◮ Vectors ut are inputs
◮ Vectors yt are outputs
◮ Vectors xt are state space vectors
◮ Matrices A, B, C, D and noise vectors wt, vt are parameters
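The two equations can be simulated directly. A minimal sketch with hand-picked 2-dimensional parameters (these A, B, C, D are illustrative, not the ADAPT model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-picked parameters for illustration (not the ADAPT model).
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
B = np.array([[1.0], [0.5]])             # input-to-state
C = np.array([[1.0, 0.0]])               # state-to-output
D = np.array([[0.1]])                    # input-to-output

def simulate(u_seq, noise_scale=0.01):
    """Generate outputs y_t from x_t = A x_{t-1} + B u_t + w_t,
    y_t = C x_t + D u_t + v_t, starting from x_0 = 0."""
    x = np.zeros((2, 1))
    ys = []
    for u in u_seq:
        u = np.array([[u]])
        w = noise_scale * rng.standard_normal((2, 1))  # state noise w_t
        v = noise_scale * rng.standard_normal((1, 1))  # output noise v_t
        x = A @ x + B @ u + w
        ys.append(float(C @ x + D @ u + v))
    return ys

ys = simulate([1.0] * 50)   # hold the input (e.g. a closed switch) constant
```

With a constant input and stable A (eigenvalues inside the unit circle), the output settles near the model's steady-state value, about 12.6 for these numbers.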

SLIDE 23

Problem Outline

◮ What is our state space dimension, dim(xt)? (SVD method)
◮ How do we find the parameters? (EM algorithm)
◮ How do we find our state space vectors xt? (Kalman Filter)
◮ How does this model detect an anomaly?

SLIDE 24

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 25

State Space Dimension Estimation

◮ Problem: What is the dimension of the hidden state space vector xt?
◮ To find dim(xt), we use the singular value decomposition (SVD) method.

SLIDE 26

SVD Method

◮ Formulate the Hankel matrix
  ◮ The Hankel matrix describes the autocorrelations of the input vectors ut and the output vectors yt.
◮ Compute the singular values of the Hankel matrix
  ◮ Singular values are non-negative numbers.
  ◮ In the case of no noise, the number of nonzero singular values equals the state space dimension.
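A sketch of the counting idea on a toy system. Assumptions: a known 2-state model, and a Hankel matrix built from its impulse response h_k = C A^(k−1) B, which is a simplification of the input/output autocorrelation construction the slides describe:

```python
import numpy as np

# A known minimal 2-state system (assumed for illustration).
A = np.array([[0.9, 0.1], [0.0, 0.5]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

# Impulse response h_k = C A^(k-1) B for k = 1 .. 2N.
N = 10
h = []
Ak = np.eye(2)
for _ in range(2 * N):
    h.append(float(C @ Ak @ B))
    Ak = Ak @ A

# Hankel matrix: H[i, j] = h[i + j] (constant along anti-diagonals).
H = np.array([[h[i + j] for j in range(N)] for i in range(N)])

# Count singular values above a relative noise floor.
s = np.linalg.svd(H, compute_uv=False)
dim = int(np.sum(s > 1e-8 * s[0]))
```

For noise-free data from a minimal 2-state system, `dim` recovers 2; with noisy data the small singular values are nonzero and the threshold becomes a judgment call, which is exactly the difficulty the Real ADAPT slide discusses.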

SLIDE 27

Reasons to Use SVD Method

We chose the SVD method because:

◮ It does not rely on the parameters A, B, C, D.
◮ It is computationally fast.

The SVD method is based on a theorem due to Kronecker.

SLIDE 28

Theorem

In the absence of error, the rank of the Hankel matrix is equal to the state space dimension.

[Portrait: Kronecker, 1823-1891]

◮ Rank of the Hankel matrix = number of non-zero singular values.
◮ State space dimension = dim(xt).

SLIDE 29

Simulation

◮ We validate our SVD method with simulated data.
◮ The simulated data has dim(xt) = 5.
◮ We expect our result to have the same state space dimension.

SLIDE 30

Simulation Result

SLIDE 31

Real ADAPT

◮ Real ADAPT data has noise, so it is difficult to determine the precise state space dimension.
◮ dim(xt) can be any positive integer; the optimal dimension is unknown.
◮ Too few versus too many dimensions:
  ◮ Choosing the dimension too small ignores available information.
  ◮ Choosing the dimension too large unnecessarily complicates the system.

SLIDE 32

Real ADAPT Result

SLIDE 33

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 34

Expectation Maximization Algorithm

◮ Model:

  xt = A xt−1 + B ut + wt
  yt = C xt + D ut + vt

◮ The EM algorithm has two steps:
  ◮ Expectation: make a good guess for what the hidden states are.
  ◮ Maximization: make a good guess for what the parameters are.
◮ Goals:
  1. To come up with a good estimate of the parameters.
  2. To use those parameters to estimate the hidden states.

SLIDE 35

EM Algorithm Variables

◮ Known Quantities (u, y): player statistics, game results
◮ Hidden States (x): how the game is actually going
◮ Parameters: how the players' abilities interact

SLIDE 36

Running the Algorithm

Problem:

◮ Without knowing what the hidden states are, we cannot estimate the parameters.
◮ Without knowing what the parameters are, we cannot estimate the hidden states.

Solution:

◮ Hidden States: Kalman Filter
◮ Parameters: Maximum Likelihood Estimation

SLIDE 37

Maximum Likelihood Estimation

xt = A xt−1 + B ut + wt
yt = C xt + D ut + vt

◮ Given that we have some observations (the y's), what are the parameters that would make those y's most likely to have occurred?
◮ Under reasonable assumptions, we can construct a single function of the parameters that includes all of the data.
◮ We call this function L the likelihood function; it is essentially a measure of how well the model fits.

SLIDE 38

The Likelihood Function

L = f(x0) · ∏ (t = 1 to T) f(xt | ut, xt−1) · ∏ (t = 1 to T) f(yt | ut, xt)

◮ We claim that maximizing this function will give us a set of parameters that would make our data "most likely" to have occurred.
◮ L is a function of 4534 unknown variables (not counting the hidden states).
◮ There are two ways to maximize L:
  ◮ Gradient ascent
  ◮ Solve it analytically

SLIDE 39

Maximum Likelihood Estimation

◮ Some functions are easy to maximize.
◮ Some are a little trickier.

SLIDE 40

Iterate

◮ Use a guess for the parameters, along with a guess for the first hidden state (x0), to estimate all of the x's (using the Kalman Filter).
◮ Use that data to improve our estimate of the parameters, and repeat.

Once we are satisfied that we have estimated the parameters as well as we can, we can ask the important question:

◮ Given that we have a reasonable idea of what to expect, what kind of data would be unusual?
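The iterate loop can be sketched for a toy scalar model. Assumptions: known noise variances q and r, a known output matrix (c = 1), and a simplified M-step that regresses filtered states on their lags rather than taking the exact maximum-likelihood update:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a scalar model x_t = a x_{t-1} + w_t, y_t = x_t + v_t (true a = 0.8).
a_true, q, r, T = 0.8, 0.1, 0.1, 500
x = 0.0
ys = []
for _ in range(T):
    x = a_true * x + rng.normal(scale=q ** 0.5)
    ys.append(x + rng.normal(scale=r ** 0.5))

def e_step(a, ys, q=q, r=r):
    """Kalman-filter the observations to estimate the hidden states."""
    xf, P, states = 0.0, 1.0, []
    for y in ys:
        xp, Pp = a * xf, a * a * P + q            # predict
        K = Pp / (Pp + r)                         # Kalman gain
        xf, P = xp + K * (y - xp), (1 - K) * Pp   # correct
        states.append(xf)
    return states

a = 0.0                 # initial parameter guess
for _ in range(20):
    s = e_step(a, ys)   # guess the hidden states given the parameters
    # Simplified M-step: least-squares regression of s_t on s_{t-1}.
    num = sum(s[t] * s[t - 1] for t in range(1, T))
    den = sum(s[t - 1] ** 2 for t in range(1, T))
    a = num / den       # improve the parameter guess, then repeat
```

After a few iterations the estimate of a settles near the true value 0.8, illustrating the circular dependence the slides describe: states need parameters, parameters need states, so we alternate.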

SLIDE 41

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 42

Need to Filter Noise

Practical Problem: getting Apollo missions safely to the Moon and back.

◮ Abstract Problem: For each time t, given past observations yt−1, . . . , y1, make a prediction y_t^{t−1} of the present yt, and of the variance (average uncertainty) between the predicted and actual observation.
◮ At each time, noise with known variance corrupts both the observation and the hidden state.
◮ Goal: filter, i.e. compensate for, the accumulated noise.
◮ Rudolf Kalman presented the solution, the Kalman filter (1960), extended by NASA Ames for Apollo.

SLIDE 43

Filtering Visually: a Graph over Increasing Time

[Plot: observed and expected values versus time]

◮ Predict expected values left-to-right: the prediction at time t comes from the past t − 1, t − 2, . . . , 2, 1.
◮ Be skeptical of extreme values that the values at previous times do not support (e.g. the value at time 10).
◮ Increase skepticism as noise accumulates with time.
◮ Draw the expected-value curve "in the middle" of the observed values.

SLIDE 44

Hidden State Estimated from Observations

[Image: the moment of decision at Mission Control Center on whether Apollo 16 should land on the Moon]

◮ The hidden state xt, such as position, determines all.
◮ At each time t, predict the hidden state x_t^{t−1} from the past t − 1, t − 2, . . . , 2, 1; then, using the model, predict the observation y_t^{t−1} at the next time t.
◮ The prediction error yt − y_t^{t−1} of the observation, compared to its variance, is used to correct the prediction of the hidden state to x_t^t: now the hidden state given the observations including that at the current time t as well as the past.

SLIDE 45

Kalman Filter Can Estimate Uncertainty

[Plot: Kalman filter applied to a 1D output; observed and filtered values with upper and lower uncertainty bounds versus time]

◮ Data generated from a "nice" model stays within the uncertainty bounds (in green) after Kalman filtering.
◮ The bounds are obtained through the Mahalanobis distance c_t^2, the prediction error scaled by the predicted variance Σt at time t:

  c_t^2 = (yt − ŷ_t^{t−1})^T Σt^{−1} (yt − ŷ_t^{t−1})
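The distance c_t^2 can be computed directly from the formula above; a minimal sketch, where the covariance Σ below is a hand-picked diagonal matrix for illustration:

```python
import numpy as np

def mahalanobis_sq(y, y_pred, Sigma):
    """c_t^2 = (y - y_pred)^T Sigma^{-1} (y - y_pred):
    squared prediction error, scaled by the predicted covariance."""
    e = np.asarray(y, float) - np.asarray(y_pred, float)
    return float(e @ np.linalg.solve(Sigma, e))

# An error of one "predicted standard deviation" in each coordinate
# contributes 1 to c^2, regardless of the raw scale of that coordinate.
Sigma = np.diag([4.0, 0.25])
c2 = mahalanobis_sq([2.0, 0.5], [0.0, 0.0], Sigma)   # errors = 2 and 0.5
```

Here both coordinates are exactly one predicted standard deviation off, so c2 = 2.0: scaling by Σ is what lets one threshold work across sensors with very different units.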

SLIDE 46

Using Many Data Sets for Detection

◮ We are given many sets of data without anomalies.
◮ Using the EM algorithm, each data set gets its own state space model.
◮ Given another data set, for each previous model we can use the Kalman filter to make predictions, both of values and of Mahalanobis distances c_t^2.
◮ Since we assume the same physical system, the ADAPT testbed, the data sets give models that say something about different data from the same system, so we can compute the models in advance.
◮ Using information from many data sets:
  ◮ Use the c_t^2 as statistics.
  ◮ Given these c_t^2 statistics varying in time, what is anomalous?

SLIDE 47

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 48

Building the Alarm

◮ Methods: the SVD method and the EM algorithm gave us the model for the ADAPT system:

  xt = A xt−1 + B ut + wt
  yt = C xt + D ut + vt

◮ The model enables us to generate expected observations, the ŷ's.
◮ The expected ŷ's form an ellipsoid (the mean and spread of the expected ŷ's).
◮ Compare real-time readings yt to the expected observations ŷt within the ellipsoid.

SLIDE 49

Outputs / Sensor Readings

[Scatter plot: observations in two dimensions, with an ellipsoid around the expected values]

Goal:

◮ Single out outliers = find the dots outside the ellipsoid

SLIDE 50

Numbers for Vectors

◮ Our sensor readings are vectors of a high dimension; dim yt = 50.
◮ The appropriate metric to determine the multivariate outliers is the Mahalanobis distance.
◮ To each observation vector at each time step we assign a number, yt → c_t^2.
◮ c_t^2 is a Mahalanobis distance that measures how far our actual observation is from the expected one.

[Scatter plot: observations in two dimensions]

SLIDE 51

The c_t^2 Curve

Can you locate the anomaly? Look for the JUMP!

◮ We are analyzing the c_t^2 curves.
◮ ∆ = rate of change of each c_t^2 curve.

[Plot: a c_t^2 curve versus time (half seconds), with the anomaly marked at the jump ∆]

SLIDE 52

Anomaly Detection

◮ 74 nominal data sets = 74 "experts"
◮ Our "Alarm" relies on many of the 74 c_t^2 curves

[Plot: the c_t^2 curves versus time (half seconds)]

SLIDE 53

Thresholds on c_t^2

1. ∆ = rate of change of each c_t^2 curve: ∆ suddenly increases ⇒ "Alarm"
2. Number of experts saying "Alarm": don't trust just one "expert" screaming Alarm!

Both rates will be used to adjust the sensitivity of our detector.
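A sketch of the two-threshold idea. The curves, threshold values, and the choice of ∆ as a one-step difference are all hypothetical simplifications for illustration:

```python
def alarm(c2_curves, jump_threshold, min_experts):
    """Raise an alarm at time t if at least `min_experts` of the c2 curves
    jump by more than `jump_threshold` between t-1 and t.
    Both thresholds are the sensitivity knobs of the detector."""
    T = len(c2_curves[0])
    alarms = []
    for t in range(1, T):
        votes = sum(1 for curve in c2_curves
                    if curve[t] - curve[t - 1] > jump_threshold)
        alarms.append(votes >= min_experts)
    return alarms

# Three "experts"; two of them see a jump at the same step.
curves = [
    [1.0, 1.0, 1.1, 5.0, 5.1],
    [1.0, 1.1, 1.0, 4.8, 4.9],
    [1.0, 1.0, 1.0, 1.1, 1.0],
]
flags = alarm(curves, jump_threshold=2.0, min_experts=2)
```

Requiring several experts to agree is what suppresses the single screaming expert: the third curve never votes, but the alarm still fires when the other two jump together.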

SLIDE 54

The Solution

ADAPT Data → SVD Method / EM Algorithm → Build State Space Models → Build Alarm → Detect Anomalies

SLIDE 55

Expectations for our Anomaly Detector

◮ We want:
  ◮ To detect faults accurately:
    ◮ a low False Positive Rate - a low number of false alarms;
    ◮ a low False Negative Rate - a low number of missed faults.
  ◮ To detect anomalies within a few seconds of the fault occurring.
  ◮ To have a fast (real-time) computation time.
  ◮ To use as little memory as possible.

SLIDE 56

Receiver Operating Characteristic (ROC)

◮ Each point along the curve is a True Positive Rate (TPR) and False Positive Rate (FPR) for a chosen threshold.
◮ A true positive is when our method detects a fault when a fault has occurred in the system.
◮ A false positive is when our method detects a fault when no fault has occurred in the system.

[ROC curve, expert % = 4: FPR = 0.2661, TPR = 0.9675]
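A toy sketch of how one ROC point is computed. The per-file scores below are hypothetical; a real score could be, e.g., the largest jump in a file's c_t^2 curves:

```python
def roc_point(scores_faulty, scores_nominal, threshold):
    """TPR and FPR when a data set is flagged iff its score exceeds
    the threshold. Sweeping the threshold traces out the ROC curve."""
    tpr = sum(s > threshold for s in scores_faulty) / len(scores_faulty)
    fpr = sum(s > threshold for s in scores_nominal) / len(scores_nominal)
    return tpr, fpr

faulty  = [0.9, 0.8, 0.75, 0.4]   # scores on files that do contain a fault
nominal = [0.3, 0.5, 0.2, 0.1]    # scores on fault-free files
tpr, fpr = roc_point(faulty, nominal, threshold=0.45)
```

Raising the threshold moves the point toward the lower-left (fewer alarms of both kinds); lowering it moves toward the upper-right. The goal is a threshold whose point sits near the upper-left corner.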

SLIDE 57

Receiver Operating Characteristic (ROC)

◮ As we vary the chosen threshold we get a curve similar to the one below.
◮ We want to choose a threshold in the upper-left corner of the graph.

[ROC curve, expert % = 4: FPR = 0.2661, TPR = 0.9675]

SLIDE 58

Our Results

[ROC curve, expert % = 4: FPR = 0.2661, TPR = 0.9675]

◮ This gives us a True Positive Rate of 0.9675.
◮ This gives us a False Positive Rate of 0.2661.
◮ On average, we are able to detect anomalies within 5.85 seconds of the time the actual fault occurred.

SLIDE 59

Comparing to the DX Competition

◮ We follow the DX competition rules to the letter:
  ◮ using only 34 nominal training sets on 120 competition files;
  ◮ files are counted as either false positives or false negatives, but not both.

Team                   | False Positive Rate | False Negative Rate | Average Detection Time
Linköping University   | 0.5417 | 0.0972 | 3.490
Canberra Research Lab  | 0.5106 | 0.0959 | 30.742
Integra Software       | 0.8143 | 0.2400 | 14.099
Carnegie Mellon / NASA | 0.0732 | 0.1392 | 5.981
UCSC / Perot Systems   | 0.0000 | 0.3000 | 17.610
Stanford               | 0.3256 | 0.0519 | 3.946
CAMCOS                 | 0.3000 | 0.2125 | 5.903

SLIDE 60

Future Work

◮ Instead of using the ellipsoid as our bound for yt, find the closed form for the distribution of yt to generate a confidence interval.
◮ This interval should give us a better bound on the values of yt, and thus a better way of detecting outliers.
◮ Any yt outside this confidence interval would be considered an anomaly.
◮ This in turn would hopefully lead us to a higher detection accuracy.

SLIDE 61

Future Work

◮ The next issue we would like to tackle is isolating the anomalies.
◮ Not only do we want to detect a fault accurately, but we want to know where the fault is in the system.
◮ Once a fault is isolated, it is easier to find a solution to the problem and to figure out the cause of the fault.

SLIDE 62

Thank You!

SLIDE 63

LUNCH

[Map with labels: King Library, 4th Street, 10th Street, San Fernando Street, Flames, Student Union; marked "We Are Here"]

SLIDE 64

Anomaly Detection CAMCOS 2009 Appendix

Appendix: Additional Material on the Hankel Matrix, the EM Algorithm, the Kalman Filter, and the Alarm

SLIDE 65

Hankel Matrix

Define the block-Hankel matrix H, whose (i, j) block is Γ_{i+j−1}:

  H ≡ [ Γ1  Γ2  Γ3  · · ·  ΓN
        Γ2  Γ3   .          .
        Γ3   .   .          .
         .               .
        ΓN  · · ·        Γ2N−1 ]

where N is chosen to be sufficiently large, and each autocovariance matrix Γl is estimated by

  Γ̂l = Σ (t = l to T − l) [u_{t+l}; y_{t+l}] [u_t; y_t]′

So H has N(m + n) rows and N(m + n) columns.

SLIDE 66

Assumptions

To use maximum likelihood estimation, we need to make two important assumptions about the data, which we hope conform to some extent with reality:

◮ That, given the previous timestep, each timestep is independent of the earlier timesteps: we assume that each timestep contains all of the information from all previous timesteps.
◮ That we know how the data is distributed, even if we don't know the parameters of that distribution.

If we make these assumptions, each piece of data has its own distribution (density function), and we can multiply these together to get a new pdf, which we can then view as a function of the parameters, not of the data.

SLIDE 67

The Good News

If we begin with a guess, we can improve that guess until (hopefully) our guess mutates into something like the truth.

SLIDE 68

Parameters and Initialization

Definition. Let the parameters Θ be

  Θ = { E[x0], V(x0), At, Bt, Ct, Dt, V(wt), V(vt) }, t = 1, . . . , ∞.

Let F(Θ) stand for being a function of Θ, and F(Θ, Zs) stand for being a function of both Θ and Zs. (Here x_t^s denotes the estimate of x_t given the observations Zs through time s, and ǫx_t^s the corresponding error.)

Definition. Let x_0^0 = E[x0] and V(ǫx_0^0) = V(x0), both F(Θ).

SLIDE 69

Forward Recursion

Theorem. If x_{t−1}^{t−1} is F(Θ, Z_{t−1}) and V(ǫx_{t−1}^{t−1}) is F(Θ), then

◮ the covariance matrices V(ǫx_t^{t−1}), V(ǫy_t^{t−1}), V(ǫx_t^t) are F(Θ);
◮ x_t^{t−1} and y_t^{t−1} are F(Θ, Z_{t−1});
◮ x_t^t is F(Θ, Z_t).

Theorem. Z_t and Θ give real non-negative numbers det V(ǫy_t^{t−1}) and (ǫy_t^{t−1})^T V(ǫy_t^{t−1})^{−1} ǫy_t^{t−1}.

SLIDE 70

Intermediate Estimates

Theorem.

  x_t^{t−1} = A_t x_{t−1}^{t−1} + B_t u_t        (1)
  y_t^{t−1} = C_t x_t^{t−1} + D_t u_t            (2)
  ǫy_t^{t−1} = y_t − y_t^{t−1}                   (3)
  ǫx_t^{t−1} = A_t ǫx_{t−1}^{t−1} + w_t
  ǫy_t^{t−1} = C_t ǫx_t^{t−1} + v_t

Proof. w_t ⊥ Z_{t−1}; v_t ⊥ Z_{t−1}.

SLIDE 71

Intermediate Covariances

Theorem.

  V(ǫx_t^{t−1}) = A_t V(ǫx_{t−1}^{t−1}) A_t^T + V(w_t)      (4)
  V(ǫy_t^{t−1}) = C_t V(ǫx_t^{t−1}) C_t^T + V(v_t)          (5)
  Σ(ǫx_t^{t−1}, ǫy_t^{t−1}) = V(ǫx_t^{t−1}) C_t^T           (6)

Proof. The cross-covariances are 0 since w_t ⊥ ǫx_{t−1}^{t−1}, v_t ⊥ ǫx_{t−1}^{t−1}, and v_t ⊥ w_t.

SLIDE 72

Projection Theorem

To find the projection of x_t on Z_t, first project x_t onto the subspace Z_{t−1} ⊂ Z_t. Then project the remainder ǫx_t^{t−1} on the new knowledge ǫy_t^{t−1} ∈ Z_t, since ǫy_t^{t−1} ⊥ Z_{t−1}.

Theorem.

  x_t^t = x_t^{t−1} + K_t ǫy_t^{t−1}                          (7)

where

  K_t = Σ(ǫx_t^{t−1}, ǫy_t^{t−1}) V(ǫy_t^{t−1})^{−1}          (8)

is called the Kalman gain, K_t = F(Θ).

SLIDE 73

Orthogonal Complement Covariance

Theorem.

  V(ǫx_t^t) = V(ǫx_t^{t−1}) − K_t Σ(ǫx_t^{t−1}, ǫy_t^{t−1})^T    (9)

Proof. ǫx_t^t = ǫx_t^{t−1} − K_t ǫy_t^{t−1}.
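The recursion in equations (1)-(9) can be transcribed almost line for line; a minimal NumPy sketch, assuming time-invariant noise covariances Q = V(wt) and R = V(vt):

```python
import numpy as np

def kalman_step(x, P, u, y, A, B, C, D, Q, R):
    """One forward-recursion step, following the appendix equations:
    predict (1)-(2), covariances (4)-(6), gain (8), correct (7), (9)."""
    x_pred = A @ x + B @ u                 # (1) x_t^{t-1}
    y_pred = C @ x_pred + D @ u            # (2) y_t^{t-1}
    P_pred = A @ P @ A.T + Q               # (4) V(eps x_t^{t-1})
    S = C @ P_pred @ C.T + R               # (5) V(eps y_t^{t-1})
    Sigma_xy = P_pred @ C.T                # (6) Sigma(eps x, eps y)
    K = Sigma_xy @ np.linalg.inv(S)        # (8) Kalman gain
    x_new = x_pred + K @ (y - y_pred)      # (7) x_t^t
    P_new = P_pred - K @ Sigma_xy.T        # (9) V(eps x_t^t)
    return x_new, P_new, y_pred, S

# Scalar sanity check: a random-walk-free state observed with unit noise.
I = np.array([[1.0]]); Z = np.array([[0.0]])
x_new, P_new, y_pred, S = kalman_step(
    x=Z, P=I, u=Z, y=np.array([[2.0]]),
    A=I, B=Z, C=I, D=Z, Q=Z, R=I)
```

In the scalar check the gain is 0.5, so the corrected state lands halfway between the prediction 0 and the observation 2, and the state variance shrinks from 1 to 0.5.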

SLIDE 79

Kalman Filter Computes Expected Observations

◮ Kalman filter estimates how means (averages) and variances (spreads) evolve in time

◮ Recall the state-space model, with hidden (unknown) variables x, known inputs u, observed outputs y, and noise w and v:

  x_t = A x_{t−1} + B u_t + w_t
  y_t = C x_t + D u_t + v_t

◮ Suppose we can predict x_{t−1} by x̂_{t−1}

◮ Assuming the noise terms w_t and v_t average out to zero, the hidden-variable and output estimates evolve as

  x̂_t = A x̂_{t−1} + B u_t
  ŷ_t = C x̂_t + D u_t

◮ With these estimates of the averages, we can estimate errors ∆ of the form x_t − x̂_t = x_t − (A x̂_{t−1} + B u_t) or y_t − ŷ_t = y_t − (C x̂_t + D u_t)
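These prediction equations can be sketched directly in Python. The matrices A, B, C, D below are made-up placeholders, not the ADAPT model:

```python
import numpy as np

# Hypothetical model matrices: 2 hidden variables, 1 input, 1 output
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0],
              [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

def predict(x_prev, u):
    """One prediction step, with the noise w_t, v_t assumed to average to zero."""
    x_hat = A @ x_prev + B @ u   # x^_t = A x^_{t-1} + B u_t
    y_hat = C @ x_hat + D @ u    # y^_t = C x^_t + D u_t
    return x_hat, y_hat

x_hat, y_hat = predict(np.array([[1.0], [2.0]]), np.array([[1.0]]))
# x_hat = [[2.1], [2.1]], y_hat = [[2.1]]
```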

SLIDE 80

Kalman Filter Estimates Hidden Variables

[Diagram: hidden variables x_1, x_2, x_3 driven by known inputs u_1, u_2, u_3 and producing observed outputs y_1, y_2, y_3]

◮ Kalman filter estimates how the means (averages) of the hidden variables x_t and their variances (spreads) evolve in time

SLIDE 86

MLE, Kalman Filter, Linear Algebra

◮ Assume the probability density is (the multivariable version of) the normal

  f(w) = (1/√(2π)) (1/√σ²) exp(−(1/2)(∆²/σ²)),

  where ∆ is an error and σ² is the variance of the noise

◮ Observe that ∆ is a function of the model parameters A, B, ...

◮ Iterate until f somehow "converges":

  1. Fix the model parameters A, B, ...; use the Kalman filter to estimate the ∆s.
  2. Fix the ∆s; use MLE to find the model parameters A, B, ... that maximize the probability density (now called the likelihood function).

◮ Under all these assumptions, matrix algebra can solve both the Kalman filter estimates and the MLE
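The two alternating steps can be made concrete in the scalar case. Below, normal_density is the density f above, and mle_variance is the closed-form M-step for a single noise variance (the σ² maximizing the likelihood of fixed errors ∆ is the mean of ∆²). Both functions are illustrative sketches, not the team's code:

```python
import math

def normal_density(delta, sigma2):
    """f(delta) = 1/(sqrt(2 pi) sqrt(sigma^2)) * exp(-delta^2 / (2 sigma^2))."""
    return math.exp(-0.5 * delta * delta / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def mle_variance(deltas):
    """Given fixed errors, the variance maximizing the likelihood is the mean of delta^2."""
    return sum(d * d for d in deltas) / len(deltas)

# Standard normal density at 0 is 1/sqrt(2 pi) ~ 0.3989
print(round(normal_density(0.0, 1.0), 4))  # 0.3989
print(mle_variance([1.0, -1.0, 2.0, -2.0]))  # 2.5
```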

SLIDE 91

Comparing Models

◮ Recall the state-space model:

  x_t = A x_{t−1} + B u_t + w_t
  y_t = C x_t + D u_t + v_t

◮ Under a coordinate change, such as permuting the hidden variables, the model keeps the same form but with different parameters

◮ We want a statistic that remains valid under such a permutation

◮ The Kalman filter variance (spread) estimate Σ_t at time t can scale the output error ∆y_t to give such a statistic: S_t = ∆y_t^T Σ_t^{−1} ∆y_t

◮ The Kalman filter also gives a theoretical probability distribution Prob_t such that, as time t varies, the probability of observing this value of the statistic or lower, Prob_t{z | z ≤ S_t}, should be "evenly distributed" from 0 to 1
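This probability-integral idea can be illustrated with simulated output errors. For a 2-dimensional normal error with covariance Σ, the statistic S_t = ∆y_t^T Σ^{−1} ∆y_t is chi-square with 2 degrees of freedom, whose CDF is 1 − e^{−S/2}, so the resulting probabilities should look uniform on [0, 1]. A sketch with a made-up Σ (hypothetical values, not a fitted ADAPT model):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])             # hypothetical innovation covariance
L = np.linalg.cholesky(Sigma)
Sigma_inv = np.linalg.inv(Sigma)

probs = []
for _ in range(5000):
    dy = L @ rng.standard_normal(2)        # simulated output error with covariance Sigma
    S = dy @ Sigma_inv @ dy                # S_t = dy^T Sigma^{-1} dy
    probs.append(1.0 - np.exp(-S / 2.0))   # chi-square(2) CDF: Prob{z <= S_t}

# If the model assumptions hold, probs is approximately uniform on [0, 1]
print(float(np.mean(probs)))
```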

SLIDE 92

Validating With Model Following Assumptions

◮ Generate data from a state-space system with normal, independent noise
◮ Estimate parameters using MLE and the Kalman filter
◮ Run the Kalman filter with these parameters on the same data and calculate the statistics S_t

[Figure: "Multivariate normal state space system": cumulative distribution of the statistic over 1000 time steps, roughly evenly distributed from 0 to 1]

MLE and the Kalman filter try to attain this picture for the statistic: evenly distributed.

SLIDE 93

Real Data, Different Data Sets

◮ Estimate the model from one data set with no faults, then calculate the statistic for another data set, also with no faults: still problems, the statistic is just way too high

[Figure: "Different ADAPT Files": cumulative distribution of the statistic over 500 time steps]

SLIDE 94

Validate Model on Real Data, Same Data Set

◮ Use one ADAPT data set without an anomaly
◮ Estimate parameters using MLE and the Kalman filter, run the Kalman filter with those parameters back on the same data, and calculate the statistics {S_t}
◮ Somewhat similar results, except for values close to 1 during initial system startup

[Figure: "Data Set Trained on Itself": cumulative distribution of the statistic over 500 time steps]

SLIDE 95

Mahalanobis distance

◮ Mahalanobis distance is based on correlations between variables

◮ The Mahalanobis distance of an n-dimensional vector y_t from the group of expected observations with mean µ_y and covariance matrix Σ is defined as

  c_t² = (y_t − µ_y)′ Σ^{−1} (y_t − µ_y)

◮ For Σ = I, the identity matrix, the Mahalanobis distance equals the Euclidean distance
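A direct transcription of the definition (the function name is illustrative):

```python
import numpy as np

def mahalanobis_sq(y, mu, Sigma):
    """Squared Mahalanobis distance c^2 = (y - mu)' Sigma^{-1} (y - mu)."""
    d = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.inv(Sigma) @ d)

# With Sigma = I, the definition reduces to squared Euclidean distance:
c2 = mahalanobis_sq([3.0, 4.0], [0.0, 0.0], np.eye(2))  # 25.0
```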

SLIDE 96

References

◮ Image of "Apollo 16 Command and Service Module Over the Moon", http://grin.hq.nasa.gov/IMAGES/SMALL/GPN-2002-000069.jpg
◮ The moment of decision at Mission Control Center for whether Apollo 16 should land on the Moon, http://images.jsc.nasa.gov/search/search.cgi?searchpage=true&selections=AS16&bro