
Anomaly Detection Jia-Bin Huang Virginia Tech Spring 2019 - PowerPoint PPT Presentation



  1. Anomaly Detection Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative

  3. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  4. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  5. Anomaly detection example • Dataset x^(1), x^(2), ⋯, x^(m) • Aircraft engine features: • x_1 = heat generated • x_2 = vibration intensity • New engine: x_test • (Plot: examples in the x_1 (heat) vs. x_2 (vibration) plane)

  6. Density estimation • Dataset x^(1), x^(2), ⋯, x^(m) • Model p(x) from the data • Is x_test anomalous? • p(x_test) < ε → flag anomaly • p(x_test) ≥ ε → OK • (Plot: x_1 (heat) vs. x_2 (vibration))

  7. Anomaly detection example • Fraud detection • x^(i) = features of user i's activities • Model p(x) from data • Identify unusual users by checking which have p(x) < ε • Manufacturing • Monitoring computers in a data center • x^(i) = features of machine i • x_1 = memory use, x_2 = number of disk accesses/sec • x_3 = CPU load, x_4 = CPU load/network traffic

  8. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  9. Gaussian (normal) distribution • Say x ∈ ℝ. If x is Gaussian distributed with mean μ and variance σ^2: • x ∼ N(μ, σ^2), where σ is the standard deviation • p(x; μ, σ^2) = (1 / (√(2π) σ)) exp(−(x − μ)^2 / (2σ^2))
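As a small illustration (not code from the slides), the density formula above can be evaluated directly:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """p(x; mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The density peaks at the mean: p(0; 0, 1) = 1 / sqrt(2*pi) ≈ 0.399
print(gaussian_pdf(0.0, 0.0, 1.0))
```

Points far from the mean get exponentially smaller density, which is what the anomaly flag later exploits.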

  10. Gaussian distribution examples

  11. Parameter estimation • Dataset x^(1), x^(2), ⋯, x^(m), with x ∼ N(μ, σ^2) • Maximum likelihood estimation: • μ = (1/m) Σ_{i=1}^{m} x^(i) • σ^2 = (1/m) Σ_{i=1}^{m} (x^(i) − μ)^2
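A minimal sketch of the maximum likelihood estimates above (my illustration, with made-up data):

```python
def fit_gaussian(xs):
    """MLE of mu and sigma^2; note the 1/m factor, not the unbiased 1/(m-1)."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2

mu, sigma2 = fit_gaussian([1.0, 2.0, 3.0])
print(mu, sigma2)  # mean 2.0, MLE variance 2/3
```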

  12. Density estimation • Dataset x^(1), x^(2), ⋯, x^(m) • Each example x ∈ ℝ^n • p(x) = p(x_1; μ_1, σ_1^2) p(x_2; μ_2, σ_2^2) ⋯ p(x_n; μ_n, σ_n^2) = Π_{j=1}^{n} p(x_j; μ_j, σ_j^2)

  13. Anomaly detection algorithm 1. Choose features x_j that you think might be indicative of anomalous examples 2. Fit parameters μ_1, ⋯, μ_n, σ_1^2, ⋯, σ_n^2 • μ_j = (1/m) Σ_{i=1}^{m} x_j^(i) • σ_j^2 = (1/m) Σ_{i=1}^{m} (x_j^(i) − μ_j)^2 3. Given new example x, compute p(x) = Π_{j=1}^{n} p(x_j; μ_j, σ_j^2) • Flag an anomaly if p(x) < ε
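The three steps of the algorithm can be sketched end to end; the 2-feature data and the threshold ε here are hypothetical, chosen only to make the flag fire:

```python
import math

def fit_params(X):
    """Step 2: per-feature MLE of mu_j and sigma_j^2 over m examples, n features."""
    m, n = len(X), len(X[0])
    mus = [sum(x[j] for x in X) / m for j in range(n)]
    sigma2s = [sum((x[j] - mus[j]) ** 2 for x in X) / m for j in range(n)]
    return mus, sigma2s

def density(x, mus, sigma2s):
    """Step 3: p(x) = product over features of univariate Gaussian densities."""
    p = 1.0
    for j, (mu, s2) in enumerate(zip(mus, sigma2s)):
        p *= math.exp(-(x[j] - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
    return p

# Hypothetical (heat, vibration) readings: a tight cluster of normal engines
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0]]
mus, sigma2s = fit_params(X)
eps = 1e-3
print(density([1.0, 1.0], mus, sigma2s) >= eps)  # typical point: not flagged
print(density([5.0, 5.0], mus, sigma2s) < eps)   # far-away point: flagged
```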

  14. Evaluation • Assume we have some labeled data, of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous) • Training set: x^(1), x^(2), ⋯, x^(m) (assume normal examples) • Cross-validation set: (x_cv^(1), y_cv^(1)), (x_cv^(2), y_cv^(2)), ⋯, (x_cv^(m_cv), y_cv^(m_cv)) • Test set: (x_test^(1), y_test^(1)), (x_test^(2), y_test^(2)), ⋯, (x_test^(m_test), y_test^(m_test))

  15. Aircraft engines motivating example • 10000 good (normal) engines • 20 flawed engines (anomalous) • Training set: 6000 good engines • CV: 2000 good engines (y = 0), 10 anomalous (y = 1) • Test: 2000 good engines (y = 0), 10 anomalous (y = 1)

  16. Algorithm evaluation • Fit model p(x) on training set {x^(1), ⋯, x^(m)} • On a cross-validation/test example x, predict • y = 1 if p(x) < ε (anomaly) • y = 0 if p(x) ≥ ε (normal) • Possible evaluation metrics: • True positive, false positive, false negative, true negative • Precision/Recall • F1 score • Can use the cross-validation set to choose the parameter ε
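Choosing ε on the cross-validation set can be sketched as a scan over candidate thresholds, keeping the one with the best F1; the density values and labels below are hypothetical:

```python
def prf1(y_true, y_pred):
    """Precision, recall, F1 with anomaly as the positive class (y = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def choose_epsilon(p_cv, y_cv):
    """Try each observed density as a threshold; keep the best-F1 epsilon."""
    best_eps, best_f1 = 0.0, -1.0
    for eps in sorted(set(p_cv)):
        y_pred = [1 if p < eps else 0 for p in p_cv]
        _, _, f1 = prf1(y_cv, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Hypothetical CV densities: anomalies (y = 1) tend to get low p(x)
p_cv = [0.9, 0.8, 0.7, 0.6, 0.01, 0.02]
y_cv = [0,   0,   0,   0,   1,    1]
eps, f1 = choose_epsilon(p_cv, y_cv)
print(eps, f1)
```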

  17. Evaluation metric • How about accuracy? • Assume only 0.1% of the engines are anomalous (skewed classes) • Declaring every example normal yields 99.9% accuracy!

  18. Precision/Recall • F1 score: 2PR / (P + R), where P is precision and R is recall

  19. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  20. Anomaly detection vs. supervised learning • Anomaly detection: • Very small number of positive examples (y = 1) (0-20 is common) • Large number of negative (y = 0) examples • Many different types of anomalies; hard for any algorithm to learn from positive examples what the anomalies look like • Future anomalies may look nothing like any of the anomalous examples we have seen so far • Supervised learning: • Large number of positive and negative examples • Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples likely to be similar to ones in the training set

  21. Anomaly detection vs. supervised learning • Anomaly detection: fraud detection; manufacturing; monitoring machines in a data center • Supervised learning: email spam classification; weather prediction; cancer classification

  22. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  23. Non-Gaussian features • Transform the feature so it looks more Gaussian, e.g. replace x with log(x)
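A minimal sketch of the log transform, on hypothetical right-skewed feature values; the mean-minus-median gap is used here only as a crude skew indicator:

```python
import math
import statistics

def skew_gap(xs):
    """Crude skew indicator: mean minus median (positive for right-skewed data)."""
    return statistics.mean(xs) - statistics.median(xs)

raw = [1.0, 2.0, 3.0, 4.0, 200.0]  # hypothetical right-skewed feature
logged = [math.log(v) for v in raw]
print(skew_gap(raw), skew_gap(logged))  # the log transform shrinks the gap
```

For features that can be zero, a shifted log(x + c) is a common variant of the same idea.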

  24. Error analysis for anomaly detection • Want p(x) large for normal examples x and p(x) small for anomalous examples x • Most common problem: p(x) is comparable (say, both large) for normal and anomalous examples

  25. Monitoring computers in a data center • Choose features that might take on unusually large or small values in the event of an anomaly • x_1 = memory use of computer • x_2 = number of disk accesses/sec • x_3 = CPU load • x_4 = network traffic • x_5 = CPU load / network traffic • x_6 = (CPU load)^2 / network traffic
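A sketch of creating such a ratio feature, with hypothetical machine metrics: the ratio stays near 1 for normal machines but blows up for a machine that is busy without serving traffic, making the unusual combination easy to flag:

```python
# Hypothetical machine metrics: (cpu_load, network_traffic)
machines = [(0.5, 0.5), (0.6, 0.55), (0.55, 0.6), (0.9, 0.05)]

# New feature x_5 = CPU load / network traffic
ratios = [cpu / net for cpu, net in machines]
print(ratios)  # the last machine's ratio stands far outside the others
```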

  26. Anomaly Detection • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution

  27. Motivating example: Monitoring machines in a data center • (Plots: x_2 (Memory use) vs. x_1 (CPU load))

  28. Multivariate Gaussian (normal) distribution • x ∈ ℝ^n. Don't model p(x_1), p(x_2), ⋯ separately • Model p(x) all in one go • Parameters: μ ∈ ℝ^n, Σ ∈ ℝ^{n×n} (covariance matrix) • p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^⊤ Σ^{−1} (x − μ))
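The density above translates almost literally into code (my sketch, using NumPy for the determinant and inverse):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """p(x; mu, Sigma) = exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)) / ((2 pi)^{n/2} |Sigma|^{1/2})."""
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)

# With Sigma = I in 2D, the peak density at the mean is 1 / (2*pi) ≈ 0.159
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```

With off-diagonal entries in Σ (as in the later examples), points along the correlated direction get higher density than points against it, which the per-feature product model cannot express.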

  29. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [0.6 0; 0 0.6] • Σ = [2 0; 0 2] • (Contour plots over x_1, x_2)

  30. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [0.6 0; 0 1] • Σ = [2 0; 0 1] • (Contour plots over x_1, x_2)

  31. Multivariate Gaussian (normal) examples • Σ = [1 0; 0 1] • Σ = [1 0.8; 0.8 1] • Σ = [1 0.5; 0.5 1] • (Contour plots over x_1, x_2)

  32. Anomaly detection using the multivariate Gaussian distribution 1. Fit model p(x) by setting • μ = (1/m) Σ_{i=1}^{m} x^(i) • Σ = (1/m) Σ_{i=1}^{m} (x^(i) − μ)(x^(i) − μ)^⊤ 2. Given a new example x, compute • p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^⊤ Σ^{−1} (x − μ)) • Flag an anomaly if p(x) < ε
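Both steps can be sketched together; the correlated (CPU load, memory use) data and the threshold ε are synthetic, generated only to show a joint-distribution flag that the per-feature model would miss:

```python
import numpy as np

def fit_mvn(X):
    """Step 1: mu = (1/m) sum x^(i); Sigma = (1/m) sum (x^(i)-mu)(x^(i)-mu)^T."""
    m = len(X)
    mu = X.mean(axis=0)
    diffs = X - mu
    Sigma = diffs.T @ diffs / m
    return mu, Sigma

def mvn_density(x, mu, Sigma):
    """Step 2: the multivariate Gaussian density p(x; mu, Sigma)."""
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)

# Hypothetical strongly correlated data (CPU load, memory use)
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z, z]) + 0.1 * rng.normal(size=(500, 2))
mu, Sigma = fit_mvn(X)

eps = 1e-3
on_diag = mvn_density(np.array([1.0, 1.0]), mu, Sigma)    # consistent with correlation
off_diag = mvn_density(np.array([1.0, -1.0]), mu, Sigma)  # unusual combination
print(on_diag >= eps, off_diag < eps)
```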

  33. Original model vs. multivariate Gaussian • Original model: p(x) = p(x_1; μ_1, σ_1^2) p(x_2; μ_2, σ_2^2) ⋯ p(x_n; μ_n, σ_n^2) • Manually create features to capture anomalies where x_1, x_2 take unusual combinations of values • Computationally cheaper (alternatively, scales better) • OK even if training set size is small • Multivariate Gaussian: p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2)(x − μ)^⊤ Σ^{−1} (x − μ)) • Automatically captures correlations between features • Computationally more expensive • Must have m > n, or else Σ is non-invertible

  34. Things to remember • Motivation • Developing an anomaly detection system • Anomaly detection vs. supervised learning • Choosing what features to use • Multivariate Gaussian distribution
