
Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection
Haoyi Fan 1, Fengbin Zhang 1, Ruidong Wang 1, Liang Xi 1, Zuoyong Li 2
1 Harbin University of Science and Technology, 2 Minjiang University
isfanhy@hrbust.edu.cn


  1. Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection. Haoyi Fan 1, Fengbin Zhang 1, Ruidong Wang 1, Liang Xi 1, Zuoyong Li 2. Harbin University of Science and Technology 1, Minjiang University 2. isfanhy@hrbust.edu.cn

  2. Background. Fig.: normal and anomalous samples in the observed space and the latent space.

  3. Background. Applications: Fraud Detection, Intrusion Detection, Disease Detection, Fault Detection. Image sources: https://www.explosion.com/135494/5-effective-strategies-of-fraud-detection-and-prevention-for-ecommerce/ ; https://towardsdatascience.com/building-an-intrusion-detection-system-using-deep-learning-b9488332b321 ; https://planforgermany.com/switching-private-public-health-insurance-germany/ ; https://blog.exporthub.com/working-with-chinese-manufacturers/

  4.–7. Background: Unsupervised Anomaly Detection from the Density Estimation Perspective (progressive build).
  Data samples: X_train = {x_1, x_2, ..., x_n}, where each x_i is assumed normal.
  Model: a density p(x).
  Test samples: X_test = {x_1, x_2, ..., x_n}, where x_t is unknown.
  If p(x_t) < λ, x_t is abnormal; if p(x_t) ≥ λ, x_t is normal.
  Anomalies reside in the low-probability-density areas.
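The thresholding rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the deck's deep model: a single Gaussian fitted to the training data stands in for the learned density p(x), and the threshold λ is set (arbitrarily, for illustration) to the 5th percentile of training densities.

```python
import numpy as np

rng = np.random.default_rng(0)
# Training samples, all assumed normal (a 2-D Gaussian cluster).
X_train = rng.normal(0.0, 1.0, size=(500, 2))

# "Model" p(x): a single Gaussian fitted by mean and covariance.
mu = X_train.mean(axis=0)
cov = np.cov(X_train.T)
cov_inv = np.linalg.inv(cov)
norm_const = 1.0 / np.sqrt(np.linalg.det(2 * np.pi * cov))

def p(x):
    d = np.asarray(x) - mu
    return norm_const * np.exp(-0.5 * d @ cov_inv @ d)

# Threshold lambda: here, the 5th percentile of training densities.
lam = np.percentile([p(x) for x in X_train], 5)

def is_anomaly(x):
    # x is abnormal if p(x) < lambda, normal if p(x) >= lambda.
    return p(x) < lam
```

A point near the training cluster falls in a high-density region and is kept as normal; a point far away lands in a low-density region and is flagged.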

  8. Background: correlation among data samples.
  Conventional anomaly detection: feature learning in the feature space only.
  Correlation-aware anomaly detection: feature learning plus graph modeling in the structure space.
  How to discover the normal pattern from both the feature level and the structural level?

  9. Problem Statement.
  Notations: G: graph; V: set of nodes in a graph; E: set of edges in a graph; N: number of nodes; F: dimension of attributes; A ∈ R^{N×N}: adjacency matrix of a network; X ∈ R^{N×F}: feature matrix of all nodes.
  Anomaly Detection: Given a set of input samples X = {x_i | i = 1, ..., N}, each of which is associated with an F-dimensional feature X_i ∈ R^F, we aim to learn a score function u(X_i): R^F ↦ R and to classify sample x_i based on the threshold λ:
  y_i = 1 if u(X_i) ≥ λ, 0 otherwise,
  where y_i denotes the label of sample x_i, with 0 being the normal class and 1 the anomalous class.

  10. Method: CADGMM. Components: Graph Construction, Dual-Encoder, Feature Decoder, Estimation network.

  11. Method: CADGMM, Graph Construction (K-nearest neighbours, e.g. K = 5).
  Original features: X = {x_i | i = 1, ..., N}
  Find neighbours by K-NN: N_i = {x_i^k | k = 1, ..., K}
  Model correlation as a graph: G = {V, E, X}, with
  V = {v_i = x_i | i = 1, ..., N} and E = {e_i^k = (v_i, v_i^k) | v_i^k ∈ N_i}.
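The graph-construction step above can be sketched with plain NumPy: treat each sample as a node and connect it to its K nearest neighbours. This is a small stand-in for the paper's pipeline; the function name `build_knn_graph` is mine, not from the paper.

```python
import numpy as np

def build_knn_graph(X, K=5):
    """Model sample correlation as a graph: nodes are samples,
    edges e_i^k = (v_i, v_i^k) connect each sample to its K
    nearest neighbours in feature space."""
    N = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # exclude self-loops
    neighbours = np.argsort(dist, axis=1)[:, :K]  # N_i for each node i
    A = np.zeros((N, N))                          # adjacency matrix
    for i in range(N):
        A[i, neighbours[i]] = 1.0
    return A, neighbours

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
A, nbrs = build_knn_graph(X, K=5)
```

Each row of `A` then has exactly K nonzero entries, one per neighbour, and no self-loops.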

  12. Method: CADGMM, Dual-Encoder and Feature Decoder. Feature encoder: e.g. MLP, CNN, LSTM. Graph encoder: e.g. GAT. A feature decoder reconstructs the input.

  13. Method: CADGMM, Gaussian Mixture Model (estimation network).
  Initial embedding: Z.
  Membership: Z^(l) = σ(Z^(l−1) W^(l−1) + b^(l−1)), with Z^(0) = Z; M = Softmax(Z^(L)), M ∈ R^{N×C}.
  Parameter estimation:
  μ_c = (Σ_{i=1}^N M_{i,c} Z_i) / (Σ_{i=1}^N M_{i,c}),
  Σ_c = (Σ_{i=1}^N M_{i,c} (Z_i − μ_c)(Z_i − μ_c)^T) / (Σ_{i=1}^N M_{i,c}).
  Energy:
  E(Z) = −log Σ_{c=1}^C [ (Σ_{i=1}^N M_{i,c} / N) · exp(−(1/2)(Z − μ_c)^T Σ_c^{−1} (Z − μ_c)) / |2π Σ_c|^{1/2} ].
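The parameter-estimation and energy formulas above can be sketched directly in NumPy: given embeddings Z and soft memberships M produced by the estimation network, recover mixture weights, means, and covariances, then score each sample by its negative log-likelihood. A minimal sketch with my own function name `gmm_energy` and random memberships standing in for the network's softmax output.

```python
import numpy as np

def gmm_energy(Z, M):
    """Recover GMM parameters from soft memberships M (N x C)
    over embeddings Z (N x D), and return per-sample energy E(z)."""
    N, D = Z.shape
    C = M.shape[1]
    phi = M.sum(axis=0) / N                     # mixture weights
    mu = (M.T @ Z) / M.sum(axis=0)[:, None]     # component means mu_c
    dens = np.zeros(N)
    for c in range(C):
        diff = Z - mu[c]
        # Membership-weighted covariance Sigma_c.
        Sigma = (M[:, c, None, None] *
                 (diff[:, :, None] * diff[:, None, :])).sum(0) / M[:, c].sum()
        Sigma += 1e-6 * np.eye(D)               # numerical stability
        inv = np.linalg.inv(Sigma)
        norm = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
        mahal = np.einsum('nd,de,ne->n', diff, inv, diff)
        dens += phi[c] * np.exp(-0.5 * mahal) / norm
    return -np.log(dens + 1e-12)                # high energy = anomalous

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
M = rng.dirichlet(np.ones(3), size=50)          # softmax-like memberships
E = gmm_energy(Z, M)
```

In the model proper, M comes from the softmax head of the estimation network rather than a random draw, and the energy is both a training signal and the anomaly score.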

  14. Method: Loss and Anomaly Score.
  Loss function:
  L = ||X − X̂||²₂ + λ₁ E(Z) + λ₂ Σ_{c=1}^C Σ_{j=1}^d 1/(Σ_c)_{jj} + λ₃ ||Z||²₂
  (reconstruction error + energy + covariance penalty + embedding penalty).
  Anomaly score: Score = E(Z).
  Solution to the problem: y_i = 1 if u(X_i) ≥ λ, 0 otherwise, with λ chosen from the distribution of scores.
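Assembling the four loss terms and the thresholding rule is mechanical once the pieces exist. A minimal sketch; the λ values below are illustrative placeholders, not the paper's hyper-parameters, and `cadgmm_loss`/`classify` are my names.

```python
import numpy as np

def cadgmm_loss(X, X_rec, Z, energy, Sigmas, lam1=0.1, lam2=0.005, lam3=0.1):
    """Total loss: reconstruction error + lam1 * mean energy
    + lam2 * covariance penalty (sum of 1/Sigma_jj, which discourages
    degenerate covariances) + lam3 * embedding norm penalty."""
    rec = np.sum((X - X_rec) ** 2)
    cov_pen = sum(np.sum(1.0 / np.diag(S)) for S in Sigmas)
    emb_pen = np.sum(Z ** 2)
    return rec + lam1 * np.mean(energy) + lam2 * cov_pen + lam3 * emb_pen

def classify(scores, lam):
    # Anomaly rule: score = E(z); flag as anomalous when score >= lambda.
    return (scores >= lam).astype(int)   # 1 = anomalous, 0 = normal
```

At test time only `classify` is needed: the energy of each test sample is compared against a threshold λ read off the distribution of training scores.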

  15. Experiment setup. Datasets; Baselines: OC-SVM (Chen et al. 2001), IF (Liu et al. 2008), DSEBM (Zhai et al. 2016), DAGMM (Zong et al. 2018), AnoGAN (Schlegl et al. 2017), ALAD (Zenati et al. 2018); Evaluation metrics: Precision, Recall, F1-Score.

  16. Experiment Results. Consistent performance improvement!

  17. Experiment Results. Less sensitive to noisy data; more robust!

  18. Experiment Results. Fig.: impact of different K values of the K-NN algorithm in graph construction. Less sensitive to hyper-parameters; easy to use!

  19. Experiment Results. (a) DAGMM, (b) CADGMM. Fig.: embedding visualization on KDD99 (blue indicates the normal samples and orange the anomalies). Explainable and effective!

  20. Conclusion and Future Work.
  • Conventional feature-learning models cannot effectively capture the correlation among data samples for anomaly detection.
  • We propose a general representation-learning framework that models the complex correlation among data samples for unsupervised anomaly detection.
  • We plan to explore the correlation among samples for extremely high-dimensional data sources such as image or video.
  • We plan to develop an adaptive, learnable graph-construction module for more reasonable correlation modeling.

  21. References.
  • [OC-SVM] Chen, Y., Zhou, X.S., Huang, T.S.: One-class SVM for learning in image retrieval. ICIP 2001.
  • [IF] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. ICDM 2008.
  • [DSEBM] Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. ICML 2016.
  • [DAGMM] Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D., Chen, H.: Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. ICLR 2018.
  • [AnoGAN] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. IPMI 2017.
  • [ALAD] Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.: Adversarially learned anomaly detection. ICDM 2018.

  22. Thanks for listening! Contact: isfanhy@hrbust.edu.cn Home Page: https://haoyfan.github.io/
