-1- Correlation-aware Deep Generative Model for Unsupervised Anomaly Detection. Haoyi Fan¹, Fengbin Zhang¹, Ruidong Wang¹, Liang Xi¹, Zuoyong Li². ¹Harbin University of Science and Technology, ²Minjiang University. Contact: isfanhy@hrbust.edu.cn
-2- Background. [Figure: normal and anomalous samples illustrated in the observed space and in the latent space]
-3- Background. Application scenarios: Fraud Detection (image: https://www.explosion.com/135494/5-effective-strategies-of-fraud-detection-and-prevention-for-ecommerce/), Intrusion Detection (image: https://towardsdatascience.com/building-an-intrusion-detection-system-using-deep-learning-b9488332b321), Disease Detection (image: https://planforgermany.com/switching-private-public-health-insurance-germany/), Fault Detection (image: https://blog.exporthub.com/working-with-chinese-manufacturers/).
-4- Background: Unsupervised Anomaly Detection from the Density Estimation Perspective. Data samples: $X_{train} = \{x_1, x_2, x_3, \ldots, x_n\}$, where each $x_i$ is assumed to be normal. [Latent-space illustration]
-5- Background (cont.): Model: a density $p(x)$ learned from the training samples.
-6- Background (cont.): Test samples: $X_{test} = \{x_1, x_2, \ldots, x_m\}$, where each $x_t$ is unknown. If $p(x_t) < \epsilon$, $x_t$ is abnormal; if $p(x_t) \geq \epsilon$, $x_t$ is normal.
-7- Background (cont.): Anomalies reside in the low-probability-density areas of the learned distribution.
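To make the density-estimation view concrete, here is a minimal sketch that is not the slides' method: it fits an off-the-shelf Gaussian mixture with scikit-learn, takes the threshold from a percentile of the training log-densities (an illustrative choice), and flags low-density test points as anomalies.

```python
# A minimal sketch of the density-estimation view of anomaly detection,
# assuming scikit-learn. The Gaussian mixture model, the synthetic data and
# the 5th-percentile threshold are illustrative choices, not the slides' method.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 8))   # training samples, assumed normal
X_test = rng.normal(size=(100, 8))     # test samples, normality unknown

model = GaussianMixture(n_components=4, random_state=0).fit(X_train)

# Threshold epsilon taken from the training log-density distribution.
epsilon = np.percentile(model.score_samples(X_train), 5)

log_density = model.score_samples(X_test)   # log p(x_t) per test sample
is_anomaly = log_density < epsilon          # low-density samples are flagged as abnormal
```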
-8- Background: correlation among data samples. Conventional pipeline: Feature Learning, then Anomaly Detection, operating in the feature space only. Correlation-aware pipeline: Graph Modeling, then Feature Learning, then Anomaly Detection, operating in both the feature space and the structure space. Question: how to discover the normal pattern from both the feature level and the structural level?
-9- Problem Statement. Notations: $\mathcal{G}$: graph; $\mathcal{V}$: set of nodes in a graph; $\mathcal{E}$: set of edges in a graph; $N$: number of nodes; $F$: dimension of attributes; $\mathbf{A} \in \mathbb{R}^{N \times N}$: adjacency matrix of a network; $\mathbf{X} \in \mathbb{R}^{N \times F}$: feature matrix of all nodes. Anomaly Detection: given a set of input samples $X = \{x_i \mid i = 1, \ldots, N\}$, each associated with an $F$-dimensional feature $x_i \in \mathbb{R}^{F}$, we aim to learn a score function $u(x_i): \mathbb{R}^{F} \mapsto \mathbb{R}$ and classify sample $x_i$ against a threshold $\epsilon$: $y_i = 1$ if $u(x_i) \geq \epsilon$, and $y_i = 0$ otherwise, where $y_i$ denotes the label of sample $x_i$, with 0 being the normal class and 1 the anomalous class.
-10- Method: CADGMM overview. Pipeline: Graph Construction, Dual-Encoder, Feature Decoder, Estimation network.
-11- Method: CADGMM, graph construction with K-Nearest Neighbors (e.g. K=5). Original features: $X = \{x_i \mid i = 1, \ldots, N\}$. Find neighbors by K-NN: $\mathcal{N}_i = \{x_i^k \mid k = 1, \ldots, K\}$. Model the correlation as a graph $G = \{V, E, X\}$ with $V = \{v_i = x_i \mid i = 1, \ldots, N\}$ and $E = \{e_{i,k} = (v_i, v_i^k) \mid v_i^k \in \mathcal{N}_i\}$.
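A minimal sketch of the K-NN graph construction step, assuming Euclidean distance and scikit-learn's NearestNeighbors; the function name and the edge-list format are illustrative, not taken from the CADGMM code.

```python
# A minimal sketch of the K-NN graph construction step, assuming Euclidean
# distance and scikit-learn; function and variable names are illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(X, K=5):
    """Return an edge list (i, j) linking each sample x_i to its K nearest neighbors."""
    # K + 1 neighbors because each point is returned as its own nearest neighbor.
    nbrs = NearestNeighbors(n_neighbors=K + 1).fit(X)
    _, idx = nbrs.kneighbors(X)                    # idx[i] = neighbor indices of x_i
    edges = [(i, j) for i in range(len(X)) for j in idx[i, 1:]]   # drop the self-loop
    return np.asarray(edges)                       # shape (N * K, 2)

X = np.random.default_rng(0).normal(size=(200, 16))   # N=200 samples, F=16 features
edges = build_knn_graph(X, K=5)
```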
-12- Method: CADGMM, Dual-Encoder and Decoder. Feature Encoder (e.g. MLP, CNN, LSTM), Graph Encoder (e.g. GAT), and Feature Decoder.
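A sketch of what a dual encoder with a feature decoder could look like in PyTorch, using PyTorch Geometric's GATConv for the graph encoder as suggested by the slide's "e.g." hints; the layer sizes and the additive fusion of the two embeddings are assumptions, not the paper's exact architecture.

```python
# A sketch of a dual encoder with a feature decoder, assuming PyTorch and
# PyTorch Geometric. Layer sizes, the single-head GATConv and the additive
# fusion of the two embeddings are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class DualEncoder(nn.Module):
    def __init__(self, in_dim, hid_dim, emb_dim):
        super().__init__()
        # Feature encoder (an MLP here; the slide also mentions CNN/LSTM variants).
        self.feat_enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, emb_dim))
        # Graph encoder over the K-NN graph (GAT, per the slide's example).
        self.graph_enc = GATConv(in_dim, emb_dim, heads=1)
        # Feature decoder used to reconstruct the input for the loss.
        self.feat_dec = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, in_dim))

    def forward(self, x, edge_index):
        z_feat = self.feat_enc(x)                # feature-level embedding
        z_graph = self.graph_enc(x, edge_index)  # structure-level embedding
        z = z_feat + z_graph                     # simple fusion (an assumption)
        x_rec = self.feat_dec(z)                 # reconstruction of the input
        return z, x_rec
```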
-13- Method: CADGMM, Gaussian Mixture Modeling with an estimation network. Initial embedding: $\mathbf{Z}$. Membership prediction: $\hat{\mathbf{Z}}^{(l)} = \sigma(\hat{\mathbf{Z}}^{(l-1)} \mathbf{W}^{(l-1)} + \mathbf{b}^{(l-1)})$ with $\hat{\mathbf{Z}}^{(0)} = \mathbf{Z}$, and $\hat{\boldsymbol{\gamma}} = \mathrm{Softmax}(\hat{\mathbf{Z}}^{(l)})$, $\hat{\boldsymbol{\gamma}} \in \mathbb{R}^{N \times M}$. Parameter estimation: $\hat{\boldsymbol{\mu}}_m = \frac{\sum_{i=1}^{N} \hat{\gamma}_{i,m} \mathbf{z}_i}{\sum_{i=1}^{N} \hat{\gamma}_{i,m}}$, $\hat{\boldsymbol{\Sigma}}_m = \frac{\sum_{i=1}^{N} \hat{\gamma}_{i,m} (\mathbf{z}_i - \hat{\boldsymbol{\mu}}_m)(\mathbf{z}_i - \hat{\boldsymbol{\mu}}_m)^{\mathrm{T}}}{\sum_{i=1}^{N} \hat{\gamma}_{i,m}}$. Energy: $E(\mathbf{z}) = -\log \left( \sum_{m=1}^{M} \hat{\phi}_m \frac{\exp\left(-\frac{1}{2}(\mathbf{z} - \hat{\boldsymbol{\mu}}_m)^{\mathrm{T}} \hat{\boldsymbol{\Sigma}}_m^{-1} (\mathbf{z} - \hat{\boldsymbol{\mu}}_m)\right)}{|2\pi \hat{\boldsymbol{\Sigma}}_m|^{1/2}} \right)$, where $\hat{\phi}_m = \frac{1}{N}\sum_{i=1}^{N}\hat{\gamma}_{i,m}$.
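The following PyTorch sketch implements the DAGMM-style mixture statistics and sample energy written above; variable names and the small regularizer are illustrative, not taken from the CADGMM code.

```python
# A PyTorch sketch of the mixture statistics and sample energy written above
# (DAGMM-style); variable names and the small regularizer eps are illustrative.
import math
import torch

def gmm_energy(z, gamma, eps=1e-6):
    """z: (N, d) embeddings, gamma: (N, M) soft memberships.
    Returns per-sample energies (N,) and component covariances (M, d, d)."""
    N, d = z.shape
    phi = gamma.mean(dim=0)                                        # mixture weights, (M,)
    denom = gamma.sum(dim=0) + eps                                 # (M,)
    mu = (gamma.t() @ z) / denom.unsqueeze(1)                      # component means, (M, d)
    diff = z.unsqueeze(1) - mu.unsqueeze(0)                        # (N, M, d)
    sigma = torch.einsum('nm,nmi,nmj->mij', gamma, diff, diff)     # (M, d, d)
    sigma = sigma / denom.view(-1, 1, 1)
    sigma = sigma + eps * torch.eye(d).unsqueeze(0)                # keep covariances invertible

    inv = torch.linalg.inv(sigma)                                  # (M, d, d)
    maha = torch.einsum('nmi,mij,nmj->nm', diff, inv, diff)        # squared Mahalanobis distance
    log_det = torch.logdet(2 * math.pi * sigma)                    # log |2*pi*Sigma_m|, (M,)
    log_prob = torch.log(phi + eps) - 0.5 * (maha + log_det)       # (N, M)
    energy = -torch.logsumexp(log_prob, dim=1)                     # E(z_i), (N,)
    return energy, sigma
```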
-14- Method: loss and anomaly score. Loss function: $\mathcal{L} = \|\mathbf{X} - \hat{\mathbf{X}}\|_2^2 + \lambda_1 E(\mathbf{Z}) + \lambda_2 \sum_{m=1}^{M} \sum_{j=1}^{d} \frac{1}{(\hat{\boldsymbol{\Sigma}}_m)_{jj}} + \lambda_3 \|\mathbf{Z}\|_2^2$, where $d$ is the embedding dimension; the terms are the reconstruction error, the energy, the covariance penalty, and the embedding penalty. Anomaly score: $score_i = E(\mathbf{z}_i)$. Solution to the problem: $y_i = 1$ if $u(x_i) \geq \epsilon$, and $y_i = 0$ otherwise, where the threshold $\epsilon$ is determined from the distribution of scores.
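A sketch assembling the loss terms from this slide and thresholding the energy-based scores, reusing the dual-encoder and gmm_energy sketches above; the lambda weights and the 95th-percentile threshold are placeholder values, not those of the paper.

```python
# A sketch assembling the loss above and thresholding energy-based scores,
# reusing the dual encoder and gmm_energy sketches; the lambda weights and
# the 95th-percentile threshold are placeholder values, not the paper's.
import torch

def cadgmm_style_loss(x, x_rec, z, energy, sigma,
                      lam1=0.1, lam2=0.005, lam3=0.001):
    rec_error = ((x - x_rec) ** 2).sum(dim=1).mean()                  # reconstruction error
    cov_penalty = (1.0 / sigma.diagonal(dim1=-2, dim2=-1)).sum()      # penalize degenerate covariances
    emb_penalty = (z ** 2).sum(dim=1).mean()                          # embedding norm penalty
    return rec_error + lam1 * energy.mean() + lam2 * cov_penalty + lam3 * emb_penalty

# At test time, the anomaly score is the energy; the threshold comes from the
# empirical score distribution, e.g. a fixed percentile:
#   threshold = torch.quantile(train_scores, 0.95)
#   y_pred = (test_scores >= threshold).long()
```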
-15- Experiment setup. Baselines: OC-SVM (Chen et al. 2001), IF (Liu et al. 2008), DSEBM (Zhai et al. 2016), DAGMM (Zong et al. 2018), AnoGAN (Schlegl et al. 2017), ALAD (Zenati et al. 2018). Evaluation metrics: Precision, Recall, F1-Score. Datasets: benchmark datasets, including KDD99 (used for the visualization on slide 19).
-16- Experiment Results Consistent performance improvement!
-17- Experiment Results Less sensitive to noisy data! More robust!
-18- Experiment Results Fig. Impact of different K values of the K-NN algorithm in graph construction. Less sensitive to hyper-parameters! Easy to use!
-19- Experiment Results (a). DAGMM (b). CADGMM Fig. Embedding visualization on KDD99 (Blue indicates the normal samples and orange the anomalies). Explainable and Effective!
-20- Conclusion and Future Works
• Conventional feature learning models cannot effectively capture the correlation among data samples for anomaly detection.
• We propose a general representation learning framework to model the complex correlation among data samples for unsupervised anomaly detection.
• We plan to explore the correlation among samples for extremely high-dimensional data sources such as images or videos.
• We plan to develop an adaptive and learnable graph construction module for more reasonable correlation modeling.
-21- References
• [OC-SVM] Chen, Y., Zhou, X.S., Huang, T.S.: One-class SVM for learning in image retrieval. ICIP, 2001.
• [IF] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation Forest. ICDM, 2008.
• [DSEBM] Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. ICML, 2016.
• [DAGMM] Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D., Chen, H.: Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. ICLR, 2018.
• [AnoGAN] Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. IPMI, 2017.
• [ALAD] Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.: Adversarially learned anomaly detection. ICDM, 2018.
-22- Thanks for listening! Contact: isfanhy@hrbust.edu.cn Home Page: https://haoyfan.github.io/