KDetect: Unsupervised Anomaly Detection for Cloud Systems Based on Time Series Clustering
Swati Sharma, Amadou Diarra, Fredrico Alvares, Thomas Ropars
24-6-2020

Context
● Cloud computing runs a large part of IT infrastructure.
● Large number of virtual machines (VMs): several thousand.
● Each VM executes services of unknown nature.
● VM analysis by the cloud provider must be non-intrusive.
● VMs are typically monitored through resource consumption metrics.

Problem Domain
● Anomaly detection is consequential for VM monitoring.
● An anomaly is an unexpected system load or behavior, judged from the collected system metrics.

Objectives
● Generic solution to detect anomalies.
● Processing of unlabelled time series.
● High accuracy (recall and precision) in anomaly detection.
● Quick execution.

Challenges
Large data sizes:
● Execution time per VM.
● No labels available.
Data content:
● Diverse normal and abnormal behavior.
● Noise along with seasonal data.

Contributions
KDetect:
● Unsupervised learning technique to detect anomalies.
● Targets time series exhibiting periodic behavior.
● Dynamic partitional clustering based solution.
● Generic heuristics that need no configuration changes.
Evaluation on a production dataset from EasyVirt:
● Recall above 94% in most cases and precision of 95% or more.
● Fast execution (330 days of data analyzed in under 3 minutes).

Related Work
Anomaly detection in the cloud:
● [Aggarwal2017] Adaptive Real-Time: analyzes nodes running similar applications and predicts next values to detect outliers.
● [Zhang2019] Cross-Dataset Transfer Learning: orthogonal to our solution. Transfers anomaly patterns from one cloud to the next.
Unsupervised anomaly detection for time series:
● [Xu2018] Donut: state of the art, based on a variational auto-encoder.
● [Paparrizos2015] k-Shape: basic block of every KDetect iteration.

k-Shape
● Iterative refinement clustering algorithm.
● Uses the Shape-Based Distance (SBD) measure.
● Positioning in Euclidean space: shape comparison.
● The number of clusters (k) must be known in advance.
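As a rough illustration of the SBD measure named on this slide, the sketch below follows the definition from the k-Shape paper: one minus the maximum normalized cross-correlation over all shifts. The function name `sbd` and the handling of all-zero input are assumptions made for this example. In k-Shape the series are usually z-normalized first, which is omitted here.

```python
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all shifts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 1.0  # convention chosen for this sketch when a series is all zeros
    # mode="full" evaluates the cross-correlation at every possible shift
    cross_corr = np.correlate(x, y, mode="full")
    return 1.0 - cross_corr.max() / denom

# Same shape, shifted by two samples: the distance is (numerically) zero.
bump = [0, 0, 1, 2, 1, 0, 0, 0]
shifted_bump = [0, 0, 0, 0, 1, 2, 1, 0]
# A different shape gives a clearly larger distance.
spikes = [1, 0, 1, 0, 1, 0, 1, 0]

d_same = sbd(bump, shifted_bump)
d_diff = sbd(bump, spikes)
```

Because the maximum is taken over all shifts, two series with the same shape at different offsets are treated as close, which is what makes the measure a shape comparison rather than a point-by-point one.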

Solution: KDetect Algorithm
● Unsupervised iterative refinement clustering algorithm.
● Progressively increases k and clusters the time series into normal and abnormal.
Challenges:
● Which k gives a good segregation?
● How to label each cluster ('N' or 'Ab') at every iteration?
KDetect provides generic heuristics to solve these challenges; they are not specific to any particular VM.

KDetect (figure: cluster C1)
Initially, C1 is a single cluster containing all time series.

KDetect (figure: clusters C1 and C2)
At k=2, the bigger cluster is assumed to be normal.

KDetect (figure: clusters C1 to C8)
At the auto-halt iteration:
● Good segregation of normal and abnormal clusters.
● Clusters are labelled 'N' or 'Ab'.
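The progression shown on these slides (increase k, treat the bigger cluster at k=2 as normal, halt when the segregation is good) could be organized roughly as in the skeleton below. It is only a sketch: `run_iterations`, `cluster_fn`, `should_stop` and `k_max` are hypothetical names, the clustering step and the stop rule are passed in rather than implemented, and none of it is taken from the authors' code.

```python
from collections import Counter

def largest_cluster(assignments):
    """Return the id of the most populated cluster."""
    return Counter(assignments).most_common(1)[0][0]

def run_iterations(series, cluster_fn, should_stop, k_max=10):
    """Increase k from 2 until should_stop(history) is true or k_max is reached.

    cluster_fn(series, k) -> one cluster id per series (e.g. a k-Shape wrapper).
    should_stop(history)  -> bool, given the list of (k, assignments) so far.
    """
    if k_max < 2:
        raise ValueError("k_max must be at least 2")
    history = []
    initial_normal = None
    for k in range(2, k_max + 1):
        assignments = list(cluster_fn(series, k))
        if k == 2:
            # At k=2 the bigger cluster is assumed to be normal.
            normal_id = largest_cluster(assignments)
            initial_normal = [s for s, c in zip(series, assignments) if c == normal_id]
        history.append((k, assignments))
        if should_stop(history):
            break
    final_k, final_assignments = history[-1]
    return final_k, final_assignments, initial_normal

# Toy usage with stand-in callables (round-robin "clustering", stop at k=4).
toy_series = list(range(7))
final_k, final_assignments, initial_normal = run_iterations(
    toy_series,
    cluster_fn=lambda s, k: [i % k for i in range(len(s))],
    should_stop=lambda history: history[-1][0] >= 4,
)
```

Keeping the clustering and the stop rule as parameters makes it possible to plug in a real k-Shape implementation and a density-based criterion without changing the loop.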

Cluster Segregation Metrics: Density (figure: clusters C1 and C2)
Cluster density is the average distance (SBD) between any two time series in the cluster, i.e. the degree of similarity between the time series.

Cluster Segregation Metrics: Density (figure: clusters C1 and C2 in two panels, "Density Decrease" and "Density Increase")
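The density definition above (average SBD between any two time series of a cluster) can be sketched as follows. The helper names `sbd` and `cluster_density` are chosen for this example, and returning 0.0 for a cluster with fewer than two members is an assumption. Note that under this definition a smaller value means the members are more similar.

```python
from itertools import combinations
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all shifts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 1.0  # convention chosen for this sketch
    return 1.0 - np.correlate(x, y, mode="full").max() / denom

def cluster_density(members):
    """Average SBD over all unordered pairs of time series in one cluster."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0  # assumption: a cluster with fewer than two members has no spread
    return float(np.mean([sbd(a, b) for a, b in pairs]))

bump = [0, 0, 1, 2, 1, 0, 0, 0]
spikes = [1, 0, 1, 0, 1, 0, 1, 0]

compact = cluster_density([bump, bump, bump])   # identical shapes
mixed = cluster_density([bump, bump, spikes])   # one member with another shape
```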

KDetect Auto-Stop
● Metrics: density (cluster compactness) and standard deviation (time series variation).
● Threshold on the density increase between two consecutive iterations.
● Thresholds locate a good local optimum.
● Further iterations bring refinement.
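The slides do not give the threshold values or say how density and standard deviation are combined, so the following is only a toy illustration of a threshold on the change between two consecutive iterations. The function `first_stop_k`, the generic per-iteration score and the numbers are all hypothetical.

```python
def first_stop_k(score_by_k, threshold):
    """Return the first k whose score rose by at least `threshold` over the previous k.

    score_by_k maps each tested k to a segregation score for that iteration.
    Returns None when no consecutive pair of iterations reaches the threshold.
    """
    ks = sorted(score_by_k)
    for prev_k, k in zip(ks, ks[1:]):
        if score_by_k[k] - score_by_k[prev_k] >= threshold:
            return k
    return None

# Made-up scores: small changes at k=3 and k=5, one large jump at k=4.
toy_scores = {2: 0.10, 3: 0.12, 4: 0.40, 5: 0.42}
stop_k = first_stop_k(toy_scores, threshold=0.2)
```

With a threshold of this kind, small improvements are skipped and only a significant jump halts the iterations.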

Cluster Labelling (figure: clusters C1 and C2)

Cluster Labelling (figure: C1 labelled N, C2 labelled Ab)

Cluster Labelling (figure: C1 labelled N, C2 labelled Ab)
β = 2 × the average distance between any two points in the initial normal cluster.

Cluster Labelling (figure: clusters C1, C2 and C3)
If the SBD between C3 and the initial normal cluster is greater than β, C3 gets the abnormal label ('Ab').

Cluster Labelling (figure: C1 labelled N, C2 and C3 labelled Ab)
The SBD between C3 and the initial normal cluster is greater than β, so C3 is labelled abnormal ('Ab').
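The labelling rule on these slides can be sketched as below. The slides do not say how the distance between a cluster and the initial normal cluster is measured, so this example assumes each cluster is represented by a single representative series (for instance its centroid). `beta_threshold`, `label_cluster` and the toy data are hypothetical.

```python
from itertools import combinations
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all shifts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 1.0  # convention chosen for this sketch
    return 1.0 - np.correlate(x, y, mode="full").max() / denom

def beta_threshold(initial_normal):
    """beta = 2 x average SBD between any two series of the initial normal cluster."""
    pairs = list(combinations(initial_normal, 2))
    return 2.0 * float(np.mean([sbd(a, b) for a, b in pairs]))

def label_cluster(representative, normal_representative, beta):
    """'Ab' when the cluster is farther than beta from the initial normal cluster."""
    return "Ab" if sbd(representative, normal_representative) > beta else "N"

# Toy initial normal cluster: three nearly identical daily shapes.
initial_normal = [
    [0, 1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0, 0.2],
    [0, 1, 2.2, 1, 0, 0],
]
beta = beta_threshold(initial_normal)
normal_rep = initial_normal[0]   # assumption: first member stands in for the centroid
oscillating = [1, -1, 1, -1, 1, -1]

label_normal = label_cluster(normal_rep, normal_rep, beta)
label_odd = label_cluster(oscillating, normal_rep, beta)
```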

Evaluation
● Performance statistics
● Comparison with the state of the art
● Auto-stop criteria
● Execution time

Setup & Configuration
k-Shape in Python 3: tslearn v0.3.0.
Experiments conducted on a server:
● CPU: 12-core Intel Xeon E5645.
● Memory: 48 GB.
● OS: Linux server edition, Debian 4.9.0-4-amd64.

Dataset
Dataset description:
● Data collection: French company EasyVirt.
● Production data contains almost 2000 VMs.
● 4 VMs illustrated, with diverse normal and diverse abnormal behavior. Differentiating normal from abnormal is not trivial.
● Manual labelling by EasyVirt experts to evaluate KDetect.
Data characteristics:
● Total number of days for each VM ≈ 300.
● 24-hour time windows to capture time series seasonality.
● Averaged over 10-minute intervals: 144 points in each time series (TS).
● Metric: CPU consumption percentage.
● Normal : abnormal = 3:1.
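The windowing described above works out to 24 × 60 / 10 = 144 points per day. A minimal sketch of cutting a stream of 10-minute averages into one time series per day is given below. It assumes the samples are already averaged and start at a day boundary; `daily_windows` is a hypothetical helper, and dropping an incomplete trailing day is a choice made for this example.

```python
import numpy as np

POINTS_PER_DAY = 24 * 60 // 10  # 10-minute averages over 24 hours -> 144 points

def daily_windows(samples, points_per_day=POINTS_PER_DAY):
    """Reshape a flat array of samples into one row per complete day."""
    samples = np.asarray(samples, dtype=float)
    n_days = len(samples) // points_per_day
    # An incomplete trailing day is dropped in this sketch.
    return samples[: n_days * points_per_day].reshape(n_days, points_per_day)

# 300 samples cover two complete days (288 points) plus 12 leftover points.
windows = daily_windows(np.arange(300))
```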

Performance Statistics

VM   Recall   Precision   FP %
A    0.94     1           0
B    0.81     0.95        1.11
C    0.98     0.99        0.31
D    0.99     1           0

KDetect reaches a recall of at least 0.94 on three of the four VMs (0.81 on VM B) and a precision of at least 0.95 on all four.
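For reference, recall and precision follow the standard definitions, with abnormal days as the positive class. The slides do not define "FP %"; the sketch below assumes it is the share of normal days flagged as abnormal. The counts are invented for illustration and are not taken from the EasyVirt dataset.

```python
def detection_metrics(tp, fp, fn, tn):
    """Recall, precision and false-positive percentage from confusion counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumption: FP % = false positives as a percentage of all truly normal days.
    fp_percent = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    return recall, precision, fp_percent

# Invented counts: 100 abnormal days (90 found), 300 normal days (10 flagged).
recall, precision, fp_percent = detection_metrics(tp=90, fp=10, fn=10, tn=290)
```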

Comparison with State-of-the-Art: Donut
● Implementation in Python 3 using TensorFlow 1.5.0, by the Donut authors.
● A threshold on the reconstruction probability separates normal from abnormal. For each VM, 1000 threshold values were tested between the lowest and the highest probability.
● 60% training data and 40% testing data.
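The threshold search described above can be sketched as a sweep over evenly spaced values between the lowest and highest score. This is not the Donut authors' code: `sweep_thresholds` is a hypothetical helper, the even spacing is an assumption, and the sketch treats a score below the threshold as abnormal, following the usual reading that a low reconstruction probability signals an anomaly.

```python
import numpy as np

def sweep_thresholds(scores, is_abnormal, n_thresholds=1000):
    """Return (threshold, precision, recall) for evenly spaced thresholds."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_abnormal, dtype=bool)
    results = []
    for t in np.linspace(scores.min(), scores.max(), n_thresholds):
        predicted = scores < t  # low reconstruction probability -> abnormal
        tp = int(np.sum(predicted & truth))
        fp = int(np.sum(predicted & ~truth))
        fn = int(np.sum(~predicted & truth))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results.append((float(t), precision, recall))
    return results

# Toy scores: the two abnormal points have much lower scores than the rest.
toy_scores = [-10.0, -9.0, -1.0, -0.5, -0.2]
toy_truth = [True, True, False, False, False]
sweep = sweep_thresholds(toy_scores, toy_truth)
```

Each entry of the result can then be compared against the labelled test data to pick the threshold that suits the evaluation criterion in use.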

Comparison with State-of-the-Art: Donut
KDetect outperforms Donut: precision → 48%, recall → 20%.

Auto-Stop Criteria Analysis
● Performance statistics for VM B.
● Stop at a significant local optimum, not the 1st one.
● Tradeoff: execution time vs. precision.
● KDetect selects a "good" value of k.

Execution Time Analysis
● Average of 10 executions.
● Linear increase as a function of k.
● For the same k, execution times differ across VMs because their data sizes differ.

Virtual Machine   Auto-Stop Iteration (k)   Execution Time (sec)
VM A              5                         100
VM B              7                         172
VM C              3                         63
VM D              3                         101

Fast KDetect execution: under 3 minutes in the worst case (VM B).

Conclusions
KDetect:
● Unsupervised learning algorithm to identify anomalies.
● Targets time series exhibiting seasonal behavior.
● Dynamic partitional clustering based solution.
● Relies on generic heuristics so that it applies to a large number of VMs.
● Based on k-Shape as a building block.
Evaluation on multiple VM traces from production data:
● High precision and recall, low false positives.
● Fast execution.

Future Work
● Reinforcement learning to improve recall and precision.
● Adapt KDetect to run online, to reduce the lead time for anomaly detection.

Thank You!
