Label-Less: A Semi-Automatic Labelling Tool for KPI Anomalies
Nengwen Zhao, Jing Zhu, Rong Liu, Dapeng Liu, Ming Zhang, Dan Pei
INFOCOM 2019
Outline: Operational Background · Design · Evaluation · Experience (now: Operational Background)
Internet-Based Services and KPIs
Ø KPIs (Key Performance Indicators): a set of performance metrics that monitor a service.
Ø Examples: search response time for a search engine; memory usage for online shopping or social network services.
[Figure: Internet-based services (search engine, online shopping, social network) and their KPIs]
KPI Anomaly
Ø KPI anomalies: unexpected behavior in a KPI that indicates potential failures; the goal is to detect them [IMC15].
[Figure: a KPI with an anomaly to be detected]
KPI Anomaly Detection Methods
Ø Existing KPI anomaly detection methods:
  1. Traditional statistical methods: Holt-Winters, MA, ARIMA… [IMC03, CONEXT11, INFOCOM12, SIGCOMM13…]
  2. Supervised ensemble learning: Opprentice, EGADS… [IMC15, KDD15…]
  3. Unsupervised learning: Donut, Autoencoder… [WWW18, AAAI19, IJCAI17, KDD17…]
Ø Anomaly detection products in industry: Prometheus, Anodot, Kibana…
Ø Takeaway: KPI anomaly detection is very important, and many efforts have been devoted to anomaly detection research.
KPI Anomaly Detection Methods
Ø However, the performance in reality is far from satisfying:
  – Lack of generality: KPIs in practice have various types of patterns (seasonal, stationary, variable), e.g., Donut is designed only for seasonal KPIs [WWW18].
[Figure: examples of seasonal, stationary, and variable KPIs]
Public Datasets
Ø Public time series datasets: UCR time series archive, UCI machine learning repository
  – Problem: they target classification, clustering, or regression, not anomaly detection.
Ø Public KPI anomaly detection datasets: Yahoo Benchmark [KDD15], Numenta Anomaly Benchmark [ICMLA15]
  – Problems: KPIs with limited data points (e.g., 1,400 data points with only one anomaly segment of four data points) and synthetic anomalies.
Ø The community of KPI anomaly detection is in urgent need of a large-scale and diverse KPI anomaly dataset.
Challenges
Obtaining a large-scale KPI anomaly dataset with high-quality ground truth has been a great challenge.
Ø Challenge 1: Labeling KPI anomalies takes domain knowledge of IT operations.
[Figure: is this segment an anomaly, or just the normal weekend pattern? Where are the anomalies?]
Challenges
Ø Challenge 2: It is labor intensive to carefully examine a several-month-long KPI back and forth and to label anomalies in a consistent manner.
  – A 6-month-long KPI with a 1-minute interval contains about 300,000 data points and takes a few hours to label, even with a labeling tool.
Challenges
Ø Challenge 3: The number of KPIs that need to be labeled is very large.
  – Rare KPI anomalies in each KPI: usually less than 1% of the data points.
  – Large number of KPIs: millions of KPIs in large companies.
  – Diverse patterns: seasonal, variable, stationary.
Challenges
Ø A real example: the KPI Anomaly Detection Algorithm Competition (http://iops.ai/). Because of the labeling overhead, only 30 of thousands of unlabeled KPIs were labeled.
Ø Labeling overhead has become the main hurdle to a large-scale KPI anomaly dataset, which in turn is the main hurdle to effective and practical KPI anomaly detection.
Key Ideas
Ø Design a semi-automatic labeling framework.
Ø Idea 1: Instead of operators visually scanning through the KPIs to gauge their normal patterns and variations, run unsupervised anomaly detection on the raw KPI to provide potential anomalies.
Key Ideas
Ø Idea 2: Instead of operators examining KPIs back and forth to check whether similar patterns are labeled consistently, take a labeled anomaly as a template and use anomaly similarity search to discover similar anomalies automatically.
Outline: Operational Background · Design · Evaluation · Experience (now: Design)
Label-Less Overview
[Framework diagram] The workflow:
1. Unsupervised anomaly detection: unlabeled KPIs → preprocessing → feature extraction → Isolation Forest → threshold selection → candidate anomalies.
2. Operators investigate the candidate anomalies and choose an anomaly template.
3. Anomaly similarity search: accelerated DTW retrieves the top-k segments most similar to the template, which operators check.
4. If all anomalies have been labeled, output the labeled KPIs; otherwise choose another template and repeat.
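A minimal sketch of this loop, with the operator interactions and the two algorithmic stages injected as callables; the function and parameter names are illustrative, not the authors' API.

```python
def label_kpi(kpi_values, detect, pick_template, search, confirm, done):
    """Semi-automatic labeling loop for one KPI (sketch).

    detect:        unsupervised anomaly detection -> candidate anomaly indices
    pick_template: operator investigates candidates and picks one anomaly segment
    search:        anomaly similarity search (accelerated DTW, top-k)
    confirm:       operator checks the top-k matches, returns confirmed indices
    done:          operator decides whether all anomalies are labeled
    """
    labels = set()
    candidates = detect(kpi_values)                       # candidate anomalies
    while not done(labels, candidates):
        template = pick_template(kpi_values, candidates)  # anomaly template
        matches = search(template, kpi_values)            # top-k similar segments
        labels |= confirm(matches)                        # keep the true anomalies
    return sorted(labels)
```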
Unsupervised Anomaly Detection
Pipeline: KPIs → feature extraction → Isolation Forest → threshold selection → candidate anomalies
Ø Feature extraction
  – Time series prediction models serve as feature extractors.
  – Feature: prediction error. Normal points are well predicted and have small prediction errors; anomalies have large prediction errors.
[Figure: KPI value vs. predicted value over several days; anomalies deviate sharply from the prediction]
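As a rough illustration of prediction-error features, the sketch below uses simple predictors (moving average and EWMA at a few horizons); the window sizes and predictor set are assumptions, not the paper's exact configuration.

```python
import numpy as np
import pandas as pd

def prediction_error_features(values, ma_windows=(10, 60), ewma_spans=(10, 60)):
    """Build a per-point prediction-error feature matrix for one KPI.

    Each column is |actual - predicted| for one simple predictor; the
    predictors and horizons here are illustrative stand-ins for the
    forecasting models (MA, EWMA, Holt-Winters, ARIMA, ...) on the slide.
    """
    s = pd.Series(values, dtype=float)
    feats = {}
    for w in ma_windows:
        # predict each point by the mean of the previous w points
        pred = s.shift(1).rolling(w, min_periods=1).mean()
        feats[f"ma_err_{w}"] = (s - pred).abs()
    for span in ewma_spans:
        # exponentially weighted moving average of the past points
        pred = s.shift(1).ewm(span=span, adjust=False).mean()
        feats[f"ewma_err_{span}"] = (s - pred).abs()
    return pd.DataFrame(feats).fillna(0.0).to_numpy()
```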
Unsupervised Anomaly Detection
Ø Isolation Forest [ICDM08]
  – Intuition: anomalies are few and different.
  – The prediction-error features (MA, WMA, EWMA, Holt-Winters, ARIMA, Diff, …) are fed into an iForest, and the anomaly score of each point is derived from its average path length over the iTrees.
  – Points with anomaly score > β are flagged as anomalies; points with score < β are treated as normal samples.
Ø Threshold selection
  – Our goal: generate candidate potential anomalies with high recall and acceptable precision.
[Figure: feature extractors feeding the iForest, per-point anomaly scores, and the flagged points on the KPI]
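A minimal sketch of this stage using scikit-learn's IsolationForest; the recall-oriented quantile threshold is an assumed stand-in for the paper's threshold-selection rule.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def candidate_anomalies(X, score_quantile=0.98, random_state=0):
    """Score each point with an Isolation Forest and flag candidates.

    X is the prediction-error feature matrix (one row per data point).
    The quantile threshold is illustrative: keeping it permissive trades
    precision for the high recall the labeling workflow needs.
    """
    forest = IsolationForest(n_estimators=100, random_state=random_state)
    forest.fit(X)
    scores = -forest.score_samples(X)       # higher = more anomalous
    beta = np.quantile(scores, score_quantile)
    return scores, scores >= beta           # per-point scores and candidate mask
```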
Anomaly Similarity Search
Pipeline: operators check the candidate anomalies → choose an anomaly template → anomaly similarity search → top-k similar anomalies
Ø Goals: high accuracy and low response latency.
Ø DTW (Dynamic Time Warping) as the similarity measure
  – No other distance measure consistently outperforms DTW [KDD12…].
[Figure: DTW alignment between two series x and y]
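For reference, a plain dynamic-programming DTW between two roughly equal-length segments; this is the unaccelerated baseline that the techniques on the next slide speed up.

```python
import numpy as np

def dtw_distance(x, y):
    """Plain O(len(x) * len(y)) DTW with squared point-wise cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # best of match / insertion / deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(np.sqrt(D[n, m]))
```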
Anomaly Similarity Search
Ø Accelerated DTW: we adopt the following three techniques from existing works to speed up DTW (a combined sketch follows).
  1. Constrained path: limit the permissible warping paths by providing local restrictions on the set of alternative steps considered.
  2. Lower bound: use a cheap-to-compute lower bound of DTW to prune segments that cannot possibly be among the top-k similar anomalies.
  3. Early stopping: abandon a segment's DTW computation as soon as its partial cost already exceeds the current k-th best distance.
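A sketch of how the three techniques combine in a top-k search, assuming a Sakoe-Chiba style band for the path constraint and an LB_Keogh-style lower bound; the specific constraint, bound, window step, and band radius are assumptions rather than the paper's exact choices.

```python
import numpy as np

def lb_keogh(query, candidate, radius):
    """LB_Keogh-style lower bound on banded DTW (equal-length series)."""
    c = np.asarray(candidate, dtype=float)
    n = len(c)
    lb = 0.0
    for i, q in enumerate(query):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        upper, lower = c[lo:hi].max(), c[lo:hi].min()
        if q > upper:
            lb += (q - upper) ** 2
        elif q < lower:
            lb += (q - lower) ** 2
    return float(np.sqrt(lb))

def dtw_banded(x, y, radius, abandon_at=np.inf):
    """DTW restricted to a band of width `radius`; gives up early when every
    cell of a row already exceeds `abandon_at` (early abandoning)."""
    n, m = len(x), len(y)
    limit = abandon_at ** 2
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        row_min = np.inf
        for j in range(max(1, i - radius), min(m, i + radius) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
            row_min = min(row_min, D[i, j])
        if row_min > limit:
            return np.inf                       # cannot beat the current k-th best
    return float(np.sqrt(D[n, m]))

def top_k_similar(template, kpi, k=10, radius=5, step=1):
    """Slide a template-length window over the KPI and keep the k windows
    with the smallest accelerated-DTW distance to the template."""
    t = np.asarray(template, dtype=float)
    series = np.asarray(kpi, dtype=float)
    w = len(t)
    best = []                                   # sorted list of (distance, start index)
    for start in range(0, len(series) - w + 1, step):
        seg = series[start:start + w]
        kth = best[-1][0] if len(best) == k else np.inf
        if lb_keogh(t, seg, radius) >= kth:
            continue                            # pruned by the lower bound
        d = dtw_banded(t, seg, radius, abandon_at=kth)
        if d < kth:
            best = sorted(best + [(d, start)])[:k]
    return best
```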
Outline: Operational Background · Design · Evaluation · Experience (now: Evaluation)
Datasets and Metrics
Datasets
Ø Four datasets containing 30 KPIs
Ø A time span of about six months
Ø 1-minute monitoring interval
Metrics
Ø Unsupervised anomaly detection: recall
Ø Anomaly similarity search: best F-score, AUC, and response time
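For clarity, "best F-score" is commonly computed as the maximum F1 over all score thresholds; the sketch below shows that convention and is an assumption about the evaluation protocol, not the paper's exact procedure (which may additionally use segment-level adjustment).

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f_score(y_true, anomaly_scores):
    """Maximum F1 obtainable over all thresholds on the anomaly scores."""
    precision, recall, _ = precision_recall_curve(y_true, anomaly_scores)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return float(f1.max())
```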
Performance of Unsupervised Anomaly Detection
[Framework diagram with the unsupervised anomaly detection stage highlighted]
[Figure: recall of unsupervised anomaly detection on datasets A-D]
Ø High recall with few false negatives.
Performance of Anomaly Similarity Search
[Framework diagram with the anomaly similarity search stage highlighted]
Ø Comparison with other distance measures: ED (Euclidean distance) and SBD (shape-based distance).
[Figure: best F-score and AUC of DTW vs. ED vs. SBD on datasets A-D]
Ø DTW is a good choice!
Efficiency of Anomaly Similarity Search
[Framework diagram with the anomaly similarity search stage highlighted]
Ø Comparison of per-KPI response time: accelerated anomaly similarity search vs. naïve search with the original DTW.
[Figure: per-KPI running time in seconds on datasets A-D]
Ø Unsupervised anomaly detection reduces the search space.
Ø Accelerated DTW reduces the DTW computational complexity.
Ø The per-KPI response time is under 0.5 second, i.e., real-time response.
Outline: Operational Background · Design · Evaluation · Experience (now: Experience)
Labeling Tool with Label-Less
Ø Unsupervised anomaly detection
  – Candidate potential anomalies are marked in red.
  – The threshold can be tuned by operators.
[Screenshot: the labeling tool with a threshold-tuning control]
Labeling Tool with Label-Less
Ø Anomaly similarity search
[Screenshot: a selected anomaly template and the similar anomalies returned by the tool]
Comparison of Labeling Time
Ø Traditional labeling: scanning back and forth; checking labeling consistency manually.
Ø Label-Less: unsupervised anomaly detection; anomaly similarity search.
Comparison of Labeling Time
Ø Experiment: eight voluntary experienced operators. Group 1 used traditional labeling; Group 2 used Label-Less.
[Figure: per-KPI labeling time in minutes on datasets A-D, traditional labeling vs. Label-Less]
Ø Label-Less reduces labeling time by more than 90%!
Conclusion
Ø Labeling overhead has become the main hurdle to researching effective and practical KPI anomaly detection.
Ø The semi-automatic labeling tool Label-Less can greatly reduce operators' labeling overhead.
Ø Label-Less is an important first step toward enabling an ImageNet-like large-scale KPI anomaly dataset with high-quality ground truth.