Device-Agnostic Log Anomaly Classification with Partial Labels



  1. Device-Agnostic Log Anomaly Classification with Partial Labels
     Weibin Meng, Ying Liu, Shenglin Zhang, Dan Pei, Hui Dong, Lei Song, Xulong Luo
     2018/6/23 weibin

  2. Motivation: Architecture of Datacenter Networks
     [Figure: datacenter network architecture. Labels: Inter-DC Network, Core Router, Access Router, Aggregation Switch, ToR Switch, Server; layer markers L3 and L2; middleboxes IDPS, Firewall, VPN, Load balancer]

  3. Motivation
     • Traditional anomaly detection methods usually monitor KPI curves (e.g., traffic flow, CPU utilization).
     • KPIs must be selected manually by network operators.
     • KPI-based methods can only find anomalous behaviors.
     • Logs describe some events that KPI curves can't, such as the root cause.
     • Logs are the most valuable data sources for device management.

  4. Device logs
     Examples of device (switch) logs: [Table not preserved in this transcript]
     • Message types are too ambiguous for accurate classification.
     • Detailed messages are semi-structured natural language provided by device developers.

  5. Drawbacks in Regular Expression
     • Regular expressions are a popular technique for anomalous log classification.
     • Drawbacks:
       • Low generality
       • Labor intensity
       • …
     [Figure: operators configure anomalous regular expressions; syslog messages from Manufacturer A and Manufacturer B are matched against the REs for Type 1 … Type n; on a match (Yes) the log is assigned that type, otherwise (No) it is ignored]
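The regular-expression workflow on this slide can be sketched roughly as follows. This is a minimal illustration, not the operators' actual rule set: the manufacturer names, anomaly type, and patterns in `RULES` are hypothetical, chosen only to show why hand-written rules are manufacturer-specific.

```python
import re
from typing import Optional

# Hypothetical operator-written rules: one list of (type, pattern) per manufacturer.
RULES = {
    "vendor_a": [
        ("interface_down", re.compile(r"Interface \S+ changed state to down")),
    ],
    "vendor_b": [
        ("interface_down", re.compile(r"port \S+ link down", re.IGNORECASE)),
    ],
}

def classify(manufacturer: str, log: str) -> Optional[str]:
    """Return the first matching anomaly type, or None if the log is ignored."""
    for anomaly_type, pattern in RULES.get(manufacturer, []):
        if pattern.search(log):
            return anomaly_type
    return None  # no rule matched: the log is ignored

log = "Interface te-1/1/59 changed state to down"
matched = classify("vendor_a", log)   # rule written for this manufacturer
missed = classify("vendor_b", log)    # same event, other manufacturer's rules
```

In this toy setup the same log is caught by one manufacturer's rules and missed by the other's, which is the "low generality" drawback; every new device type needs its own patterns, which is the "labor intensity" drawback.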

  6. Problem Definitions

  7. Challenges
     • Device-agnostic vocabulary
       • Device logs are type-specific and manufacturer-specific.
       • It is hard to fit one classification model for all different device types.
     • Partial labels
       • Network operators only label the part of the anomalous logs they have encountered.
       • This makes it difficult to train a traditional classification model.

  8. LogClass Design Overview
     [Figure: LogClass architecture]
     • Offline Learning Component: Historical Logs → Filtering Parameters → Feature Vector → PU Binary Classifier and Multiclass Classifier
     • Online Classification Component: Real-time Logs → Filtering Parameters → Feature Vector → Detect Anomalous Logs → Classify Anomalous Logs → Alarm
     • Other labels in the figure: Vocabulary, Top-n Keywords, Anomaly Records
     • Steps: 1. Log Preprocessing; 2. Feature vector; 3. Anomaly detection; 4. Anomaly classification
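The online component in this overview can be read as a two-stage pipeline: detect first, then classify only the logs flagged as anomalous. The sketch below shows that control flow only. The digit-based parameter filter, the tiny vocabulary, and the two toy classifiers are stand-ins invented for illustration; the slides do not specify LogClass's actual filtering rules or models.

```python
import re
from typing import Callable, List, Optional

def filter_parameters(log: str) -> List[str]:
    # Assumption: any token containing a digit is a parameter and is dropped.
    return [tok for tok in log.split() if not re.search(r"\d", tok)]

def to_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    # Bag-of-words counts over a fixed vocabulary learned offline.
    return [tokens.count(word) for word in vocabulary]

def classify_online(
    log: str,
    vocabulary: List[str],
    is_anomalous: Callable[[List[int]], bool],
    anomaly_type: Callable[[List[int]], str],
) -> Optional[str]:
    """Return an anomaly type to alarm on, or None for a normal log."""
    vector = to_vector(filter_parameters(log), vocabulary)
    if not is_anomalous(vector):      # stage 1: binary anomaly detection
        return None
    return anomaly_type(vector)       # stage 2: multiclass classification

# Hypothetical stand-ins for the trained binary and multiclass classifiers.
VOCAB = ["Interface", "changed", "state", "to", "down", "up"]

def toy_detector(vector: List[int]) -> bool:
    return vector[VOCAB.index("down")] > 0

def toy_typer(vector: List[int]) -> str:
    return "interface_down" if vector[VOCAB.index("Interface")] > 0 else "other"

alarm = classify_online("Interface te-1/1/59 changed state to down", VOCAB, toy_detector, toy_typer)
quiet = classify_online("VlanInterface vlan22 changed state to up", VOCAB, toy_detector, toy_typer)
```

Passing the classifiers in as callables keeps the pipeline independent of how they were trained offline.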

  9. Text feature vector
     The universal method to construct a text feature vector is the bag-of-words model.
     Logs:
       M1: Interface te-1/1/59 changed state to down
       M2: VlanInterface vlan22 changed state to up
       M3: Neighbour vlan23 changed state from Exchange to Loading
     Bag-of-words vectors:
           Interface  changed  state  to  down  up  from  Exchange  Loading  Neighbour  VlanInterface
       M1  1          1        1      1   1     0   0     0         0        0          0
       M2  0          1        1      1   0     1   0     0         0        0          1
       M3  0          1        1      1   0     0   1     1         1        1          0
     Assign a weighting value to each component of the vectors (e.g., TF-IDF).
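As a rough illustration of this slide, the following sketch builds bag-of-words vectors for the three example logs and then applies one common TF-IDF variant (term frequency times log(N/df)). The rule that tokens containing digits are parameters is an assumption made here, and the vocabulary is ordered by first appearance, so the columns come out in a different order than on the slide.

```python
import math
import re
from typing import List

LOGS = [
    "Interface te-1/1/59 changed state to down",
    "VlanInterface vlan22 changed state to up",
    "Neighbour vlan23 changed state from Exchange to Loading",
]

def tokenize(log: str) -> List[str]:
    # Assumption: tokens containing a digit (te-1/1/59, vlan22) are parameters.
    return [tok for tok in log.split() if not re.search(r"\d", tok)]

def build_vocabulary(token_lists: List[List[str]]) -> List[str]:
    # Keep words in order of first appearance.
    vocab: List[str] = []
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)
    return vocab

def bag_of_words(tokens: List[str], vocab: List[str]) -> List[int]:
    return [tokens.count(word) for word in vocab]

def tf_idf(tokens: List[str], vocab: List[str], token_lists: List[List[str]]) -> List[float]:
    n_docs = len(token_lists)
    weights = []
    for word in vocab:
        tf = tokens.count(word) / len(tokens)
        df = sum(1 for doc in token_lists if word in doc)  # df >= 1 by construction
        weights.append(tf * math.log(n_docs / df))
    return weights

token_lists = [tokenize(log) for log in LOGS]
vocab = build_vocabulary(token_lists)
bow = [bag_of_words(tokens, vocab) for tokens in token_lists]
tfidf = [tf_idf(tokens, vocab, token_lists) for tokens in token_lists]
```

With this weighting, words that occur in every log ("changed", "state", "to") get weight zero, while words specific to one log, such as "down", keep a positive weight.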

  10. PU Learning
      • PU learning is different from traditional classification.
      • In our scenario, labelling all existing anomalous logs is not practical.
      • PU learning input:
        • Positive set P (anomalous logs)
        • Unlabeled set U (unlabeled logs)
      [Figure: positive data and unlabeled data (Gang Niu et al., NIPS'16)]
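The slides do not say which PU algorithm LogClass uses. As one illustration of how a classifier can be evaluated with only P and U, the sketch below computes a risk estimate commonly used in the PU learning literature: prior * E_P[loss(g(x))] + E_U[loss(-g(x))] - prior * E_P[loss(-g(x))], where g is a scoring function and the class prior (the fraction of positives in U) is assumed to be known. The scores, the prior, and the choice of sigmoid loss are all assumptions for this example.

```python
import math
from typing import List

def sigmoid_loss(margin: float) -> float:
    # Small when the margin is large and positive; loss(z) + loss(-z) == 1.
    return 1.0 / (1.0 + math.exp(margin))

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

def pu_risk(scores_p: List[float], scores_u: List[float], prior: float) -> float:
    """Risk estimate from positive and unlabeled scores only.

    scores_p: classifier scores g(x) on the positive set P
    scores_u: classifier scores g(x) on the unlabeled set U
    prior:    assumed fraction of positives hidden in U
    """
    positives_as_positive = mean([sigmoid_loss(s) for s in scores_p])
    positives_as_negative = mean([sigmoid_loss(-s) for s in scores_p])
    unlabeled_as_negative = mean([sigmoid_loss(-s) for s in scores_u])
    # U is treated as negative, then corrected for the positives it contains.
    return (prior * positives_as_positive
            + unlabeled_as_negative
            - prior * positives_as_negative)

# Toy scores: higher means "more anomalous". One of four unlabeled logs is anomalous.
good = pu_risk(scores_p=[3.0, 2.0], scores_u=[2.0, -3.0, -2.0, -3.0], prior=0.25)
bad = pu_risk(scores_p=[-3.0, -2.0], scores_u=[-2.0, 3.0, 2.0, 3.0], prior=0.25)
chance = pu_risk(scores_p=[0.0, 0.0], scores_u=[0.0, 0.0], prior=0.25)
```

A scorer that ranks the labeled anomalies high gets a lower estimated risk than the same scorer with its signs flipped, even though no log in U carries a label.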

  11. Evaluation
      • Dataset
        • Real-world switch logs
        • 58 switch types
        • Two-week period
        • 1,758,456 anomalous logs
        • 16,702,547 unlabeled logs
      • Benchmark methods
        • Labeled-LDA
        • Regular Expression

  12. Evaluation on PU Learning
      • Anomalous logs were sampled randomly across all switch types and treated as if they had no labels.
      • The PU learning classifier is more stable than a traditional classifier.

  13. Evaluation on Anomalous Log Classification
      • LogClass is more accurate.
      • The overheads of L-LDA and RE are larger than those of LogClass.

  14. Conclusion
      • Challenges
        • Device-agnostic vocabulary
        • Only part of the anomalous logs have labels
      • LogClass
        • PU learning
        • Simple NLP techniques
      • Evaluation
        • Real-world switch logs

  15. Thank you!
      mwb16@mails.tsinghua.edu.cn
