DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning
Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar
University of Utah


  1. Log Key Anomaly Detection model. Example log key sequence: 25 18 54 57 18 56 … 25 18 54 57 56 18 … A log key sequence follows a rigorous set of logic and control flows, so it can be treated as a (more structured) natural language. Natural language modeling then applies, with a multi-class classifier: history sequence => next key to appear. A log key is detected to be abnormal if it does not follow the prediction.

  2. Log Key Anomaly Detection model. Use a long short-term memory (LSTM) architecture.

  4. Log Key Anomaly Detection model. Use a long short-term memory (LSTM) architecture. Training: log key sequence: 25 18 54 57 18 56 … 25 18 54 57 56 18 … with h=3.
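A minimal sketch of how training examples could be formed from the example sequence, assuming h is the length of the history window (the slides show only "h=3"). The function name `training_pairs` is illustrative and does not come from the slides.

```python
def training_pairs(keys, h=3):
    """Slide a window of h keys over the sequence.

    Each window is the history; the key right after it is the label
    the classifier should learn to predict.
    """
    return [(tuple(keys[i:i + h]), keys[i + h]) for i in range(len(keys) - h)]


# First six keys of the example sequence on the slide.
sequence = [25, 18, 54, 57, 18, 56]
pairs = training_pairs(sequence, h=3)
for history, next_key in pairs:
    print(history, "->", next_key)
```

Each pair is one multi-class classification example: the history is the input and the next key is the class label.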

  8. Log Key Anomaly Detection model. Use a long short-term memory (LSTM) architecture. Detection: in the detection stage, DeepLog checks whether the actual next log key is among its top g most probable predictions.
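The top-g check can be sketched as follows. The probability values are made up for illustration, and `is_anomalous` is a hypothetical helper name; in DeepLog the probabilities would come from the trained LSTM.

```python
def is_anomalous(probabilities, actual_key, g):
    """Flag actual_key as abnormal unless it is among the g most probable keys.

    probabilities: dict mapping each candidate log key to its predicted
    probability of appearing next.
    """
    top_g = sorted(probabilities, key=probabilities.get, reverse=True)[:g]
    return actual_key not in top_g


# Hypothetical model output for the history (25, 18, 54).
predicted = {57: 0.6, 56: 0.3, 18: 0.08, 25: 0.02}

normal_case = is_anomalous(predicted, 56, g=2)    # 56 is the 2nd most likely key
abnormal_case = is_anomalous(predicted, 18, g=2)  # 18 is outside the top 2
print(normal_case, abnormal_case)
```

A larger g tolerates more alternative next keys and so reports fewer anomalies; a smaller g is stricter.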

  9. Log Key Anomaly Detection model. [Figures on slides 9 to 11 not preserved.]

  12. Workflow Construction. Input: log key sequence 25 18 54 57 18 56 … 25 18 54 57 56 18 … Output: [figure not preserved]

  13. Workflow Construction. Method 1: using the Log Key Anomaly Detection model (LSTM prediction probabilities).

  14. Workflow Construction. Method 1: using the Log Key Anomaly Detection model (LSTM prediction probabilities). An example of concurrency detection: [figure build on slides 14 to 18 not preserved]

  19. Workflow Construction. Method 2: a density-based clustering approach. Co-occurrence matrix of log keys $(k_i, k_j)$ within distance $d$. $f_d(k_i, k_j)$: the frequency of $(k_i, k_j)$ appearing together within distance $d$. $f(k_i)$: the frequency of $k_i$ in the input sequence. $p_d(i, j)$: the probability of $(k_i, k_j)$ appearing together within distance $d$.
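A rough sketch of how the co-occurrence quantities could be computed. Two things here are assumptions, since the slide gives only the definitions: pairs are counted in order (the first key appears before the second, at most d positions apart), and the probability is normalized by dividing by d times the frequency of the first key. The function name `cooccurrence` is illustrative.

```python
from collections import Counter


def cooccurrence(keys, d):
    """Count key frequencies and ordered pair frequencies within distance d."""
    f = Counter(keys)  # frequency of each key in the input sequence
    f_d = Counter()    # frequency of each ordered pair within distance d
    for a, k_i in enumerate(keys):
        for k_j in keys[a + 1:a + 1 + d]:
            f_d[(k_i, k_j)] += 1
    # Assumed normalization: pair frequency / (d * frequency of first key).
    p_d = {pair: count / (d * f[pair[0]]) for pair, count in f_d.items()}
    return f, f_d, p_d


sequence = [25, 18, 54, 57, 18, 56]
f, f_d, p_d = cooccurrence(sequence, d=2)
print(p_d[(25, 18)], p_d[(18, 54)])
```

The resulting matrix of pair probabilities is what a density-based clustering step would then operate on.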

  21. Parameter Value Anomaly Detection model. Example: log messages of a particular log key: $t_2$: Took 0.61 seconds to deallocate network … $t'_2$: Took 1.1 seconds to deallocate network … Parameter value vectors over time: [$t_2 - t_1$, 0.61], [$t'_2 - t'_1$, 1.1], … This is a multi-variate time series data anomaly detection problem!
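A small sketch of how such a parameter value vector might be built from one log entry: the time elapsed since the previous entry, followed by the numeric values found in the message. The timestamps and the helper name `parameter_vector` are hypothetical; only the message text comes from the slide.

```python
import re

# Matches integers and decimals such as "3" or "0.61".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parameter_vector(prev_timestamp, timestamp, message):
    """Return [elapsed time since previous entry, numeric values in message...]."""
    values = [float(x) for x in NUMBER.findall(message)]
    return [timestamp - prev_timestamp] + values


# Hypothetical timestamps (in seconds) for t1 and t2.
vec = parameter_vector(100.0, 102.5, "Took 0.61 seconds to deallocate network")
print(vec)
```

Collecting these vectors for successive messages of the same log key yields the multi-variate time series the slide refers to.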

  24. Parameter Value Anomaly Detection model. Multi-variate time series data anomaly detection problem: ✓ Leverage an LSTM-based approach; ✓ A parameter value vector is given as input at each time step; ✓ An anomaly is detected if the mean-square-error (MSE) between prediction and actual data is too big. [Figure build on slides 25 to 32: value plotted against time, marking the history, the prediction, the actual value, and the check "MSE > Threshold?".]
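The "MSE > Threshold?" check can be written out as below. The vectors and the threshold value are made up for illustration; in DeepLog the prediction would come from the LSTM.

```python
def mse(predicted, actual):
    """Mean square error between two equal-length parameter value vectors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def is_anomaly(predicted, actual, threshold):
    """Report an anomaly when the prediction error exceeds the threshold."""
    return mse(predicted, actual) > threshold


# Hypothetical prediction vs. observed vector [elapsed time, duration].
error = mse([2.0, 0.5], [2.5, 1.0])
flagged = is_anomaly([2.0, 0.5], [2.5, 1.0], threshold=0.1)
print(error, flagged)
```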

  33. LSTM model online update. Q: How to handle false positives? [Flowchart build on slides 33 to 40: the model takes the history of the log sequence and outputs a prediction; the current entry is checked against it (Anomaly?); if yes, and the report turns out to be a false positive, update the model using this case: “history -> current”.]
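The feedback loop in the flowchart can be sketched as follows. To stay self-contained, this uses a simple frequency table as a stand-in for the LSTM, which is an assumption made purely for illustration; `CountModel`, `check`, and the `is_false_positive` callback (for example, a person reviewing the report) are hypothetical names.

```python
from collections import Counter, defaultdict


class CountModel:
    """Stand-in for the LSTM: counts which key follows each history."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, history, current):
        self.counts[tuple(history)][current] += 1

    def top(self, history, g):
        return [k for k, _ in self.counts[tuple(history)].most_common(g)]


def check(model, history, current, g, is_false_positive):
    """Return True if `current` is reported as an anomaly."""
    if current in model.top(history, g):
        return False  # follows the prediction
    if is_false_positive(history, current):
        model.update(history, current)  # learn the case "history -> current"
        return False
    return True


model = CountModel()
model.update((25, 18, 54), 57)
# 56 is not predicted yet, but the reviewer marks it as a false positive.
first = check(model, (25, 18, 54), 56, g=1, is_false_positive=lambda h, c: True)
# After the update, 56 is among the model's predictions for this history.
learned = 56 in model.top((25, 18, 54), 2)
print(first, learned)
```

With a real LSTM, the update step would be an incremental training pass on the new pair rather than a counter increment.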

  41. Evaluation – log key anomaly detection. [Figure: evaluation results on HDFS log data [1] (over a million log entries with labeled anomalies); higher is better.] [1] PCA (SOSP’09), IM (UsenixATC’10), N-gram (baseline language model).

  42. Evaluation – parameter value anomaly detection. [Figure: MSE (mean square error) with thresholds for different confidence intervals (CIs); labelled points: ANOMALY, False Positive.] Evaluation results on an OpenStack cloud log: generated on CloudLab; VM creation/deletion operations; injected performance anomalies.
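The slides do not say how the thresholds are derived from the confidence intervals. One common reading, assumed here, is to fit a Gaussian to the MSE values observed on normal validation data and use the upper bound of the chosen confidence interval as the threshold. The error values and CI levels below are made up, and `mse_threshold` is a hypothetical helper.

```python
from statistics import NormalDist, mean, stdev


def mse_threshold(validation_errors, confidence):
    """Upper bound of a two-sided confidence interval of a fitted Gaussian."""
    dist = NormalDist(mean(validation_errors), stdev(validation_errors))
    return dist.inv_cdf(0.5 + confidence / 2)


# Hypothetical MSE values measured on normal (anomaly-free) data.
errors = [0.10, 0.12, 0.09, 0.11, 0.13]
t95 = mse_threshold(errors, 0.95)
t99 = mse_threshold(errors, 0.99)
print(t95, t99)
```

A wider confidence interval gives a higher threshold, which trades fewer false positives for a greater chance of missing mild anomalies.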

  47. Evaluation – LSTM model online update. [Figure: evaluation on Blue Gene/L log, with and without online model update; higher is better.] Blue Gene/L is an HPC log with labeled anomalies, available at https://www.usenix.org/cfdr-data

  49. Evaluation – case study: network security log. Dataset: IEEE VAST Challenge 2011 (Mini Challenge 2 – Computer Networking Operations). The dataset contains firewall log, IDS log, etc. [Table: detection results, annotated: could be fixed with prior knowledge of “documented IP”.]

  52. Evaluation – workflow construction. [Figure: constructed workflow of VM Creation (previously generated OpenStack cloud log).]
