Log Key Anomaly Detection model
Example log key sequence: 25 18 54 57 18 56 … 25 18 54 57 56 18 …
➢ Log keys are produced by a rigorous set of logic and control flows
➢ A log key sequence reads like a (more structured) natural language
Natural language modeling applies: train a multi-class classifier, history sequence => next key to appear.
A log key is detected to be abnormal if it does not follow the prediction.
Log Key Anomaly Detection model
Use long short-term memory (LSTM) architecture
Training: log key sequence: 25 18 54 57 18 56 … 25 18 54 57 56 18 …
Window size h=3: each window of 3 consecutive log keys is an input, and the key that follows it is the label.
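The training step with h=3 can be illustrated by sliding a window over the example sequence. This is a minimal sketch of how the (history, next key) pairs might be built; the function name `make_training_pairs` is hypothetical and the LSTM itself is not shown.

```python
from typing import List, Tuple


def make_training_pairs(keys: List[int], h: int) -> List[Tuple[Tuple[int, ...], int]]:
    """Slide a window of length h over the sequence; the key after it is the label."""
    return [(tuple(keys[i:i + h]), keys[i + h]) for i in range(len(keys) - h)]


# First six keys of the example sequence from the slide.
pairs = make_training_pairs([25, 18, 54, 57, 18, 56], h=3)
for history, next_key in pairs:
    print(history, "->", next_key)
# (25, 18, 54) -> 57
# (18, 54, 57) -> 18
# (54, 57, 18) -> 56
```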
Log Key Anomaly Detection model
Use long short-term memory (LSTM) architecture
Detection: In the detection stage, DeepLog checks whether the actual next log key is among its top g most probable predictions.
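The top-g check can be sketched as follows. The probability values are made up for illustration, and `is_normal` is a hypothetical helper; in DeepLog the distribution would come from the trained LSTM.

```python
from typing import Dict


def is_normal(probs: Dict[int, float], actual_key: int, g: int) -> bool:
    """Return True if actual_key is among the g most probable next keys."""
    top_g = sorted(probs, key=probs.get, reverse=True)[:g]
    return actual_key in top_g


# Hypothetical model output for the history (25, 18, 54).
predicted = {57: 0.80, 18: 0.12, 56: 0.05, 25: 0.03}

print(is_normal(predicted, 57, g=2))  # True: 57 is the most probable key
print(is_normal(predicted, 56, g=2))  # False: 56 is only the third most probable
```

A larger g tolerates more variation in normal execution at the cost of missing more anomalies.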
Workflow Construction
Input: log key sequence 25 18 54 57 18 56 … 25 18 54 57 56 18 …
Output: workflow model (figure)
Workflow Construction
Method 1: Using the Log Key Anomaly Detection model (LSTM prediction probabilities)
An example of concurrency detection: (figure)
Workflow Construction
Method 2: A density-based clustering approach
Co-occurrence matrix of log keys (k_i, k_j) within distance d
f_d(k_i, k_j): the frequency of (k_i, k_j) appearing together within distance d
f(k_i): the frequency of k_i in the input sequence
p_d(i, j): the probability of (k_i, k_j) appearing together within distance d
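A minimal sketch of how the co-occurrence quantities might be computed. Two things here are assumptions, not stated on the slide: "within distance d" is read as "k_j appears in the d positions after k_i", and the normalization p_d(i, j) = f_d(k_i, k_j) / (d * f(k_i)) is assumed. The clustering step itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence(keys: List[int], d: int) -> Dict[Tuple[int, int], float]:
    """Estimate p_d(i, j) for every ordered pair seen within distance d."""
    f = Counter(keys)          # f(k_i): frequency of each key
    f_d: Counter = Counter()   # f_d(k_i, k_j): pair frequency within distance d
    for t, ki in enumerate(keys):
        for kj in keys[t + 1:t + 1 + d]:
            f_d[(ki, kj)] += 1
    # Assumed normalization: p_d(i, j) = f_d(k_i, k_j) / (d * f(k_i))
    return {pair: n / (d * f[pair[0]]) for pair, n in f_d.items()}


p = cooccurrence([25, 18, 54, 57, 18, 56], d=1)
print(p[(25, 18)])  # 1.0: 25 is always followed by 18
print(p[(18, 54)])  # 0.5: 18 is followed by 54 in one of its two occurrences
```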
Parameter Value Anomaly Detection model
Example: log messages of a particular log key:
t_2: Took 0.61 seconds to deallocate network …
t'_2: Took 1.1 seconds to deallocate network …
….
Parameter value vectors over time: [t_2 - t_1, 0.61], [t'_2 - t'_1, 1.1], ….
Multi-variate time series data anomaly detection problem!
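The parameter value vectors in this example could be extracted roughly as follows. The regular expression, the function name `value_vector`, and the timestamps are illustrative assumptions; only the message text and the [elapsed time, value] layout come from the slide.

```python
import re
from typing import List

# Template for this log key; the number is its only parameter.
TOOK = re.compile(r"Took (\d+(?:\.\d+)?) seconds to deallocate network")


def value_vector(prev_ts: float, ts: float, message: str) -> List[float]:
    """Build [time since previous entry, parameter value] for one log entry."""
    m = TOOK.search(message)
    if m is None:
        raise ValueError("message does not match the log key template")
    return [ts - prev_ts, float(m.group(1))]


# (t_1, t_2, message) triples with hypothetical timestamps in seconds.
entries = [
    (10.0, 12.5, "Took 0.61 seconds to deallocate network"),
    (40.0, 41.0, "Took 1.1 seconds to deallocate network"),
]
vectors = [value_vector(*e) for e in entries]
print(vectors)  # [[2.5, 0.61], [1.0, 1.1]]
```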
Parameter Value Anomaly Detection model
Multi-variate time series data anomaly detection problem
✓ Leverage an LSTM-based approach;
✓ A parameter value vector is given as input at each time step;
✓ An anomaly is detected if the mean-square-error (MSE) between the prediction and the actual data is too big.
Figure: value over time, showing the history, the prediction, and the actual value; the check is MSE > Threshold?
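The MSE check can be written in a few lines. This is a sketch only: the predicted vectors and the threshold are hypothetical values, and in DeepLog the prediction would come from the LSTM for that log key.

```python
from typing import Sequence


def mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean square error between two parameter value vectors."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def is_anomaly(predicted: Sequence[float], actual: Sequence[float], threshold: float) -> bool:
    """Flag the entry if the prediction error exceeds the threshold."""
    return mse(predicted, actual) > threshold


print(is_anomaly([2.0, 0.5], [2.0, 0.5], threshold=0.25))  # False: MSE is 0.0
print(is_anomaly([2.0, 0.5], [4.0, 0.5], threshold=0.25))  # True: MSE is 2.0
```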
LSTM model online update
Q: How to handle false positives?
Flow: log sequence (history, current) -> model -> prediction -> Anomaly? -> if Yes: False positive? -> if Yes: update the model using this case: “history -> current”
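The update loop can be sketched with a stand-in model. Note the assumptions: `CountModel` is a simple count-based predictor used only to keep the example runnable, not the LSTM, and the `is_false_positive` callback represents hypothetical operator feedback.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

History = Tuple[int, ...]


class CountModel:
    """Stand-in for the LSTM: counts which key follows each history."""

    def __init__(self) -> None:
        self.counts: Dict[History, Counter] = defaultdict(Counter)

    def update(self, history: History, current: int) -> None:
        self.counts[history][current] += 1

    def top(self, history: History, g: int) -> List[int]:
        return [k for k, _ in self.counts[history].most_common(g)]


def check(model: CountModel, history: History, current: int, g: int,
          is_false_positive: Callable[[History, int], bool]) -> str:
    """Classify one entry and learn from reported false positives."""
    if current in model.top(history, g):
        return "normal"
    if is_false_positive(history, current):
        model.update(history, current)  # train on "history -> current"
        return "false_positive"
    return "anomaly"


model = CountModel()
model.update((25, 18, 54), 57)

# Hypothetical feedback: the operator knows key 56 is legitimate here.
feedback = lambda history, current: current == 56

first = check(model, (25, 18, 54), 56, 2, feedback)   # flagged, then learned
second = check(model, (25, 18, 54), 56, 2, feedback)  # now predicted
third = check(model, (25, 18, 54), 99, 2, feedback)   # still an anomaly
print(first, second, third)
```

After the first report the same pattern is no longer flagged, which is the point of the online update.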
Evaluation – log key anomaly detection
Evaluation results on HDFS log data [1] (over a million log entries with labeled anomalies).
(Figure; up is good.)
[1] PCA (SOSP’09), IM (UsenixATC’10), N-gram (baseline language model)
Evaluation – parameter value anomaly detection
Evaluation results on OpenStack cloud log with different confidence intervals (CIs).
OpenStack cloud log: generated on CloudLab; VM creation/deletion operations; injected performance anomalies.
Figure: MSE (mean square error) with the CI thresholds marked; annotated points: ANOMALY, False Positive.
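One way thresholds could be derived from confidence intervals is to fit a Gaussian to prediction errors on normal data and take the upper bound of the interval. This is a sketch under that assumption; the error values and the function name `ci_threshold` are hypothetical, and the slide does not state how the thresholds were computed.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def ci_threshold(validation_errors: Sequence[float], confidence: float) -> float:
    """Upper bound of a two-sided Gaussian confidence interval over the errors."""
    mu, sigma = mean(validation_errors), stdev(validation_errors)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mu + z * sigma


# Hypothetical MSE values observed on normal (validation) data.
errors = [0.1, 0.2, 0.3, 0.2, 0.2]

for ci in (0.98, 0.99, 0.999):
    print(ci, ci_threshold(errors, ci))
```

A wider CI gives a higher threshold, so fewer false positives but a greater chance of missing a true anomaly.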
Evaluation – LSTM model online update
Evaluation on Blue Gene/L log, with and without online model update.
Blue Gene/L log: HPC log with labeled anomalies; available at https://www.usenix.org/cfdr-data
(Figure; up is good.)
Evaluation – case study: network security log
Dataset: IEEE VAST Challenge 2011 (Mini Challenge 2 – Computer Networking Operations)
The dataset contains firewall log, IDS log, etc.
Detection results. (Annotation on the results: could be fixed with prior knowledge of “documented IP”.)
Evaluation – workflow construction
Constructed workflow of VM Creation (previously generated OpenStack cloud log).