Incorporating Feedback into Tree-based Anomaly Detection Shubhomoy Das, Weng-Keen Wong, Alan Fern, Thomas G. Dietterich and Md Amran Siddiqui School of EECS
Anomaly Detection • Goal: Identify rare or strange objects
Typical Investigation • Anomaly Detector • Ranking by anomaly score of each instance x • Major problem: Statistical anomalies don't necessarily correspond to semantic anomalies • Need to deal with a large number of false positives
Investigation with Feedback • Anomaly Detector • Ranking by anomaly score of each instance x • Feedback on top-ranked instances: Nominal or Anomaly • Ranking is adaptive • Reduces false positives
Tree-based Anomaly Detection • Isolation Forest • HS-Trees • RS-Forest • RPAD • Random Projection Forest • …
Isolation Forest • Each node splits on a random feature at a random split point (branches: < and ≥) • Shallow leaf indicates anomaly • Deeper leaf indicates nominal • Typically 100 trees in practice
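The isolation idea above can be illustrated with a minimal sketch in plain Python. It is not the reference Isolation Forest implementation: it skips subsampling, caps depth at an arbitrary `max_depth`, and the names `build_tree`, `leaf_depth`, and `mean_depth` are hypothetical helpers introduced only for this illustration.

```python
import random

def build_tree(points, rng, depth=0, max_depth=12):
    """Recursively split on a random feature at a random split point."""
    if len(points) <= 1 or depth >= max_depth:
        return {"leaf": True, "depth": depth}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"leaf": True, "depth": depth}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {"leaf": False, "feature": feature, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def leaf_depth(tree, x):
    """Follow x down the tree and return the depth of the leaf it reaches."""
    while not tree["leaf"]:
        branch = "left" if x[tree["feature"]] < tree["split"] else "right"
        tree = tree[branch]
    return tree["depth"]

def mean_depth(forest, x):
    """Average leaf depth over all trees; smaller means easier to isolate."""
    return sum(leaf_depth(t, x) for t in forest) / len(forest)

rng = random.Random(0)
# Assumed toy data: a 2-D Gaussian cluster plus one far-away point.
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data.append(outlier)
forest = [build_tree(data, rng) for _ in range(100)]  # 100 trees, as on the slide
```

Comparing `mean_depth(forest, outlier)` with `mean_depth(forest, (0.0, 0.0))` shows the far-away point landing in much shallower leaves than a point in the dense center.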
Weighted Representation of Trees • $\mathbf{z}(\mathbf{x}) = [-1, 0, 0, -1, 0, 0, 0, -1, -1, \ldots]^{\top}$ (extremely sparse) • Weights for isolation forest: $\mathbf{w} = [1, 1, 1, 1, 1, 1, 1, 1, 1, \ldots]^{\top}$ • A different set of weights yields other tree-based detectors • $\mathrm{score}(\mathbf{x}) = \mathbf{w}^{\top} \cdot \mathbf{z}(\mathbf{x})$
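A small sketch may help make the weighted score concrete. It assumes that the -1 entries of the sparse vector mark the tree nodes an instance passes through, so that unit weights give the negative total path length. That reading, the node indices, and the helper names `sparse_features` and `score` are assumptions made for illustration, not details taken from the slides.

```python
def sparse_features(path_nodes):
    """Sparse z(x): -1.0 at each visited node index, implicit 0 elsewhere."""
    return {node: -1.0 for node in path_nodes}

def score(weights, z):
    """Dot product w^T z(x), touching only the nonzero entries of z."""
    return sum(weights[node] * value for node, value in z.items())

num_nodes = 10
uniform = [1.0] * num_nodes                  # isolation-forest-style weights

shallow = sparse_features([0, 3])            # short path: isolated quickly
deep = sparse_features([0, 1, 2, 7, 8])      # long path: hard to isolate

# With unit weights the shallow instance scores higher (more anomalous).
uniform_scores = (score(uniform, shallow), score(uniform, deep))

# A different weight vector yields a different detector: putting more
# weight on node 3 pushes the shallow instance below the deep one.
reweighted = list(uniform)
reweighted[3] = 5.0
reweighted_scores = (score(reweighted, shallow), score(reweighted, deep))
```

Under these assumptions a higher (less negative) score means a shallower path, and changing a single weight is enough to reorder the two instances.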
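Active Anomaly Discovery • Iterations $t$, $t+1$, $t+2$, … • Feedback at each iteration: Nominal or Anomaly • Weights at each iteration: $\mathbf{w}^{(t)}$, $\mathbf{w}^{(t+1)}$, $\mathbf{w}^{(t+2)}$, …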
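The iteration pictured above (query the top-ranked instance, receive a label, update the weights) can be sketched as a loop. The slides do not state the update rule, so the additive step below is a plainly swapped-in, perceptron-style stand-in rather than the authors' optimization. The toy node paths, the `truth` labels, and the helpers `dot`, `query_top`, and `update` are all hypothetical.

```python
def dot(w, z):
    """score(x) = w^T z(x) over the nonzero entries of a sparse z."""
    return sum(w[i] * v for i, v in z.items())

def query_top(w, features, labeled):
    """Return the index of the highest-scoring instance not yet labeled."""
    candidates = [i for i in range(len(features)) if i not in labeled]
    return max(candidates, key=lambda i: dot(w, features[i]))

def update(w, z, is_anomaly, step=1.0):
    """Stand-in update: raise the score of anomalies, lower it for nominals."""
    sign = 1.0 if is_anomaly else -1.0
    new_w = list(w)
    for node, value in z.items():
        new_w[node] += sign * step * value
    return new_w

# Assumed toy data: each instance is the set of tree nodes it visits (-1 each).
features = [
    {0: -1.0, 1: -1.0},                      # 0: nominal that looks anomalous
    {0: -1.0, 1: -1.0, 2: -1.0},             # 1: nominal sharing nodes 0 and 1
    {3: -1.0, 4: -1.0, 5: -1.0, 6: -1.0},    # 2: true anomaly
]
truth = {0: False, 1: False, 2: True}        # hypothetical analyst labels

w = [1.0] * 7                                # start from uniform weights
labeled, order = {}, []
for _ in range(2):
    i = query_top(w, features, labeled)
    labeled[i] = truth[i]                    # analyst feedback
    order.append(i)
    w = update(w, features[i], truth[i])
# order == [0, 2]
```

Without feedback the second query would have been instance 1, another false positive. After instance 0 is labeled Nominal, the nodes it shares with instance 1 are down-weighted, so the true anomaly is presented next.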
Result (Synthetic Dataset) • [Figure: true anomalies marked] • Baseline discovers 12 anomalies in 35 iterations • AAD discovers 23 anomalies in 35 iterations
Result • [Figure panels: 0 Feedback, 10 Feedback, 20 Feedback, 25 Feedback, 35 Feedback]
A closer look at the data with t-SNE