Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating
Gabriela F. Cretu-Ciocarlie, Department of Computer Science, Columbia University
Joint work with Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo
Motivation and Problem
Current attack methods, such as polymorphic engines, will overwhelm signature-based detectors [Song07, Crandall05]
Relying on anomaly-based detection (AD) sensors to detect 0-day attacks has become a necessity, BUT…
There is a major hurdle in the deployment, operation, and maintenance of AD systems:
Calibrating them
Updating their models when changes appear in the protected system
Contributions
Identifying the intrinsic characteristics of the training data (i.e., self-calibration)
Cleansing a data set of attacks and abnormalities by automatically selecting an adaptive threshold for the voting (i.e., automatic self-sanitization)
Maintaining the performance gained through sanitization beyond the initial training phase and throughout the lifetime of the sensor (i.e., self-update)
Training Dataset Sanitization
Attacks and accidental malformed requests/data cause a local "pollution" of the training data
An attack can pass as normal traffic if it is part of the training set
We seek to remove both malicious and abnormal data from the training dataset
Training Strategies
Divide the data into multiple blocks, with automatic selection of the optimal time granularity …
Time Granularity Characteristics
A smaller value of the time granularity g confines the effect of an individual attack to a smaller neighborhood of micro-models
Excessively small values can lead to under-trained models
We automatically determine when a model is stable
How to Determine g?
Compute the likelihood L of seeing new n-grams
Use a linear least-squares approximation over a sliding window of points to detect the stabilization point
When the stabilization point is found, reset L and start a new model
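The stabilization test above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, window size, and slope tolerance are assumptions; the input is a sequence of per-interval likelihoods of seeing new n-grams.

```python
import numpy as np

def stabilization_point(likelihoods, window=10, slope_tol=1e-3):
    """Return the first index at which a linear least-squares fit over a
    trailing window of likelihood values is (near) flat, i.e. the micro-model
    has stopped seeing many new n-grams. Returns None if it never flattens."""
    x = np.arange(window)
    for end in range(window, len(likelihoods) + 1):
        y = np.asarray(likelihoods[end - window:end], dtype=float)
        slope, _ = np.polyfit(x, y, 1)   # linear least-squares fit
        if abs(slope) < slope_tol:
            return end - 1               # stabilization point found
    return None

# Example: a decaying new-n-gram likelihood eventually flattens out.
likelihoods = [1.0 / (t + 1) for t in range(60)]
idx = stabilization_point(likelihoods)
```

Once the point is found, L would be reset and a new micro-model started, as the slide describes.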
Time Granularity Detection
[Figure: likelihood of seeing new n-grams over time (s), shown for two traces at different time scales]
Automatic Time Granularity
www1: g ≈ 2 hours and 22 minutes (std ≈ 21 minutes)
lists: g ≈ 2 hours and 20 minutes (std ≈ 13 minutes)
Adaptive Training using Self-Sanitization
Divide the data into multiple blocks (automatically)
Build micro-models µM1, µM2, …, µMK, one for each block
Test all models against a smaller dataset
Build sanitized and abnormal models via a voting algorithm, with V = automatically determined voting threshold
sanitized model: …
abnormal model: …
[Diagram, training phase: micro-models µM1 … µMK feed a voting algorithm, which outputs a sanitized model and an abnormal model]
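The voting step can be sketched as below. This is a toy stand-in, not the authors' code: micro-models are represented as plain sets of observed n-grams (the real system uses content models such as Bloom-filter-backed n-gram models), and the function names are hypothetical.

```python
def is_normal(micro_models, packet, V):
    """A packet (a set of n-grams) is deemed normal if the fraction of
    micro-models flagging it abnormal is at most V. V=0 demands unanimous
    approval; a V near 1 accepts a packet approved by a single micro-model."""
    abnormal = sum(1 for m in micro_models if not packet <= m)
    return abnormal / len(micro_models) <= V

def build_models(micro_models, dataset, V):
    """Split the tested dataset into sanitized and abnormal parts."""
    sanitized = [p for p in dataset if is_normal(micro_models, p, V)]
    abnormal = [p for p in dataset if not is_normal(micro_models, p, V)]
    return sanitized, abnormal
```

For example, with V=0 a packet containing an n-gram unseen by any single micro-model is voted into the abnormal model; raising V tolerates disagreement among a few micro-models.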
Automatic Detection of Voting Threshold
Analyze how the different values of the micro-model ensemble score are distributed across the tested dataset
V=0: a packet must be approved by all micro-models in order to be deemed normal
V=1: a packet is deemed normal as long as it is accepted by at least one micro-model
[Figure: percentage of packets deemed normal as a function of the voting threshold V, for micro-models built at several time granularities]
Voting Threshold Detection
P(V) = number of packets deemed normal at threshold V
Separation problem: find the smallest threshold (minimize V) that still captures as much of the data as possible (maximize P(V))
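One plausible way to solve this separation problem is a knee search over the P(V) curve: walk the candidate thresholds in increasing order and stop at the smallest V beyond which P(V) no longer grows appreciably. The function name and the gain cutoff `eps` are assumptions for illustration.

```python
def select_threshold(P, eps=0.01):
    """P maps candidate thresholds V to the fraction of packets deemed
    normal, P(V). Return the smallest V after which P(V) stops growing
    by more than eps: minimize V while (nearly) maximizing P(V)."""
    Vs = sorted(P)
    for i in range(1, len(Vs)):
        if P[Vs[i]] - P[Vs[i - 1]] < eps:
            return Vs[i - 1]
    return Vs[-1]

# Example: almost all of the gain is realized by V = 0.1.
curve = {0.0: 0.90, 0.1: 0.98, 0.2: 0.985, 0.3: 0.986}
```

Here `select_threshold(curve)` stops at 0.1, since moving to 0.2 gains only 0.5% more normal packets.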
Example of Voting Threshold Detection
[Figure: automatically detected voting thresholds for the lists and www1 datasets]
Automated vs. Empirical (www1)
Automated vs. Empirical (lists)
Overall Performance
Self-Updating AD Models
The way users interact with systems can evolve over time, as can the systems themselves; AD models need to adapt to concept drift
Online learning can accommodate changes in the behavior of computer users [Lane99]
Continuously create micro-models and sanitized models
Use introspection: the micro-models are engaged in a voting scheme against their own micro-datasets
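The self-update loop can be sketched as a rolling window of micro-models that re-vote over their own micro-datasets (introspection). Again a toy stand-in, assuming set-of-n-grams models; the class and method names are hypothetical, not the authors' API.

```python
from collections import deque

class SelfUpdatingSanitizer:
    """Keep a rolling window of (micro-model, micro-dataset) pairs; as new
    micro-models arrive the oldest are retired, and the sanitized model is
    rebuilt by voting each micro-dataset against the *other* micro-models."""

    def __init__(self, max_models, V):
        self.models = deque(maxlen=max_models)  # (micro_model, micro_dataset)
        self.V = V

    def add(self, micro_model, micro_dataset):
        self.models.append((micro_model, micro_dataset))

    def sanitized_model(self):
        normal = set()
        for i, (_, data) in enumerate(self.models):
            others = [m for j, (m, _) in enumerate(self.models) if j != i]
            for packet in data:                   # packet = set of n-grams
                votes = sum(1 for m in others if not packet <= m)
                if others and votes / len(others) <= self.V:
                    normal |= packet              # keep its n-grams as normal
        return normal
```

An attack confined to one micro-dataset is outvoted by the other micro-models and never enters the sanitized model, while genuine drift, seen by many micro-models, is absorbed.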
Alert Rate for www1
Self-Update Performance
Concept Drift at Larger Scale
Computational Performance
25 micro-models
Each micro-model is 483 KB on average (for 10.98 MB of traffic)
Possible Improvements
Parallelization: multiple datasets can be tested against multiple models in parallel, since the test for each dataset-model pair is an independent operation
Faster tests for the Bloom filters
Testing Strategies: Shadow Sensor Redirection
Shadow sensor: a heavily instrumented host-based anomaly detector akin to an "oracle"
Performs substantially slower than the native application
Use the shadow sensor to classify or corroborate the alerts produced by the AD sensors
Feasibility and scalability depend on the number of false alerts generated by the AD sensor
[Diagram, testing phase: the sanitized model raises alerts; the shadow sensor either confirms them (forwarded to an alert server) or discards them as false positives]
Distributed Sanitization
Use external knowledge (models) to generate a better local normal model
Abnormal models are exchanged across collaborative sites [Stolfo00]
Re-evaluate the locally computed sanitized models: apply model differencing to remove remote abnormal data from the local normal model
[Diagram, training phase: sites X and Y exchange abnormal models]
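The model-differencing step can be sketched as a set difference over n-grams. This assumes the same toy set-based representation as above (real abnormal models would be content models, e.g. Bloom filters); the function name is hypothetical.

```python
def cross_sanitize(local_normal, remote_abnormal_models):
    """Model differencing sketch: drop from the local normal model any
    n-gram that a collaborating site has placed in its abnormal model."""
    cleaned = set(local_normal)
    for remote in remote_abnormal_models:
        cleaned -= remote
    return cleaned
```

An attack that slipped into the local sanitized model but was caught as abnormal at a remote site is thereby removed from the local model.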
Conclusions
We propose a fully automated framework that allows the AD sensor to adapt to the characteristics of the protected host while maintaining high performance
We believe our system can help alleviate some of the challenges faced as AD is increasingly relied upon as a first-class defense mechanism
Future Work
Combine the strengths of multiple sensors under a general and unified framework, following the directions traced out in this study
The temporal dimension of our online sanitization process can be complemented by a spatial one
Use feedback information for concept drift, e.g., the error responses returned by the system under protection
Thank you! Questions?