Provenance-based Intrusion Detection
Thomas Pasquier
University of Bristol
https://tfjmp.org
12/11/2020

Talk loosely based on the following publications
● Han et al. “SIGL: Securing Software Installations Through Deep Graph Learning”, USENIX Security 2021
● Han et al. “UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats”, NDSS 2020
● Pasquier et al. “Runtime Analysis of Whole-System Provenance”, ACM CCS 2018
● Pasquier et al. “Practical Whole-System Provenance Capture”, ACM SoCC 2017

Motivation: System call based intrusion detection
System Calls [...]
Identify abnormal patterns
Hidden among benign actions
Masquerading as benign action [...]
Over a long period of time

What is provenance?
- From the French “provenir”, meaning “to come from”
- Formal set of documents describing the origin of an art piece
- Sequence of
  - Formal ownership
  - Custody
  - Places of storage
- Used for authentication

What is data provenance?
- Represents interactions between objects of different types
  - Data items (entities)
  - Processing (activities)
  - Individuals and organisations (agents)
- Represented as a directed acyclic graph (think information flows)
- Edges represent interactions between objects’ states as dependencies
- It is a representation of the history of a system execution
  - Immutable (unless it’s 1984)
  - No dependency on the future
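
To make the data model above concrete, here is a minimal sketch of such a graph in Python. The class name `ProvGraph`, its methods, the node ids and the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, borrowed from the W3C PROV vocabulary) are assumptions made for illustration; they are not taken from the talk or from any capture system.

```python
from dataclasses import dataclass, field

# Node kinds follow the three object types named on the slide.
KINDS = {"entity", "activity", "agent"}


@dataclass
class ProvGraph:
    kinds: dict = field(default_factory=dict)  # node id -> kind
    deps: dict = field(default_factory=dict)   # node id -> [(relation, older node id)]

    def add_node(self, node_id, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        if node_id in self.kinds:
            # History is append-only: a recorded state is never overwritten.
            raise ValueError(f"{node_id} is already recorded")
        self.kinds[node_id] = kind
        self.deps[node_id] = []

    def add_edge(self, src, relation, dst):
        """Record that src depends on dst, refusing any edge that closes a cycle."""
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be recorded first")
        if src == dst or self._reaches(dst, src):
            raise ValueError("edge would create a cycle")
        self.deps[src].append((relation, dst))

    def _reaches(self, start, target):
        """Depth-first search along dependency edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(d for _, d in self.deps[node])
        return False


# Hypothetical example: a user runs `cat`, which reads one file and writes another.
g = ProvGraph()
g.add_node("alice", "agent")
g.add_node("cat#1", "activity")
g.add_node("notes.txt@v1", "entity")
g.add_node("copy.txt@v1", "entity")
g.add_edge("cat#1", "wasAssociatedWith", "alice")
g.add_edge("cat#1", "used", "notes.txt@v1")
g.add_edge("copy.txt@v1", "wasGeneratedBy", "cat#1")
```

Giving each object state its own node (here the `@v1` suffix) is one way to keep the graph acyclic: a later write would create `copy.txt@v2` rather than adding an edge back into an existing node.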

How is this useful?

Provenance-based intrusion detection
▪ Intuition: the provenance graph exposes causal relationships between events
▪ Related events are connected, even across long periods of time
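
The intuition above can be sketched as a backward traversal over dependency edges. The event log, node names and the `ancestors` helper below are hypothetical and only illustrate the idea: an event is linked to its causes by graph edges, however far apart the timestamps are.

```python
from collections import defaultdict

# Hypothetical event log: (timestamp in seconds, effect, cause).
# Each tuple reads "effect depends on cause".
EDGES = [
    (10, "proc:dropper", "socket:203.0.113.5:443"),
    (12, "file:/tmp/payload", "proc:dropper"),
    (500, "proc:cron", "file:/etc/crontab"),
    (86_400, "proc:payload", "file:/tmp/payload"),
    (86_405, "socket:198.51.100.7:80", "proc:payload"),
]


def ancestors(edges, node):
    """Return every node that `node` transitively depends on."""
    causes = defaultdict(list)
    for _, effect, cause in edges:
        causes[effect].append(cause)
    seen, stack = set(), [node]
    while stack:
        for cause in causes[stack.pop()]:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


# The outbound connection made a day later still traces back to the first download.
lineage = ancestors(EDGES, "socket:198.51.100.7:80")
```

In a flat system call trace the two sockets are separated by a day of unrelated activity (such as the `cron` entry); in the graph they are four hops apart.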

How to perform detection?

Assumptions (and limitations)
- Runtime detection
- We target environments with minimal human intervention
  - relatively consistent behaviour
  - e.g. web servers, CI pipelines, etc.
- Build a model of system behaviour (unsupervised training)
  - in a controlled environment
  - from a representative workload (this is hard!)
- Detect deviation from the model
- Several approaches are being explored…

Example: UNICORN
▪ Han et al. “UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats”, NDSS 2020

Example: UNICORN
1) Graph streamed in, converted to a histogram, labelled using (modified) struct2vec
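
As a rough illustration of turning a labelled graph into a histogram, the sketch below uses a Weisfeiler-Lehman-style relabelling as a stand-in. It is not the modified struct2vec named on the slide, and it processes a whole graph at once rather than a stream. The function name `label_histogram` and the `hops` parameter (playing the role of a neighbourhood size) are assumptions.

```python
from collections import Counter


def label_histogram(node_types, edges, hops):
    """Count the neighbourhood labels of every node over 0..hops relabelling rounds.

    node_types: node id -> type string (e.g. "process", "file")
    edges: list of (src, edge_type, dst); a node's label absorbs its incoming edges
    """
    incoming = {node: [] for node in node_types}
    for src, edge_type, dst in edges:
        incoming[dst].append((edge_type, src))

    labels = dict(node_types)
    hist = Counter(labels.values())  # round 0: plain node types
    for _ in range(hops):
        new_labels = {}
        for node, label in labels.items():
            # Sort so the label does not depend on the order edges arrived in.
            neigh = sorted(f"{etype}:{labels[src]}" for etype, src in incoming[node])
            new_labels[node] = f"{label}({','.join(neigh)})"
        labels = new_labels
        hist.update(labels.values())
    return hist


# Hypothetical three-node graph: a process receives from a socket and writes a file.
types = {"p1": "process", "f1": "file", "s1": "socket"}
edges = [("s1", "recv", "p1"), ("p1", "write", "f1")]
hist = label_histogram(types, edges, hops=1)
```

Each extra round makes a label describe a neighbourhood one hop larger, so the histogram summarises graph structure using only vertex and edge types.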

Example: UNICORN
2) At regular intervals, the histogram is converted to a fixed-size vector using similarity-preserving graph sketching
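
One simple way to obtain a fixed-size, similarity-preserving vector from a histogram is a weighted min-hash: every slot keeps the label that minimises a hashed value scaled by the label's weight. The sketch below is an illustrative scheme under that assumption, not necessarily the sketching algorithm used in UNICORN; the names `sketch`, `similarity` and `_uniform` are hypothetical.

```python
import hashlib
import math


def _uniform(seed, key):
    """Deterministic pseudo-random number in (0, 1] derived from (seed, key)."""
    digest = hashlib.sha256(f"{seed}|{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def sketch(hist, size):
    """Fixed-size sketch of a non-empty histogram with positive weights.

    Slot j keeps the label minimising -ln(u) / weight, so heavier labels win
    more slots, and two histograms sharing a label use the same u for it.
    """
    out = []
    for j in range(size):
        out.append(min(hist, key=lambda k: -math.log(_uniform(j, k)) / hist[k]))
    return out


def similarity(a, b):
    """Fraction of slots on which two equal-length sketches agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical histograms: h1 and h2 are identical, h3 shares no label with them.
h1 = {"process": 5, "file": 3, "socket": 1}
h2 = {"process": 5, "file": 3, "socket": 1}
h3 = {"pipe": 4, "inode": 2}
s1, s2, s3 = sketch(h1, 64), sketch(h2, 64), sketch(h3, 64)
```

The sketch length is independent of how many distinct labels the histogram holds, which keeps the downstream comparison cost constant as the graph grows.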

Example: UNICORN
3) Feature vectors are clustered
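
The slide does not say which clustering algorithm is used. As one possible sketch, the block below runs a tiny k-medoids over fixed-size vectors, using Hamming distance because sketch slots are categorical. The function names, the naive initialisation and the toy vectors are all assumptions.

```python
def hamming(a, b):
    """Number of slots on which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(points, k, rounds=10):
    """Tiny k-medoids: returns (medoid indices, cluster id per point)."""
    medoids = list(range(k))  # naive initialisation: the first k points

    def assign_all():
        return [
            min(range(k), key=lambda c: hamming(p, points[medoids[c]]))
            for p in points
        ]

    for _ in range(rounds):
        assign = assign_all()
        new_medoids = []
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda i: sum(hamming(points[i], points[j]) for j in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_all()


# Hypothetical sketches: two behaviours, one dominated by "a", one by x/y/z.
points = [
    ["a", "a", "a", "a"],
    ["x", "y", "z", "w"],
    ["a", "a", "a", "b"],
    ["x", "y", "z", "z"],
    ["a", "a", "b", "a"],
]
medoids, assign = k_medoids(points, k=2)
```

A medoid is always one of the observed vectors, which sidesteps the problem of averaging categorical slots.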

Example: UNICORN
4) Clusters form “meta-states”; transitions between them are modelled
In deployment, anomalies are detected via clustering and the “meta-state” model
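
A minimal sketch of the meta-state idea, assuming each training run has already been reduced to a sequence of cluster ids: the model is the set of transitions seen during benign training, and a run is flagged when it visits an unknown state or takes an unseen transition. The function names and the use of `-1` for "fits no cluster" are assumptions, and a real model may well be richer than a set of allowed pairs.

```python
def learn_transitions(runs):
    """Collect every (state, next state) pair seen in benign training runs."""
    allowed = set()
    for run in runs:
        allowed.update(zip(run, run[1:]))
    return allowed


def check_run(run, allowed, known_states):
    """Return the index of the first suspicious step, or None if the run fits."""
    for i, state in enumerate(run):
        if state not in known_states:
            return i  # the vector matched no cluster learnt in training
        if i > 0 and (run[i - 1], state) not in allowed:
            return i  # known state, but reached by an unseen transition
    return None


# Hypothetical training data: cluster ids observed over two benign executions.
benign_runs = [[0, 0, 1, 2], [0, 1, 1, 2]]
known = {0, 1, 2}
allowed = learn_transitions(benign_runs)

ok = check_run([0, 1, 2], allowed, known)
bad_jump = check_run([0, 2], allowed, known)
bad_state = check_run([0, 1, -1], allowed, known)  # -1: fits no cluster
```

Modelling transitions as well as states lets the detector flag a run whose individual snapshots all look familiar but occur in an order never seen in training.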

Relatively simple
▪ Labelled directed acyclic graph
  – node/edge types
  – security context (when available)
▪ Modification and combination of existing algorithms
  – struct2vec
  – similarity-preserving hashing
  – clustering
▪ Right combination + domain knowledge

Some insights from this work

We can build practical provenance-based IDSs
▪ We can detect intrusions from graph structure with little metadata
  – Vertex type (thread, file, socket, etc.)
  – Edge type (read, write, connect, etc.)
▪ Processing speed
  – Current prototype
  – Data generation speed < processing speed!

Proper evaluation is hard!
- Datasets are hard to generate
  - What is a good-quality dataset?
- Hard to compare across papers; a lot is not available
  - Experiments (i.e. attacks)
  - Capture mechanisms
  - Analysis pipelines
- Leads to unsatisfactory evaluation
  - I may be able to compare to similar techniques (may reuse datasets)
  - … very hard for unrelated ones (i.e. those ingesting a different data type)
- Adversarial ML?

Identifying threats: explainability is a problem
▪ There is a problem within the last batch of X graph elements
  – 2,000 in previous figures
▪ Good luck finding out what went wrong
▪ Provenance forensics is an active field of research
  – Promising work out of the DARPA programme
▪ … but could we do better during detection?

Ongoing projects

Towards more interpretable provenance-based IDSs
● PhD student project (Xueyuan “Michael” Han)
● Collaborators
  ○ Harvard University
  ○ UBC
  ○ NEC Labs America
● Deep graph learning techniques
● Precisely identifying attacks within a provenance graph
● Generating actionable reports

A framework for provenance-based forensics
● PhD student project (Priyanka Badva)
● Collaborators
  ○ SRI International
● Provenance graphs are large and complex (several million nodes)
● Designing tools and techniques to identify/explain attacks
● Working with my colleague Ryan

Distributed IDS
- Edge network
- Collaboration with Toshiba (£4M)
- Exploring distributed learning
  - Poisoning
  - Mechanism
  - Etc.
- Large testbed planned (work starting January)
- Hiring 2 postdocs at Bristol
- Money available for an intern short term (+/- COVID)

Kernel partitioning
● PhD student project (Soo Yee Lim)
● Collaborators
  ○ HP Labs Bristol
  ○ Royal Holloway, University of London
  ○ University of Otago
● Leveraging CHERI/ARM Morello hardware
  ○ Hardware capabilities
● Implement kernel partitioning in the Linux OS

Thank you! Questions?
https://tfjmp.org
thomas.pasquier@bristol.ac.uk

How to evaluate?

Comparison with the state of the art
Manzoor et al. “Fast memory-efficient anomaly detection in streaming heterogeneous graphs”, ACM KDD 2016
R -> neighbourhood size for the struct2vec algorithm

Evaluation with DARPA datasets
SUCH GOOD RESULTS ARE NOT NORMAL

Building our own dataset
▪ Attack designed to look similar to background activity
▪ Is that enough?

Runtime performance
Memory usage: ~500MB
CPU usage: 15% on 1 core
