Unicorn: Runtime Provenance-Based Detector for Advanced Persistent Threats

Xueyuan Han, James Mickens (Harvard University)
Thomas Pasquier (University of Bristol)
Adam Bates (University of Illinois at Urbana-Champaign)
Margo Seltzer (University of British Columbia)
Advanced Persistent Threats

APTs are long-duration, low-and-slow attack patterns that progress through the kill chain:
- Reconnaissance (active scanning, passive scanning): identify the target & explore vulnerabilities
- Weaponize (malware, zero-day exploits, scripting): design the backdoor & penetration plan
- Delivery (spearphishing, supply-chain attack): deliver the weapon through diverse attack vectors
- Exploitation (application shimming, job scheduling): the victim triggers the vulnerability
- Installation (hooking, dylib hijacking): install the backdoor or malware
- Command & Control (connection proxy, domain fronting): give remote instructions to the victim
- Actions on Objectives
Whole-System Data Provenance

We use whole-system data provenance instead of traditional system-call or log-based event analysis.

- Full historical context of a system from a single, connected whole-system graph
- Causal relationships among system subjects (e.g., processes) and objects (e.g., files, network addresses)

[Figure: a whole-system provenance graph with versioned file and process nodes (e.g., File F, Process A, IP a.b.c.d) connected by causal edges such as fork, exec, file read, file write, IP read, and IP write. A write creates a new version of the written object.]
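To make the structure concrete, here is a minimal sketch of such a graph as a plain edge list. The Node/Edge classes and event names are our own illustration, not the schema of any real capture system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A system subject or object, e.g., a process or a file.

    Versioning: a write to File W yields a new node (File W, version+1),
    keeping the graph acyclic and preserving causality."""
    kind: str      # "process", "file", "socket", ...
    name: str      # "Process A", "File F", "a.b.c.d", ...
    version: int

@dataclass(frozen=True)
class Edge:
    """A causal event relating two nodes, e.g., fork, exec, read, write."""
    src: Node
    dst: Node
    relation: str
    timestamp: int

# A toy fragment of the graph on this slide: Process A forks Process C,
# which reads File F and then writes a new version of File W.
proc_a = Node("process", "Process A", 0)
proc_c = Node("process", "Process C", 0)
file_f = Node("file", "File F", 0)
file_w0 = Node("file", "File W", 0)
file_w1 = Node("file", "File W", 1)

events = [
    Edge(proc_a, proc_c, "fork", 1),
    Edge(file_f, proc_c, "read", 2),
    Edge(file_w0, file_w1, "version", 3),  # the write creates version 1
    Edge(proc_c, file_w1, "write", 3),
]
```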
Previous Provenance-Based Approaches

- Single-hop graph exploration constrains contextual analysis
- Rule-based approaches require expert knowledge and are susceptible to 0-day attacks
- Snapshot-based static modeling lacks flexibility, while runtime dynamic model updates are unsuitable for low-and-slow attack patterns

[Figure: the provenance graph from the previous slide, with a single-hop "exfiltration rule" matching a process-to-IP write edge.]
Unicorn Goals

We formalize system-wide intrusion detection for APT campaigns as a real-time, graph-based anomaly detection problem on large, attributed, streaming whole-system provenance graphs.

- Continuously analyze the provenance graph with space and time efficiency while leveraging its rich historical context and system-wide causal relationships
- Consider the entire duration of system execution without making assumptions about attack behavior
- Learn normal changes in system behavior, but not changes directed by an attacker
Unicorn Overview

Unicorn:
1. Takes as input a labeled, streaming provenance graph
2. Builds at runtime an in-memory graph histogram
3. Periodically computes a fixed-size graph sketch
4. Clusters sketches into a system model

[Figure: stages 1-4 laid out along the execution timeline.]
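A hedged skeleton of how these four stages might fit together in code; all names are ours, not the project's real API, and stages 2-4 are filled in on the following slides:

```python
class Unicorn:
    """Illustrative pipeline skeleton, not the real implementation."""

    def __init__(self, R, sketch_size, sketch_interval, decay):
        self.R = R                        # neighborhood exploration depth
        self.sketch_size = sketch_size    # |S|
        self.sketch_interval = sketch_interval
        self.decay = decay
        self.histogram = {}               # stage 2: in-memory graph histogram
        self.sketches = []                # stage 3: periodic fixed-size sketches
        self.edges_seen = 0

    def consume(self, edge):
        """Stage 1: ingest one edge of the labeled, streaming provenance graph."""
        self.update_histogram(edge)                   # stage 2
        self.edges_seen += 1
        if self.edges_seen % self.sketch_interval == 0:
            self.sketches.append(self.make_sketch())  # stage 3

    def update_histogram(self, edge):
        pass  # vertex-centric Weisfeiler-Lehman update (see Graph Histogram)

    def make_sketch(self):
        pass  # similarity-preserving hash of the histogram (see Graph Sketch)

    def build_model(self):
        pass  # stage 4: cluster temporally ordered sketches (see Evolutionary Model)
```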
Graph Histogram

Iterative, vertex-centric, Weisfeiler-Lehman label update: a vertex combines its own label (e.g., 3) with the labels of its neighborhood (e.g., 1A2B):

  new_label = Hash(3, 1A2B)
  histogram[new_label] += 1

Within the same iteration, every vertex is updated in parallel. In the next iteration, each vertex is updated again, exploring a larger neighborhood, e.g.:

  new_label = Hash(7, 16)
  histogram[new_label] += 1

After R iterations:
- Each vertex has explored its R-hop neighborhood, capturing rich execution context
- The histogram contains entire-graph statistics, capturing full historical context

An efficient streaming variant leverages the partial-ordering guarantee from the provenance capture system.
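For clarity, here is a minimal batch version of the relabeling loop (Unicorn's actual implementation is the incremental streaming variant mentioned above); the input layout is our assumption:

```python
from collections import Counter

def wl_histogram(nodes, in_edges, R):
    """Batch Weisfeiler-Lehman relabeling.

    nodes:    {vertex_id: initial_label}
    in_edges: {vertex_id: [(neighbor_id, edge_label), ...]}
    """
    labels = dict(nodes)
    histogram = Counter(labels.values())   # 0-hop labels are counted too
    for _ in range(R):
        new_labels = {}
        for v, label in labels.items():
            # A vertex combines its own label with its sorted incoming
            # neighborhood; after iteration r, each label summarizes the
            # r-hop neighborhood rooted at that vertex.
            neighborhood = sorted((labels[u], e) for u, e in in_edges.get(v, []))
            new_labels[v] = hash((label, tuple(neighborhood)))
        labels = new_labels                # every vertex updates "in parallel"
        histogram.update(labels.values())
    return histogram
```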
Discount Histogram for Concept Drift

We model and monitor long-term system behavior, which often changes over time.

- Such changes alter the underlying statistical properties of the histogram; this phenomenon is called concept drift.
- We use exponential weight decay to gradually forget outdated data.
- Unicorn focuses on the current system execution as well as elements that are causally related to it, even if they are temporally distant.
- Unicorn maintains a fading "memory" of the past.

Exponential decay: each histogram count is a sum of decayed weights,

  c_h = Σ_x w_x,  where  w_x = e^(-λ·Δt)

and λ (the decay factor) controls the rate of forgetting.
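A small sketch of the forgetting scheme, assuming a lazy per-element decay (each count is decayed only when that label is next observed), which is one convenient way to implement the formula above:

```python
import math

class DecayHistogram:
    """Histogram whose counts fade exponentially: existing mass is
    multiplied by e^(-lambda * dt) before a new observation is added."""

    def __init__(self, decay_rate):
        self.decay_rate = decay_rate   # lambda: the rate of forgetting
        self.weights = {}              # label -> decayed count c_h
        self.last_seen = {}            # label -> time of last update

    def add(self, label, t):
        dt = t - self.last_seen.get(label, t)
        old = self.weights.get(label, 0.0)
        # Decay the existing mass, then count the new observation at weight 1.
        self.weights[label] = old * math.exp(-self.decay_rate * dt) + 1.0
        self.last_seen[label] = t
```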
Graph Sketch

We want to measure similarity based on the underlying distribution of graph features, not on absolute counts; in a streaming setting, the number of histogram elements changes continuously. We employ HistoSketch (similarity-preserving data sketching):

- Hashes histograms into compact, fixed-size sketch vectors
- Approximates histogram similarity based on normalized Jaccard similarity
- Constant-time algorithm, supporting real-time streaming
- Sketch size |S| controls the tradeoff between information loss and computational efficiency
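HistoSketch itself performs consistent weighted sampling with constant-time incremental updates; the sketch below is a simplified, non-incremental weighted sampling that conveys the core property, namely that two histograms agree in a sketch slot with probability that tracks their similarity:

```python
import hashlib, math

def _uniform(seed: int, label) -> float:
    """Deterministic uniform(0,1] hash of (seed, label)."""
    h = hashlib.sha256(f"{seed}:{label}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / 2**64

def make_sketch(histogram: dict, size: int) -> list:
    """Map a variable-size histogram to a fixed-size sketch vector."""
    sketch = []
    for i in range(size):
        # Exponential-race weighted sampling: each element draws an
        # exponential with rate equal to its weight; the minimum wins
        # with probability proportional to that weight.
        winner = min(histogram,
                     key=lambda l: -math.log(_uniform(i, l)) / histogram[l])
        sketch.append(winner)
    return sketch

def similarity(s1: list, s2: list) -> float:
    """Fraction of matching slots estimates histogram similarity."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

The design point is that however large the histogram grows, downstream clustering only ever compares fixed-length vectors of |S| slots.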
Evolutionary Model

Each cluster represents a "meta-state" of system execution. We use these clusters and their statistics (e.g., diameter) to construct the evolutionary model.

- With evolutionary modeling, Unicorn learns system behavior at many points in time during a single training execution trace.
- With the gradual-forgetting scheme, Unicorn focuses on the most relevant activities at each point in time.

[Figure: sketches generated periodically during model building are clustered, in temporal order, by Jaccard similarity along the execution timeline.]
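As an illustration, here is a toy threshold-based clustering that walks the temporally ordered sketches and opens a new meta-state whenever nothing existing is close enough (reusing similarity() from the previous slide); the threshold is an assumption, not a value from the paper:

```python
def build_model(sketches, threshold=0.5):
    """Toy evolutionary model over temporally ordered sketches."""
    medoids, states = [], []          # states[i] = meta-state of sketch i
    for s in sketches:
        sims = [similarity(s, m) for m in medoids]
        if sims and max(sims) >= threshold:
            states.append(max(range(len(sims)), key=sims.__getitem__))
        else:
            medoids.append(s)         # a new meta-state of execution
            states.append(len(medoids) - 1)
    # The model is the set of meta-states plus the observed transitions.
    transitions = set(zip(states, states[1:]))
    return medoids, transitions
```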
Anomaly Detection

Online model fitting: at runtime, Unicorn sketches the streaming provenance graph and fits each new sketch against the evolutionary sub-models generated during training. An execution whose sketches cannot be fit to any sub-model is flagged as anomalous.

[Figure: a runtime provenance graph is periodically sketched along the execution timeline and matched against an evolutionary sub-model generated during training.]
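A minimal sketch of the fitting check, again reusing similarity(); the threshold and model layout are assumptions carried over from the toy model above:

```python
def is_anomalous(runtime_sketch, submodels, threshold=0.5):
    """Fit a runtime sketch against trained evolutionary sub-models.

    Flag the execution if the sketch is not close enough to any
    meta-state of any sub-model. (A fuller check would also verify
    that the sequence of meta-states follows learned transitions.)"""
    for medoids, _transitions in submodels:
        if any(similarity(runtime_sketch, m) >= threshold for m in medoids):
            return False   # fits some learned meta-state: considered normal
    return True            # fits no sub-model: raise an alarm
```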
Evaluation Datasets

- StreamSpot dataset: we compare Unicorn against StreamSpot, a state-of-the-art provenance-based anomaly detection system, using its published dataset.
  - Can Unicorn outperform StreamSpot? If so, what are the factors?
- DARPA TC dataset: data obtained during a red-team vs. blue-team adversarial engagement with various provenance capture systems.
  - Can Unicorn accurately detect anomalies in long-running systems?
  - Is the algorithm generalizable to different capture systems?
- Simulated supply-chain (SC) attack dataset: our own controlled dataset, captured with the CamFlow whole-system provenance capture system.
  - How do Unicorn's different design decisions affect APT detection?
StreamSpot Dataset

Can Unicorn outperform StreamSpot? If so, what are the factors?

Unicorn's larger neighborhood exploration (R) improves precision and recall, and Unicorn outperforms StreamSpot.

StreamSpot creates a snapshot-based static model and dynamically updates it at runtime:
- This produces a significant number of false alarms, creating an opportune time window for attackers.
- Persistent attackers can manipulate the model, gradually and slowly changing system behavior to avoid detection.
- Unicorn's evolutionary model reduces false positives (see paper) and prevents model manipulation.
TC Dataset

Can Unicorn accurately detect anomalies in long-running systems? Is the algorithm generalizable to different capture systems?

- DARPA's two-week-long third adversarial engagement, with datasets collected from a network of hosts running different audit systems
- Benign background activity generated during the engagement allows us to model normal system behavior

Unicorn's analytics framework generalizes to different capture systems and various graph structures, achieving high detection performance: it accurately detects anomalies in long-running systems without prior attack knowledge.
SC Attack Dataset: Detection Performance

How do Unicorn's different design decisions affect APT detection?

We identify four important parameters that affect detection performance:
- Hop count (R): size of the neighborhood exploration
- Sketch size (|S|): length of the fixed-size graph sketches
- Interval of sketch generation: how often we construct new graph sketches as the provenance graph grows during system execution
- Decay factor (λ): the rate at which we forget the past and focus on the present execution
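Gathered as a configuration object for reference; the default values below are illustrative placeholders, not the paper's tuned settings:

```python
from dataclasses import dataclass

@dataclass
class DetectionParams:
    """The four knobs studied on this slide (placeholder defaults)."""
    R: int = 3                   # hop count: neighborhood exploration depth
    sketch_size: int = 2000     # |S|: fixed sketch-vector length
    sketch_interval: int = 3000  # edges between consecutive sketches
    decay: float = 0.02          # lambda: rate of forgetting the past
```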