Distributed Summary Statistics with Bro
Vlad Grigorescu

> whoami
• Member of the Bro development team
• Senior Developer at Broala LLC
• Senior Information Security Engineer at Carnegie Mellon University
https://github.com/grigorescu @0f010d

Goal
To develop statistics that can efficiently summarize network activity distributed over a large number of sensors, while minimizing memory usage.

Outline
1. Observation examples
2. What types of questions can we answer?
3. SumStats Framework
   1. Overview
   2. Available Reducers
4. Real-world usage

Observation Examples
• 192.168.2.13 received an NXDOMAIN reply for a DNS A query of: host.244.ipoe2.subnets.khb.ttkdv.ru

Observation Examples
• 192.168.2.14 received a 403 Forbidden when performing a POST to: http://sqm.microsoft.com/sqm/Windows/sqmserver.dll

Observation Examples
• 192.168.2.15 sent an e-mail with an application/x-dosexec attachment, with MD5 hash c84a46850de0a29483ed1f7a0b9897ab

What types of questions can we answer?
• Which source/dest IP pairs have the lowest variance in TCP session byte counts?
• Which ASNs have the highest number of connections into your network?
• Which IP source has connected to the highest number of unique destinations?

What types of questions can we answer?
• In the past 24 hours, which clients have sent the most failed DNS queries?
• Which servers have received the most failed DNS queries?
• If we look at each IP’s ratio of failed to total DNS queries, which IPs have had over 90% failures?

SumStats Framework
• A set of Bro scripts for generating summary statistics
• Ties into the existing Bro scripts to make observations about events in layers 2-7
• Can apply thresholds to values to create notices, which can prompt automated responses
• Scripts can query the current values for more advanced use cases
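As a rough mental model of the observe, reduce, threshold, notice flow described above, here is a minimal Python sketch. It is not Bro code and not the framework's implementation: the class `SumStat`, its methods `observe` and `end_epoch`, and the single "sum" reducer are hypothetical simplifications. For brevity it only checks the threshold at the end of an epoch, and a callback stands in for raising a notice.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class SumStat:
    """Hypothetical, simplified SumStat with one "sum" reducer: observations
    are grouped by key, and at the end of an epoch any key whose sum reaches
    the threshold triggers a callback (a stand-in for a notice)."""

    def __init__(self, threshold: float, on_crossed: Callable[[str, float], None]):
        self.threshold = threshold
        self.on_crossed = on_crossed
        self.sums: Dict[str, float] = defaultdict(float)

    def observe(self, key: str, value: float) -> None:
        # Streaming: each observation is folded into the running sum once
        # and never stored.
        self.sums[key] += value

    def end_epoch(self) -> None:
        for key, total in self.sums.items():
            if total >= self.threshold:
                self.on_crossed(key, total)
        # Start the next epoch with empty state.
        self.sums.clear()


# Example: count failed DNS queries per client and flag clients with 3 or more.
notices: List[Tuple[str, float]] = []
ss = SumStat(threshold=3, on_crossed=lambda k, v: notices.append((k, v)))
for client in ["192.168.2.13", "192.168.2.13", "192.168.2.14", "192.168.2.13"]:
    ss.observe(client, 1)  # one failed DNS query observed
ss.end_epoch()
print(notices)  # [('192.168.2.13', 3.0)]
```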

SumStats Framework: Philosophy
All summary statistics must be:
• Highly memory-efficient,
• Streaming (the data is only seen once), and
• Mergeable (distributable across thousands of nodes, each of which sees a subset of the total traffic)
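To make “streaming” and “mergeable” concrete, the Python sketch below keeps a count, mean, min, max and sum of squared deviations using Welford's one-pass update, then combines two nodes' partial results with the pairwise formula of Chan et al. The class name `RunningStats` is hypothetical, and the choice of sample variance (n-1 denominator) is an illustrative assumption, not a claim about how Bro's variance reducer is written.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0        # sum of squared deviations from the mean
    lo: float = math.inf   # running minimum
    hi: float = -math.inf  # running maximum

    def add(self, x: float) -> None:
        # Welford's one-pass update: constant memory, each value seen once.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Pairwise combination (Chan et al.): merging two partial results
        # matches a single pass over the combined data, up to float rounding.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2, min(self.lo, other.lo), max(self.hi, other.hi))

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); 0.0 for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# Two sensors each see half of the byte counts for the same key.
data = [120, 340, 80, 500, 260, 90]
sensor_a, sensor_b, whole = RunningStats(), RunningStats(), RunningStats()
for x in data[:3]:
    sensor_a.add(x)
for x in data[3:]:
    sensor_b.add(x)
for x in data:
    whole.add(x)
merged = sensor_a.merge(sensor_b)
print(merged.n, merged.variance, whole.variance, statistics.variance(data))
```

Neither sensor ever ships raw observations; only five numbers per key cross the network, which is what lets the same statistic be spread over many nodes.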

SumStats Framework: Design
[Diagram: many Observations feed a Reducer inside a SumStat, which can raise a Notice]


SumStats Framework: Design
[Diagram: a single Observation feeding several Reducers]

SumStats Framework: Reducers
“Classic” stats:
• Average
• Min
• Max
• Last
• Sum
• Std Dev
• Variance
• Cardinality
“Memory-efficient” stats:
• HyperLogLog
• Top-k
• Reservoir Sampling
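Of the memory-efficient reducers listed above, reservoir sampling is the simplest: it keeps a fixed-size, uniformly random sample of a stream whose length is not known in advance. Below is a minimal Python sketch of the classic Algorithm R; the function name and signature are illustrative, and this is not claimed to be how Bro's sampling reducer is implemented.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: a uniform random sample of k items from a stream of
    unknown length, in one pass and O(k) memory."""
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


s = reservoir_sample(range(1_000_000), 5, random.Random(42))
print(s)  # five values drawn uniformly from the million-item stream
```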

Reducers: HyperLogLog
• Streaming algorithm for estimating the cardinality of huge datasets
• Can estimate the cardinality of 1 billion elements with a relative accuracy of about 2% using 1.5 KB of memory
• Mergeable without any loss in accuracy
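The following is a minimal HyperLogLog sketch in Python, written for illustration rather than taken from Bro, whose implementation differs. With p = 11 there are 2,048 registers; stored at 6 bits each that would be 1.5 KB (this Python list is far larger), and the expected relative standard error is about 1.04/√2048 ≈ 2.3%. The hash choice (SHA-1 truncated to 64 bits) is an assumption. The register-wise maximum in `merge` is what makes merging lossless: it yields exactly the sketch that the union of both streams would have produced.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog (Flajolet et al.) with m = 2**p registers."""

    def __init__(self, p: int = 11):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max: identical to sketching the union directly.
        assert self.p == other.p
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


a, b = HyperLogLog(), HyperLogLog()
for i in range(10_000):
    a.add(f"10.0.{i // 256}.{i % 256}")   # hosts seen by sensor A
for i in range(5_000, 15_000):
    b.add(f"10.0.{i // 256}.{i % 256}")   # overlapping hosts seen by sensor B
print(round(a.merge(b).count()))          # estimate of the 15,000 distinct hosts
```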

Reducers: HyperLogLog
Which IP source has connected to the highest number of unique destinations?
Let’s assume that you have a fully populated /8 network (2^24 ≈ 16.7M hosts). We want to know the cardinality of destinations for each host.
16.7M × 1.5 KB ≈ 24 GB of RAM
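The memory figure on this slide is straightforward arithmetic: one 1.5 KB sketch per possible source address in a /8. Worked in Python with binary units, it comes out to exactly 24 GiB:

```python
hosts = 2 ** 24              # addresses in a /8 network
sketch_bytes = 1.5 * 1024    # one 1.5 KB HyperLogLog sketch per source host
total = hosts * sketch_bytes
print(total / 2 ** 30)       # 24.0 (GiB)
```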

Reducers: Top-k
• Streaming algorithm for finding the most frequent elements in a dataset using bounded space (the “Space-Saving” algorithm)
• Implementation of: Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams.
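The Space-Saving idea from the cited paper fits in a few lines: keep at most `capacity` counters, and when an unseen item arrives while all counters are in use, evict the item with the smallest count and let the newcomer inherit that count plus one. Below is a minimal Python sketch; the class and method names are hypothetical, and the linear scan for the minimum stands in for the paper's Stream-Summary structure, which finds it in constant time.

```python
from typing import Dict, List, Tuple


class SpaceSaving:
    """Space-Saving top-k: at most `capacity` counters. Each count may
    overestimate the true frequency by at most errors[item]."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts: Dict[str, int] = {}
        self.errors: Dict[str, int] = {}

    def add(self, item: str) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the current minimum; the newcomer inherits its count
            # plus one, and the inherited count is its maximum error.
            victim = min(self.counts, key=self.counts.__getitem__)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, k: int) -> List[Tuple[str, int]]:
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


ss = SpaceSaving(capacity=3)
stream = (["10.0.0.1"] * 50 + ["10.0.0.2"] * 30
          + [f"10.9.9.{i}" for i in range(20)] + ["10.0.0.1"] * 10)
for ip in stream:
    ss.add(ip)
print(ss.top(2))  # [('10.0.0.1', 60), ('10.0.0.2', 30)]
```

The twenty one-off addresses churn through a single slot, so the heavy hitters keep exact counts while memory stays fixed at three counters.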

Reducers: Top-k
Which IP source has connected to the highest number of unique destinations?
Connect our HyperLogLog reducer to a Top-k reducer.
Still assuming a /8 network and 2% error, with the top talker connecting to 1,000 destinations: ≈ 6 GB of RAM.

Real-World Usage: Writing a SumStat Script
Which source/dest IP pairs have the lowest variance in TCP session byte counts?

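Real-World Usage: Writing a SumStat Script
1. Observation:
event connection_state_remove(c: connection)
    {
    # The question is about TCP sessions, so skip UDP/ICMP connections.
    if ( get_port_transport_proto(c$id$resp_p) != tcp )
        return;

    # One observation per finished connection: the key is the orig/resp
    # pair (SumStats::Key uses $str), the value is total bytes both ways.
    SumStats::observe("end_of_conn",
                      [$str=cat(c$id$orig_h, "-", c$id$resp_h)],
                      [$num=c$orig$size + c$resp$size]);
    }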

Real-World Usage: Writing a SumStat Script
2. Reducers (this step and the next go inside event bro_init()):
local r1 = SumStats::Reducer($stream="end_of_conn",
                             $apply=set(SumStats::VARIANCE, SumStats::SUM));

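Real-World Usage: Writing a SumStat Script
3. SumStat:
SumStats::create([$name="variance_of_orig_bytes",
                  $epoch=5min,
                  $reducers=set(r1),
                  # Crosses when 1 - variance >= 0.9, i.e. variance <= 0.1.
                  $threshold_val(key: SumStats::Key, result: SumStats::Result) =
                      {
                      return 1.0 - result["end_of_conn"]$variance;
                      },
                  $threshold=0.9,
                  $threshold_crossed(key: SumStats::Key, result: SumStats::Result) =
                      {
                      doNotice(key, result);  # see note
                      }]);
Note: doNotice() stands in for a NOTICE(...) call and is simplified for brevity.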

Real-World Usage: scan.bro
Tracks the number of failed connection attempts (“port scans”) by source IP. Generates a notice when:
• A source scans over 25 unique IPs on the same port within 5 minutes, or
• A source scans over 25 unique ports on the same destination IP within 5 minutes.
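Both scan rules above reduce to counting distinct values per key within an epoch. The Python sketch below restates them under stated assumptions: it takes the failed attempts of a single 5-minute epoch as (source, destination, port) tuples, uses exact sets where scan.bro relies on SumStats reducers, and emits notice labels that are purely illustrative. The function name `find_scans` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

THRESHOLD = 25  # "over 25" unique IPs or ports, per the slide


def find_scans(failed: List[Tuple[str, str, int]]) -> List[str]:
    """Given one epoch of failed attempts (src, dst, port), report
    address scans and port scans."""
    dsts_per_src_port: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    ports_per_src_dst: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for src, dst, port in failed:
        dsts_per_src_port[(src, port)].add(dst)
        ports_per_src_dst[(src, dst)].add(port)

    notices = []
    for (src, port), dsts in dsts_per_src_port.items():
        if len(dsts) > THRESHOLD:  # many hosts, same port
            notices.append(f"Address_Scan {src} -> {len(dsts)} hosts on port {port}")
    for (src, dst), ports in ports_per_src_dst.items():
        if len(ports) > THRESHOLD:  # many ports, same host
            notices.append(f"Port_Scan {src} -> {len(ports)} ports on {dst}")
    return notices


attempts = [("203.0.113.7", f"192.168.2.{i}", 22) for i in range(30)]
attempts += [("198.51.100.9", "192.168.2.15", p) for p in range(1, 11)]
print(find_scans(attempts))
# ['Address_Scan 203.0.113.7 -> 30 hosts on port 22']
```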

Real-World Usage: scan.bro
• Carnegie Mellon sees approximately 3,000-6,000 failed connection attempts per second
• scan.bro uses approx. 150 MB of RAM and detected 49,500 scans from July to November 2013

Ongoing Work
• Writing more SumStats scripts to detect:
  • DNS amplification attacks
  • Beaconing
  • Behavioral changes
