MIDeA: A Multi-Parallel Intrusion Detection Architecture
Giorgos Vasiliadis, FORTH-ICS, Greece
Michalis Polychronakis, Columbia U., USA
Sotiris Ioannidis, FORTH-ICS, Greece
CCS 2011, 19 October 2011
Network Intrusion Detection Systems
• Typically deployed at ingress/egress points
  – Inspect all network traffic
  – Look for suspicious activities
  – Alert on malicious actions
[Diagram: Internet connected over a 10 GbE link to the NIDS, which sits in front of the internal network]
Challenges
• Traffic rates are increasing
  – 10 Gbit/s Ethernet is common in metro/enterprise networks
  – Up to 40 Gbit/s at the core
• Increasingly complex analysis must be performed at these higher speeds
  – Deep packet inspection
  – Stateful analysis
  – Thousands of attack signatures
Designing a NIDS
• Fast
  – Needs to handle many Gbit/s
  – Scalable
    • Moore's law no longer delivers faster single cores, so performance must scale across many cores
• Commodity hardware
  – Cheap
  – Easily programmable
Today: fast or commodity
• Fast “hardware” NIDS
  – FPGA/TCAM/ASIC based
  – Throughput: high
• Commodity “software” NIDS
  – Processing on general-purpose processors
  – Throughput: low
MIDeA
• A NIDS built out of commodity components
  – Single-box implementation
  – Easy programmability
  – Low price
• Can we build a 10 Gbit/s NIDS with commodity hardware?
Outline
• Architecture
• Implementation
• Performance Evaluation
• Conclusions
Single-threaded performance
[Pipeline: NIC → Preprocess → Pattern matching → Output]
• Vanilla Snort: 0.2 Gbit/s
Problem #1: Scalability
• Single-threaded NIDSs have limited performance
  – They do not scale with the number of CPU cores
Multi-threaded performance
[Diagram: NIC feeding multiple parallel Preprocess → Pattern matching → Output pipelines]
• Vanilla Snort: 0.2 Gbit/s
• With multiple CPU cores: 0.9 Gbit/s
Problem #2: How to split traffic across cores
• Naively distributing packets from the NIC across cores incurs synchronization overheads and cache misses
• Solution: Receive-Side Scaling (RSS), where the NIC hashes each packet's flow and steers it to one of several Rx-queues (see the sketch below)
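RSS works because the NIC computes a hash over each packet's connection 5-tuple and uses it to select an Rx-queue, so all packets of a flow land on the same queue and therefore the same core. The C sketch below only illustrates that flow-to-queue mapping in software; the struct, the function names, and the simple symmetric mix are illustrative assumptions, not the Toeplitz hash that RSS hardware actually computes.

#include <stdint.h>

/* Conceptual sketch of RSS-style dispatch: hash the connection 5-tuple so
 * that every packet of a flow maps to the same Rx-queue (and thus the same
 * CPU core). The real hash is a Toeplitz function computed in NIC hardware;
 * this simple symmetric mix only illustrates the idea. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

static uint32_t flow_hash(const struct flow_tuple *f)
{
    /* XORing the two endpoints keeps the hash symmetric, so both directions
     * of a connection end up on the same queue. */
    uint32_t h = f->src_ip ^ f->dst_ip;
    h ^= (uint32_t)(f->src_port ^ f->dst_port);
    h ^= f->proto;
    h *= 0x9e3779b1u;                    /* integer mixing step */
    h ^= h >> 16;
    return h;
}

static unsigned pick_queue(const struct flow_tuple *f, unsigned num_queues)
{
    return flow_hash(f) % num_queues;    /* same flow -> same queue -> same core */
}

Symmetry matters here: both directions of a TCP connection must reach the same core so that stream reassembly (described later) needs no cross-core synchronization.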
Multi-queue performance
[Diagram: NIC with RSS feeding multiple parallel Preprocess → Pattern matching → Output pipelines]
• Vanilla Snort: 0.2 Gbit/s
• With multiple CPU cores: 0.9 Gbit/s
• With multiple Rx-queues: 1.1 Gbit/s
Problem #3: Pattern matching is the bottleneck
• Pattern matching accounts for more than 75% of the processing time
[Pipeline: NIC → Preprocess → Pattern matching → Output, with pattern matching dominating]
• Solution: offload pattern matching to the GPU
Why GPU?
• General-purpose computing
  – Flexible and programmable
• Powerful and ubiquitous
  – Constant innovation
• Data-parallel model
  – More transistors devoted to data processing rather than caching and flow control
Offloading pattern matching to the GPU
[Diagram: NIC with RSS feeding multiple Preprocess pipelines, with pattern matching offloaded to the GPU before Output]
• Vanilla Snort: 0.2 Gbit/s
• With multiple CPU cores: 0.9 Gbit/s
• With multiple Rx-queues: 1.1 Gbit/s
• With GPU offloading: 5.2 Gbit/s
Outline
• Architecture
• Implementation
• Performance Evaluation
• Conclusions
Multiple data transfers
[Diagram: NIC → CPU → GPU, connected over PCIe]
• Several data transfers between different devices
• Are the data transfers worth the computational gains offered?
Capturing packets from the NIC
[Diagram: NIC Rx-queues mapped to per-queue ring buffers shared between kernel space and user space]
• Packets are hashed in the NIC and distributed to different Rx-queues
• Memory-mapped ring buffers for each Rx-queue
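The capture path hands packets to user space through a memory-mapped ring per Rx-queue, so each core can drain its own ring without copies or locks. The sketch below shows the general shape of such a consumer; the ring layout (head/tail indices and fixed-size slots) and every name in it are hypothetical, standing in for whatever memory-mapped capture interface is actually used.

#include <stdint.h>

/* Hypothetical layout of one memory-mapped Rx ring shared with the kernel:
 * the driver advances `head` as packets arrive, the user-space consumer
 * advances `tail` as it processes them. Real capture frameworks differ in
 * detail; this only illustrates lock-free per-queue consumption. */
#define RING_SLOTS 4096
#define SLOT_SIZE  2048

struct rx_slot { uint32_t len; unsigned char data[SLOT_SIZE]; };

struct rx_ring {
    volatile uint32_t head;            /* written by the driver   */
    volatile uint32_t tail;            /* written by the consumer */
    struct rx_slot    slot[RING_SLOTS];
};

/* Per-core consumer loop: drain packets from this core's ring only. */
static void drain_ring(struct rx_ring *r,
                       void (*handle)(const unsigned char *pkt, uint32_t len))
{
    while (r->tail != r->head) {
        struct rx_slot *s = &r->slot[r->tail % RING_SLOTS];
        handle(s->data, s->len);
        r->tail++;                     /* slot may now be reused by the driver */
    }
}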
CPU processing
• Packet capturing is performed by different CPU cores in parallel
  – Process affinity pins each capture thread to its own core (sketched below)
• Each core normalizes captured packets and reassembles them into streams
  – Removes ambiguities
  – Detects attacks that span multiple packets
• Packets of the same connection always end up on the same core
  – No synchronization
  – Good cache locality
• Reassembled packet streams are then transferred to the GPU for pattern matching
  – How should the GPU be accessed?
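One straightforward way to realize the per-core layout above is to spawn one worker thread per Rx-queue and pin it to its own core. The sketch below does the pinning with the standard pthread_setaffinity_np() call; capture_and_reassemble() is a placeholder for the per-queue capture and stream-reassembly loop, and the queue-i-on-core-i mapping is an assumption about how the slide's process affinity is set up.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NUM_QUEUES 8   /* one worker per Rx-queue, e.g. 8 queues on 8 cores */

/* Placeholder: poll the memory-mapped ring of Rx-queue `queue_id`, normalize
 * and reassemble packets into streams, and batch them for the GPU. */
static void capture_and_reassemble(int queue_id) { (void)queue_id; }

static void *worker(void *arg)
{
    int core = (int)(long)arg;

    /* Pin this thread to its own core: same flow -> same queue -> same core,
     * so reassembly state never migrates and needs no locking. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    capture_and_reassemble(core);      /* queue i is handled by core i */
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_QUEUES];
    for (long i = 0; i < NUM_QUEUES; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_QUEUES; i++)
        pthread_join(tid[i], NULL);
    return 0;
}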
Accessing the GPU
• Solution #1: Master/slave model
  – A single "master" thread performs all transfers and kernel launches over the 64 Gbit/s PCIe link on behalf of the other threads
• Execution flow example: transfers to the GPU, kernel execution, and transfers back proceed one batch at a time behind the master thread
• Resulting pattern-matching throughput: 14.6 Gbit/s
Accessing the GPU
• Solution #2: Shared execution, where all threads access the GPU directly over the 64 Gbit/s PCIe link
• Execution flow example: transfers and kernel executions issued by different threads interleave, keeping both the PCIe link and the GPU busy
• Resulting pattern-matching throughput: 48.1 Gbit/s
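The slide does not show how shared execution is implemented, so the CUDA sketch below is only one plausible arrangement: each host thread owns a CUDA stream and a pinned staging buffer, so copies and kernel launches issued by different cores are queued on the shared GPU independently instead of funneling through one master thread. The struct, the function names, and the launch configuration are illustrative assumptions; match_kernel stands in for the pattern-matching kernel sketched later.

#include <cuda_runtime.h>
#include <stddef.h>

/* Placeholder for the pattern-matching kernel (see the later sketch). */
__global__ void match_kernel(const unsigned char *buf, size_t len) { }

/* Each CPU core (host thread) owns a stream and a pinned host buffer, so its
 * transfers and launches can overlap with those of the other cores. */
struct per_core_gpu_ctx {
    cudaStream_t   stream;
    unsigned char *host_buf;   /* pinned (page-locked) memory enables async DMA */
    unsigned char *dev_buf;
    size_t         buf_size;
};

void ctx_init(struct per_core_gpu_ctx *c, size_t buf_size)
{
    c->buf_size = buf_size;
    cudaStreamCreate(&c->stream);
    cudaMallocHost((void **)&c->host_buf, buf_size);
    cudaMalloc((void **)&c->dev_buf, buf_size);
}

/* Called by a core once its staging buffer holds `used` bytes of reassembled
 * packet data: copy and scan asynchronously on this core's own stream. */
void ctx_submit(struct per_core_gpu_ctx *c, size_t used)
{
    cudaMemcpyAsync(c->dev_buf, c->host_buf, used,
                    cudaMemcpyHostToDevice, c->stream);
    match_kernel<<<1, 256, 0, c->stream>>>(c->dev_buf, used);
}

In current CUDA all host threads of a process share one device context, so giving each thread its own stream and buffers is enough to express this kind of sharing.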
Transferring data to the GPU
[Diagram: each CPU core pushes batches to the GPU, which scans them]
• Small transfers degrade PCIe throughput
• Each core therefore batches many reassembled packets into a single buffer before transferring it (see the sketch below)
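Batching is what keeps the PCIe link efficient: instead of one small transfer per packet stream, each core appends many reassembled streams into one large staging buffer and ships it in a single copy. Below is a minimal sketch; the length-prefixed layout, the 1 MB batch size, and the names are assumptions for illustration, not MIDeA's actual buffer format.

#include <stdint.h>
#include <string.h>

#define BATCH_BYTES (1 << 20)   /* e.g. 1 MB per PCIe transfer; tunable */

/* Per-core batching buffer: reassembled packet data is appended back to back,
 * here with a simple 4-byte length prefix per stream. */
struct batch {
    unsigned char data[BATCH_BYTES];
    size_t used;
};

/* Returns 0 on success, -1 if the batch is full and must be flushed
 * (transferred to the GPU) before more data can be appended. */
int batch_append(struct batch *b, const unsigned char *stream, uint32_t len)
{
    if (b->used + sizeof(len) + len > BATCH_BYTES)
        return -1;
    memcpy(b->data + b->used, &len, sizeof(len));
    memcpy(b->data + b->used + sizeof(len), stream, len);
    b->used += sizeof(len) + len;
    return 0;
}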
Pattern matching on the GPU
[Diagram: the transferred packet buffer is split across GPU cores, which report matches]
• Each reassembled packet stream is uniformly assigned to one GPU core (kernel sketch below)
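On the GPU, multi-pattern matching is usually expressed as a table-driven DFA walk in the spirit of Aho-Corasick: each GPU thread takes one reassembled stream from the batch and advances a state machine over its bytes. The kernel below is a simplified sketch of that idea; the per-stream offset/length layout, the flat state-transition table, and the per-stream match counters are illustrative assumptions rather than MIDeA's exact data structures.

#include <cuda_runtime.h>
#include <stdint.h>

#define ALPHABET 256

/* One thread per reassembled packet stream: walk the stream's bytes through
 * a DFA and count how many accepting states were reached. A real system
 * would also record the matching rule and offset for later verification. */
__global__ void match_kernel(const unsigned char *buf,
                             const uint32_t *offsets,    /* start of each stream   */
                             const uint32_t *lengths,    /* length of each stream  */
                             int num_streams,
                             const int *dfa,             /* [state][byte] -> state */
                             const uint8_t *accepting,   /* nonzero => match state */
                             uint32_t *match_count)      /* one counter per stream */
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= num_streams)
        return;

    const unsigned char *p = buf + offsets[s];
    uint32_t len = lengths[s];
    int state = 0;
    uint32_t matches = 0;

    for (uint32_t i = 0; i < len; i++) {
        state = dfa[state * ALPHABET + p[i]];
        if (accepting[state])
            matches++;
    }
    match_count[s] = matches;
}

A launch with one thread per stream, for example match_kernel<<<(num_streams + 255) / 256, 256>>>(...), covers the whole batch; the match results are then copied back to the host, corresponding to the "Matches" output in the diagram.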
Pipelining the CPU and the GPU
[Diagram: per-core packet buffers alternating between CPU filling and GPU processing]
• Double buffering
  – Each CPU core collects newly reassembled packets while the GPUs process the previous batch
  – Effectively hides the GPU communication costs
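The double-buffering scheme can be written down compactly: each core keeps two staging buffers and alternates between filling one on the CPU and having the GPU scan the other, so transfer and kernel time hide behind the CPU-side work. In the sketch below, fill_batch() and scan_batch_async() are placeholders for the batching and GPU-submission code sketched earlier, and the one-stream-per-core structure is an assumption.

#include <cuda_runtime.h>
#include <stddef.h>

size_t fill_batch(unsigned char *buf, size_t cap);          /* CPU: collect packets */
void   scan_batch_async(const unsigned char *buf, size_t len,
                        cudaStream_t stream);               /* GPU: copy + scan     */

void core_loop(unsigned char *buf[2], size_t cap, cudaStream_t stream)
{
    int cur = 0;
    for (;;) {
        size_t used = fill_batch(buf[cur], cap);   /* CPU fills buffer `cur`        */
        cudaStreamSynchronize(stream);             /* previous batch done on GPU    */
        scan_batch_async(buf[cur], used, stream);  /* GPU starts scanning `cur`     */
        cur ^= 1;                                  /* CPU moves to the other buffer */
    }
}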
Recap
• NIC: demultiplexes 1-10 Gbit/s of incoming packets across Rx-queues
• Per-flow CPUs: protocol analysis, producing reassembled packet streams
• Data-parallel GPUs: content matching over the reassembled packet streams
Outline
• Architecture
• Implementation
• Performance Evaluation
• Conclusions
Setup: Hardware
[Diagram: two NUMA nodes, each with a CPU, memory, an IOH, and a GPU, connected by QuickPath Interconnect, plus the 10 GbE NIC]
• NUMA architecture, QuickPath Interconnect

  Component   Model            Specs
  2 x CPU     Intel E5520      2.27 GHz x 4 cores
  2 x GPU     NVIDIA GTX480    1.4 GHz x 480 cores
  1 x NIC     Intel 82599EB    10 GbE
Pattern Matching Performance
[Plot: pattern-matching throughput of a single GPU vs. number of CPU cores feeding it: 14.6 Gbit/s with 1 core, 26.7 with 2, 42.5 with 4, and 48.1 with 8, at which point it is bounded by PCIe capacity]
• The throughput of a single GPU increases as the number of CPU cores increases
Pattern Matching Performance
[Plot: same as before, with a second GPU added; aggregate pattern-matching throughput reaches 70.7 Gbit/s with 8 CPU cores]
• Adding a second GPU raises the pattern-matching throughput to 70.7 Gbit/s
Setup: Network
[Diagram: a traffic generator/replayer connected to MIDeA over a 10 GbE link]
Synthetic traffic
[Plot: throughput vs. packet size for randomly generated traffic. MIDeA: 2.4 Gbit/s at 200-byte packets, 4.8 at 800, and 7.2 at 1500; Snort on 8 cores: 1.1, 1.5, and 2.1 Gbit/s respectively]
• Randomly generated traffic
Real traffic
[Plot: MIDeA at 5.2 Gbit/s vs. Snort on 8 cores at 1.1 Gbit/s]
• 5.2 Gbit/s with zero packet loss
  – Replayed trace captured at the gateway of a university campus
Summary
• MIDeA: a multi-parallel network intrusion detection architecture
  – Single-box implementation
  – Based on commodity hardware
  – Costs less than $1,500
• Operates at 5.2 Gbit/s with zero packet loss
  – 70 Gbit/s pattern-matching throughput
Thank you!
gvasil@ics.forth.gr