Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs
Charalampos Stylianopoulos, Simon Kindström, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou
Distributed Computing and Systems
Motivation
- IoT security is a concern
- Recent attacks:
  - Show that IoT security is lacking
    - Mirai botnet
    - Attacks on a casino's aquarium thermostat
  - Underline the need for countermeasures
Motivation
Standard security countermeasures (e.g. network intrusion detection systems, NIDS) can be applied
- on the IoT devices themselves
- on the entry point to the network of IoT devices
Motivation
- Challenges
  - Resource-constrained devices
  - More connected devices -> more traffic to inspect
- NIDS
  - Performance bottleneck
  - Not tailored to hardware
Motivation: Pattern matching
Pattern matching = the core functionality of NIDS (more than 70% of running time [1])
Goal: Compare all network traffic against all malicious signatures. Search for all patterns, anywhere in the network stream.
- Pattern set: /etc/passwd, admin.dll, get.asp, backdoor, …
- Input stream: … http://some.site.com/get.asp?f=/etc/passwd … GET HTTP try_backdoor …
[1] "Generating realistic workloads for network intrusion detection systems", Antonatos et al.
Motivation: New Devices
- Opportunities
  - IoT/embedded hardware is evolving
  - New hardware features
    - Example: ODROID single-board computers with embedded Graphics Processing Units (GPUs)
Making use of those features is an open issue
Our work
- The questions we are trying to answer in this work:
  - Which algorithms to use?
  - Which hardware characteristics affect performance?
  - How to create new algorithms that make the best use of those characteristics?
Our work
- Co-evaluation of pattern matching algorithms
  - Evaluate existing implementations
  - Influence the design of new ones
- Target embedded GPUs
  - In-depth look at their architectural features
- Extensive evaluation
  - Different datasets and patterns
  - Energy efficiency
Outline
- Background
  - GPU computing
- Our Benchmark
- Evaluation
Background
- General Purpose GPU computing (GPGPU)
  - Other than graphics, GPUs can be used for general tasks as well
  - Highly parallel architecture
- Pattern matching on a GPU: not a new thing
  - Not much work on embedded GPUs
[1] "Gnort: High Performance Network Intrusion Detection Using Graphics Processors", Vasiliadis et al., RAID 2008
[2] "APUNet: Revitalizing GPU as Packet Processing Accelerator", Go et al., NSDI 2017
[3] "A highly-efficient memory-compression scheme for GPU-accelerated intrusion detection systems", Bellekens et al., SINCONF 2017
Background
- The platform
[Figure: block diagram of the platform. Source: "Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs"]
Background
Important characteristics (unique to embedded GPUs)
- Small number of cores/threads
- No main memory on the GPU
  -> Shared main memory between CPU and GPU
- No local memory on chip
- Vectorization in each GPU thread
- Separate instruction counter per GPU thread
  -> No need to worry about divergent execution
Outline
- Background
- Our Benchmark
  - Algorithms
  - Optimizations
- Evaluation
Algorithms
Representative algorithms from two categories:
- State machine based: Aho-Corasick (CPU)
- Filtering based: DFC (CPU)
Algorithms (CPU): The Aho-Corasick algorithm
- Used in many Network Intrusion Detection Systems
- Builds a State Machine (SM) from all the patterns
- Traverses the SM, reading the input byte by byte
Benefits
- Only one lookup per input byte
Limitations
- Poor cache locality
- Data dependencies
"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Comm. ACM '75
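The state machine idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the names `build_dfa` and `ac_search` are hypothetical, and the table layout (one 256-entry row per state) is an assumption chosen so that the search loop performs exactly one lookup per input byte. The large table is also what hurts cache locality, and each lookup depends on the previous state, which is the data dependency noted on the slide.

```python
from collections import deque

def build_dfa(patterns):
    """Build a full Aho-Corasick transition table from byte-string patterns."""
    nxt = [[-1] * 256]   # nxt[state][byte] -> next state (-1 = not yet defined)
    out = [[]]           # out[state] -> patterns that end in this state
    for pat in patterns:
        s = 0
        for b in pat:
            if nxt[s][b] == -1:
                nxt.append([-1] * 256)
                out.append([])
                nxt[s][b] = len(nxt) - 1
            s = nxt[s][b]
        out[s].append(pat)

    fail = [0] * len(nxt)  # failure links, only needed during construction
    queue = deque()
    for b in range(256):
        if nxt[0][b] == -1:
            nxt[0][b] = 0          # unknown byte at the root stays at the root
        else:
            queue.append(nxt[0][b])
    while queue:                   # breadth-first, so shallower states are done first
        s = queue.popleft()
        out[s] = out[s] + out[fail[s]]   # inherit matches that end at the suffix state
        for b in range(256):
            t = nxt[s][b]
            if t == -1:
                nxt[s][b] = nxt[fail[s]][b]   # fold the failure link into the table
            else:
                fail[t] = nxt[fail[s]][b]
                queue.append(t)
    return nxt, out

def ac_search(data, dfa):
    """Return (start_offset, pattern) for every match in data."""
    nxt, out = dfa
    state, matches = 0, []
    for i, b in enumerate(data):
        state = nxt[state][b]      # one table lookup per input byte
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

Calling `ac_search(stream, build_dfa(patterns))` with the pattern set and input stream from the motivation slide reports the offsets of `get.asp`, `/etc/passwd` and `backdoor`.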
Algorithms (CPU): The DFC algorithm
- Creates a filter from patterns
- Quickly filters out parts of the input
[Figure: a pattern set (activate, admin.dll, backdoor, get.asp, …) sets bits in a filter indexed by two-character strings (…, ab, ac, ad, …, ba, bb, …, ge, …). The filter is 8 KB and fits in cache. The input stream ("this is an input …") is checked against the filter.]
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
Algorithms (CPU): The DFC algorithm (continued)
- Progressive filtering
  - In cache
  - Initial filter, followed by pattern-length-specific filters (1 B, 2-3 B, 4-7 B, 8+ B)
- Verification
  - In memory
  - Hash tables
Benefits
- Cache locality (on filtering)
- No data dependencies
Limitations
- Verification phase is costly
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
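A rough sketch of the filter-then-verify idea follows. It simplifies DFC considerably and is not the authors' code: it uses a single 2-byte filter instead of progressive, length-specific filters, a Python dict stands in for DFC's hash tables, and patterns shorter than 2 bytes are rejected. The names `build_dfc` and `dfc_search` are hypothetical. The bitmap has one bit per possible 2-byte window, i.e. 2^16 bits = 8 KB, which matches the filter size on the slide.

```python
FILTER_BITS = 1 << 16  # one bit per possible 2-byte window -> 8192 bytes (8 KB)

def build_dfc(patterns):
    """Build the 2-byte bitmap filter and the verification table."""
    bitmap = bytearray(FILTER_BITS // 8)
    table = {}  # 2-byte key -> patterns starting with those two bytes
    for pat in patterns:
        if len(pat) < 2:
            raise ValueError("this sketch assumes patterns of at least 2 bytes")
        key = (pat[0] << 8) | pat[1]
        bitmap[key >> 3] |= 1 << (key & 7)   # mark the window as "may match"
        table.setdefault(key, []).append(pat)
    return bitmap, table

def dfc_search(data, dfc):
    """Return (start_offset, pattern) for every match in data."""
    bitmap, table = dfc
    matches = []
    for i in range(len(data) - 1):
        key = (data[i] << 8) | data[i + 1]
        # Filtering: a single bit test against the small, cache-resident bitmap.
        if bitmap[key >> 3] & (1 << (key & 7)):
            # Verification: only reached for the windows that pass the filter.
            for pat in table.get(key, ()):
                if data.startswith(pat, i):
                    matches.append((i, pat))
    return matches
```

Each input position is tested independently of the others, which is why this style of algorithm has no data dependencies between positions. The cost moves to the verification step, which touches larger structures in main memory.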
Algorithms
Representative algorithms from two categories:
- State machine based: Aho-Corasick (CPU), PFAC [1] (GPU)
- Filtering based: DFC (CPU), DFC (GPU)
- HYBRID (GPU)
[1] "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs", Lin et al., TOC 2013
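As a hedged illustration of why PFAC suits a parallel device, the sketch below follows the general idea of failureless Aho-Corasick: the state machine has no failure transitions, and every input position starts its own independent traversal that stops at the first missing transition. On a GPU each starting position would map to one thread; here a plain Python loop stands in for the threads. The names `build_trie`, `match_from` and `pfac_search` are hypothetical and this is not the code evaluated in this work.

```python
def build_trie(patterns):
    """Build a trie (a state machine without failure links) from byte patterns."""
    children = [{}]     # children[state] maps a byte to the next state
    terminal = [None]   # terminal[state] is the pattern ending here, if any
    for pat in patterns:
        s = 0
        for b in pat:
            if b not in children[s]:
                children.append({})
                terminal.append(None)
                children[s][b] = len(children) - 1
            s = children[s][b]
        terminal[s] = pat
    return children, terminal

def match_from(data, start, trie):
    """Work of one 'thread': report all patterns that begin exactly at start."""
    children, terminal = trie
    s, found = 0, []
    for i in range(start, len(data)):
        s = children[s].get(data[i])
        if s is None:          # no transition: this thread terminates
            break
        if terminal[s] is not None:
            found.append((start, terminal[s]))
    return found

def pfac_search(data, trie):
    """Return (start_offset, pattern) for every match in data."""
    matches = []
    for start in range(len(data)):   # on a GPU: one thread per starting position
        matches.extend(match_from(data, start, trie))
    return matches
```

Most traversals end after one or two bytes, so the total work stays small even though every position launches its own search.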
Hardware-oriented optimizations
Relevant aspects that we investigate:
- Memory mapping vs data transfers
  - 2-5x faster with memory mapping
- Placement of the filters
  - Global memory
  - Texture memory
  - Local memory
- Vectorization
  - No significant speedup
More in the paper…
Outline
- Background
- Our Benchmark
- Evaluation
Evaluation Methodology
Hardware
- CPU: 4 ARM big.LITTLE
- GPU: ARM Mali-T628 (6 shader cores)
- Memory: 2 GB RAM
- Sensors: on-board energy sensors
Datasets
- 3 publicly available traffic traces
- 1 randomly generated data set
Malicious patterns
- 2183 patterns (from Snort)
- 5000 patterns (emergingthreats.net)
Evaluation Methodology
- Goals of the evaluation:
  1. How fast we can process the input (execution time)
  2. How much energy we spend on processing (energy consumption)
  3. Effect of datasets and number of patterns
  4. Influence the design of new algorithms
- Versions:
  - CPU: Aho-Corasick, DFC
  - GPU: PFAC, DFC on GPU (with/without vectorization), HYBRID (with/without vectorization)
Evaluation Results
- Experiment 1: execution time breakdown
[Figure: execution time breakdown for the CPU versions and the GPU versions; annotations: CPU->GPU, Post-processing, Vect]
(Post-processing = output which and how many patterns matched, on the CPU)
Evaluation Results
- Experiment 2: energy consumption
Evaluation Results
- Experiment 3: effect of datasets and number of patterns
[Figure: two panels, 2183 patterns and 5000 patterns]
Evaluation Results
- Experiment 4: configuring Hybrid
- Bigger filter =
  - Slower access time (green trend, left y-axis)
  - Higher hit ratio -> less verification (red trend, right y-axis)
Conclusions & Future Work
- Conclusions
  - New hardware features (embedded GPUs) can alleviate the bottleneck of pattern matching
  - Architecture characteristics are important for high performance and low energy consumption
  - It is possible to design new algorithms tailored to the hardware
- Future Work
  - Overlap CPU/GPU execution (heterogeneous design)
  - More algorithms and devices (e.g. Nvidia's Jetson Nano)
  - Integrate with existing systems (e.g. Snort)
- Code available online
Backup Slides
Background (1/3)
- Snort
  - The de facto NIDS
  - Signature based (malicious signatures are known in advance)
  - The main pipeline looks like this:
[Figure: Snort processing pipeline; the stage that includes pattern matching takes more than 70% of running time]
Related work
State machine based: Aho-Corasick
- Benefits
  - Only one lookup per input byte
- Limitations
  - Poor cache locality
  - Data dependencies
Filter based: DFC
- Benefits
  - Cache locality (on filtering)
  - No data dependencies
- Limitations
  - Much of the hardware remains underutilized, e.g. vector instructions?
[Figure: a pattern set (activate, admin.dll, backdoor, get.asp, …) sets bits in an 8 KB filter indexed by two-character strings (…, ab, ac, ad, …, ba, bb, …, ge, …)]
"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Comm. ACM '75
"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16