ACCELERATING CYBER THREAT DETECTION WITH GPU


  1. ACCELERATING CYBER THREAT DETECTION WITH GPU Joshua Patterson | Director of Applied Solutions Engineering | GTC Israel 2017 @datametrician

  2. RULES & PEOPLE DON’T SCALE Current methods are too slow. Right now, financial services reports that it takes an average of 98 days to detect an Advanced Threat, but retailers say it can take about seven months. Once the security community moves beyond the mantras “encrypt everything” and “secure the perimeter,” it can begin developing intelligent prioritization and response plans for various kinds of breaches, with a strong focus on integrity. The challenge lies in efficiently scaling these technologies for practical deployment and making them reliable for large networks. This is where the security community should focus its efforts. http://www.wired.com/2015/12/the-cia-secret-to-cybersecurity-that-no-one-seems-to-get/

  3. ATTACKS ARE MORE SOPHISTICATED How Hackers Hijacked a Bank’s Entire Online Operation https://www.wired.com/2017/04/hackers-hijacked-banks-entire-online-operation/

  4. FIRST PRINCIPLES OF CYBER SECURITY Where the industry must go 1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient. 2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow more intelligent, advanced threat hunting and exploration at scale across machine data. 3. Visualization will be a key part of daily operations, allowing analysts to label and train deep learning models faster and to validate machine learning predictions.

  5. FIRST PRINCIPLES OF CYBER SECURITY Where the industry must go 1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient.

  6. (image-only slide; no extractable text)

  7. DATA PLATFORM-AS-A-SERVICE
     HIGH AVAILABILITY • Offers HA with no data loss • Always-on architecture • Automatic data replication
     SCALE • Handles 1M events/second • Auto-scales the cluster
     SELF SERVICE • Log-to-analytics • Kibana, JDBC access • Accessing data using BI tools • Dashboard access using NVIDIA LDAP
     SECURITY • Data platform security has been implemented with VPCs in AWS

  8. ARCHITECTURE V1

  9. DATA PLATFORM STATS

  10. ANOMALY DETECTION

  11. ANOMALY DETECTION USING DEEP LEARNING Components: NGC/NGN GPU Cluster, GPU Cloud, Data Platform, and the AD (Anomaly Detection) AI Framework (Keras + TensorFlow). Top Features: • Automated Alerts & Dashboards • Early Detection • Self Service • Better accuracy & less noise

  12. ANOMALY DETECTION FRAMEWORK Processing flow, from raw data to alerts:
      • Raw Dataset (Time, X1, X2)
      • Feature Learning (algorithms: Recurrent Neural Network (RNN), Autoencoders (AE)) produces learned features X’ and X’’ (Time, X1, X2, X’, X’’)
      • Anomaly Detection (Supervised Learning: Logistic Regression; Unsupervised Learning: Multivariate Gaussian) produces the label Y, 1 for anomaly and 0 for normal (Time, X1, X2, X’, X’’, Y)
      • Post-processing: Univariate Analysis and Anomaly Description
      • Output: email alerts and dashboards, with feedback from the user fed back into the models
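
The deck shows this flow only as a diagram. As a minimal sketch of its middle stages, assuming a dense autoencoder, synthetic data shapes, and a simple percentile threshold (none of which are specified in the talk), the feature-learning and detection steps could look like this in Keras + TensorFlow, the framework named on the previous slide:

```python
import numpy as np
from tensorflow import keras

n_features = 2          # the slide's X1, X2
window = 32             # hypothetical number of time steps per training sample
dim = window * n_features

# Stand-in for windowed "normal" telemetry; replace with real (Time, X1, X2) data.
normal = np.random.normal(size=(10000, dim)).astype("float32")

# Autoencoder: the bottleneck plays the role of the learned features X', X''.
autoencoder = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(dim,)),
    keras.layers.Dense(8, activation="relu"),     # learned features
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(dim),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=10, batch_size=256, verbose=0)

def score(batch):
    """Per-sample reconstruction error; high error suggests an anomaly (Y = 1)."""
    recon = autoencoder.predict(batch, verbose=0)
    return np.mean((batch - recon) ** 2, axis=1)

# Crude unsupervised threshold; the slide instead fits a multivariate Gaussian
# (unsupervised) or logistic regression (supervised) on the learned features.
threshold = np.percentile(score(normal), 99.5)
```

The same reconstruction-error scores could feed the slide's post-processing step (univariate analysis and anomaly description) before alerting.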

  13. ANOMALY DETECTION BENEFITS WITH DEEP LEARNING Top Features: • Automated Alerts & Dashboards • Early Detection • Self Service • Better accuracy & less noise

  14. ANOMALY DETECTION TRAINING Evolution:
      • V0: Manual Feature Creation (Theano)
      • V1: Automatic Feature Creation using DL (Keras + TensorFlow)
      • V2: Multi-GPU support + TensorFlow Serving
      • Learnings (CPU vs GPU): manual feature extraction does not scale; dataset preparation is the long pole; training on CPU takes longer than the data collection rate
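
The slide names multi-GPU support but shows no code. One hedged illustration of data-parallel multi-GPU training is tf.distribute.MirroredStrategy, shown below; this API postdates the 2017 talk, so it is a stand-in for whatever mechanism the team actually used:

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()      # replicates the model across local GPUs
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():                           # variables built here are mirrored
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(64,)),
        tf.keras.layers.Dense(64),
    ])
    model.compile(optimizer="adam", loss="mse")

x = np.random.normal(size=(100_000, 64)).astype("float32")
model.fit(x, x, epochs=3, batch_size=4096)       # each batch is split across replicas
```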

  15. INFERENCING V1 Use Case: detecting anomalies in user activity
      • Inferencing flow from 10k feet: Live Streaming → ETL (streaming aggregations) → AD Platform (data for inferencing)
      • Started with Python scripts for windowed aggregation
      • Python script performance (chart): 10 mins: 73; 30 mins: 103; 60 mins: 154
      • Learnings: hard to scale for near real time; the AD platform runs inferencing every 3 mins because we are limited by the speed of data processing
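
The V1 "Python scripts for windowed aggregation" are not shown in the deck; a rough pandas sketch of the idea follows. The 10/30/60-minute windows come from the slide's chart, while the input file, columns, and aggregates are invented for illustration:

```python
import pandas as pd

# Hypothetical input: one JSON event per line with time/user/event_id/bytes fields.
events = pd.read_json("events.json", lines=True)
events["time"] = pd.to_datetime(events["time"])
events = events.set_index("time").sort_index()

# The 10/30/60-minute windows come from the slide's performance chart.
for window in ("10min", "30min", "60min"):
    agg = (events.groupby("user")
                 .resample(window)
                 .agg({"event_id": "count", "bytes": "sum"}))
    print(window, len(agg))   # hand `agg` to the AD platform for inferencing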

  16. INFERENCING V2 Improved Performance
      • V2: to improve performance, we started using Presto with data on S3 in JSON format
      • Live data is streamed from Kafka to S3; we use Presto for our data warehousing needs
      • Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data
      • Performance (chart): Presto on JSON: 20 (10 mins), 25 (30 mins), 30 (60 mins); Presto on Parquet: 4 (10 mins), 6 (30 mins), 8 (60 mins)
      • Learnings: Presto with Parquet has the best performance, but we need to batch data at 30-second intervals, so it’s not completely real time
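
As a hedged sketch of querying the S3-resident data through Presto from Python, here is PyHive's Presto client; the coordinator host, catalog, schema, table, and column names are all placeholders, not the team's actual setup:

```python
from pyhive import presto   # PyHive's Presto DB-API client

# Hypothetical coordinator, catalog, schema, and table names.
conn = presto.connect(host="presto-coordinator", port=8080,
                      catalog="hive", schema="security")
cur = conn.cursor()
cur.execute("""
    SELECT user_id, count(*) AS events
    FROM netflow_parquet                         -- Parquet files batched to S3
    WHERE ts >= now() - interval '30' second     -- the 30-second micro-batches
    GROUP BY user_id
""")
rows = cur.fetchall()
```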

  17. FIRST PRINCIPLES OF CYBER SECURITY Where the industry must go 1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient. 2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow more intelligent, advanced threat hunting and exploration at scale across machine data.

  18. GPU ACCELERATION Accelerate the Pipeline, Not Just Deep Learning
      • GPUs for deep learning = proven
      • Where else and how else can we use GPU acceleration?
      • Accelerating the data pipeline: Data Ingestion → Data Processing / Stream Processing → Model Training → Inferencing → Visualization & Dashboards
      • Building better models faster
      • First: GPU databases

  19. MOVING TO BIG DATA IS A START Spark outperforms traditional SIEM
      • SIEM vs Big Data solution: 10-node cluster, ~$60k in hardware
      • Data: production SIEM of a Fortune 500 enterprise, 450+ columns, ~250 million events per day
      • Spark vs SIEM benchmarks from Accenture Labs (Strata NY, BSides LV)

  20. MOVING TO BIG DATA IS A START Spark outperforms traditional SIEM

      Scenario                                           Time Period   SIEM           Big Data   Speed Up
      1. Show all network communication from one host    1 Day         3h 20m 13s     1m 44s     114x faster
         (IP) to multiple hosts (IPs)                    1 Week        Not Feasible*  4m 05s
      2. Retrieve failed logon attempts in               1 Day         18m 26s        1m 37s     10x faster
         Active Directory                                1 Week        2h 13m 45s     3m 10s     41x faster
      3. Search for malware (exe) in Symantec logs       1 Day         3h 24m 36s     1m 37s     125x faster
                                                         1 Week        Not Feasible*  3m 22s
      4. View all proxy logs for a specific domain       1 Day         4h 30m 13s     2m 54s     92x faster
                                                         1 Week        Not Feasible*  1m 09s**

      Spark vs SIEM benchmarks from Accenture Labs (Strata NY, BSides LV)
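
Accenture's actual queries are not reproduced in the deck; the PySpark sketch below approximates scenario 2 from the table (failed Active Directory logon attempts over one day). The path and schema are hypothetical; 4625 is the standard Windows event ID for a failed logon:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("siem-style-queries").getOrCreate()

# Hypothetical lake path and column names.
logs = spark.read.parquet("s3://security-lake/windows_events/")
failed = (logs
          .where(F.col("event_id") == 4625)                                     # failed logons
          .where(F.col("event_time") >= F.date_sub(F.current_timestamp(), 1))  # last day
          .groupBy("account_name", "workstation")
          .count()
          .orderBy(F.desc("count")))
failed.show(20)
```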

  21. GPU DATABASES ARE EVEN FASTER 1.1 Billion Taxi Ride Benchmarks (bar chart: time in milliseconds for Queries 1-4 on MapD DGX-1, MapD 4 x P100, Redshift 6-node, and Spark 11-node; the MapD GPU bars sit in the 21-696 ms range, while the CPU-cluster bars run from roughly 1.3 s up to an off-scale 85.9 s) Source: MapD benchmarks on DGX from internal NVIDIA testing, following the guidelines of Mark Litwintschik’s (@marklit82) blogs: Redshift, 6-node ds2.8xlarge cluster; Spark 2.1, 11 x m3.xlarge cluster with HDFS
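
For a sense of how such a GPU database is driven from Python, here is a hedged sketch using pymapd, MapD's Python client (the project has since been renamed OmniSci and then HEAVY.AI). Connection details and the trips table are placeholders; the query mirrors Query 1 of the taxi benchmark, which counts rides per cab type:

```python
import pymapd   # Python client for MapD (later OmniSci / HEAVY.AI)

# Placeholder connection details and table name.
con = pymapd.connect(user="mapd", password="...", host="localhost", dbname="mapd")

# Query 1 of the 1.1-billion-row taxi benchmark: rides per cab type.
cur = con.execute("SELECT cab_type, count(*) FROM trips GROUP BY cab_type")
rows = cur.fetchall()
```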

  22. MAPD MapD Core + MapD Immerse
      LLVM: LLVM creates one custom function that runs at speeds approaching hand-written functions. LLVM enables generic targeting of different architectures and running simultaneously on CPU/GPU.
      Backend Rendering: Data goes from the compute (CUDA) to the graphics (OpenGL) pipeline without a copy and comes back as a compressed PNG (~100 KB) rather than raw data (> 1 GB).
      Streaming: Speed eliminates the need to pre-index or aggregate data. Compute resides on GPUs, freeing CPUs to parse and ingest. Finally, the newest data can be combined with billions of rows of “near historical” data.

  23. MAPD ARCHITECTURE Open Source + Commercial
      LLVM: MapD Core SQL queries are compiled with a just-in-time (JIT) LLVM-based compiler and run as NVIDIA GPU machine code.
      Distributed Scale-out: MapD Core has native distributed scale-out capabilities. MapD Core users can query and visualize larger datasets with much smaller cluster sizes than traditional solutions.
      High Availability: MapD Core has high availability functionality that provides durability and redundancy. Ingest and queries are load balanced across servers for additional throughput.
      Visualization Libraries: JavaScript libraries that allow users to build custom web-based visualization apps powered by a MapD Core database, based on DC.js.
