FloCon 2013 | January 7 – 10 | Albuquerque, New Mexico

John Munro / jmunro@endgame.com
Jason Trost / jtrost@endgame.com
FloCon 2013 | January 7 – 10 | Albuquerque, New Mexico

Introductions
• John Munro (jmunro@endgame.com)
  – Network Security Researcher and Data Scientist
• Jason Trost (jtrost@endgame.com)
  – Senior Software Engineer
  – Specializes in Hadoop/Storm/BigData

Agenda
• The Problem
• Our Approach
• DGA Domain Classifier
• String Statistics as Features
• Malicious Domain Classifier
• Demo
• Real-time Streaming Platform

The Problem
txmxbo.info
youtube.com
yahoo.com
Ct0u2xj5dbe4.wvw — game465.com
p4.httzd5e2ufizo.3bawhfuec45dca65.401724.s1.v4.ipv6-exp.l.google.com
abulqe.com
za6.limfoklubs.com
ns3.ohio.gov
bibz01.apple.com
docs.joomla.org
Wmk41035u3751s0bgv4n91b0b7h74v.ipcheker.com

The Problem
• Massive Volumes
  – Some of our partners deal with TBs per day of DNS PCAPs
• Incredible Rates
  – One partner sees 13k requests/sec
  – Another closer to 100k/sec
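To put the quoted rates in perspective, the arithmetic below scales them to a full day. It is a back-of-the-envelope sketch that assumes the per-second figures are sustained around the clock, which the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def requests_per_day(requests_per_second: int) -> int:
    """Scale a per-second request rate to a day, assuming the rate is sustained."""
    return requests_per_second * SECONDS_PER_DAY


# The two rates quoted on the slide.
for label, rate in [("13k requests/sec", 13_000), ("100k requests/sec", 100_000)]:
    print(f"{label}: {requests_per_day(rate):,} requests/day")
```

Under that assumption the two partners would see roughly 1.1 billion and 8.6 billion DNS requests per day.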

Our Approach: Machine Learning!
• Real-time streaming classification
  – In parallel across multiple servers
• Markov Models
  – Random Domain Generation Traffic
  – Normal Benign Traffic
• Random Forests
  – Benign vs Malicious
• Periodically retrained
  – In order to maintain accuracy

Data Sources
• Benign Domains
  – Millions of popular, real domains
    • Correlated with the Alexa top 10k domains
• Malicious Domains
  – 800k domains gathered from an internal malware sandbox
  – Public blacklist domains from Conficker and Murofet Botnets

Markov Models

Markovian DGA Classifier
• Domain Generation Algorithm (DGA)
• Popular Domain Model
  – Trained: 258,039 domains from Day 1 of our Benign set
  – Tested: 331,359 domains from Day 2 of our Benign set
  – Accuracy: 99.40% with 1,458 Unknown
• Randomly Generated Domain Model
  – Trained: 90,884 domains from Conficker Botnet
  – Tested: 295,306 domains from Murofet Botnet
  – Accuracy: 99.34% with 1,923 Unknown
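The slides do not give the model order, the smoothing, or the rule that produces an "Unknown" result. The following is a minimal Python sketch of one way a two-model Markov classifier could work, under these assumptions: first-order (bigram) character models, add-one smoothing, and an "unknown" verdict when the two per-character log-likelihoods differ by less than a hypothetical margin. The class name, function name, margin value, and toy training strings are all illustrative and not taken from the talk.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"  # assumed label alphabet
START, END = "^", "$"  # boundary markers so first/last characters are modeled too


class CharMarkovModel:
    """First-order character Markov model with add-one smoothing (assumption)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, labels):
        """Count character-to-character transitions over the training labels."""
        for label in labels:
            chars = START + label.lower() + END
            for prev, cur in zip(chars, chars[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def avg_log_prob(self, label):
        """Mean log-probability per transition, so label length does not dominate."""
        chars = START + label.lower() + END
        vocab = len(ALPHABET) + 1  # any alphabet symbol or END may follow
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            seen = self.counts.get(prev, {}).get(cur, 0)
            total += math.log((seen + 1) / (self.totals.get(prev, 0) + vocab))
        return total / (len(chars) - 1)


def classify(label, benign_model, dga_model, margin=0.3):
    """Pick the better-fitting model; 'unknown' if neither wins by the margin."""
    diff = benign_model.avg_log_prob(label) - dga_model.avg_log_prob(label)
    if diff > margin:
        return "benign"
    if diff < -margin:
        return "dga"
    return "unknown"


# Toy training data, invented for illustration only.
benign = CharMarkovModel()
benign.train(["google", "yahoo", "youtube"])
dga = CharMarkovModel()
dga.train(["xqzvkj", "qwzxkv", "zkqxvw"])

for name in ["google", "xqzvkj", "mmmm"]:
    print(name, "->", classify(name, benign, dga))
```

With real training sets of the size quoted above, the margin would need to be tuned on held-out data; the value here only suits the toy example.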

String Statistics as Features
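The transcript preserves only this slide's title, so the actual feature set is not known. As a hedged illustration of what string statistics over a domain name can look like, the Python sketch below computes a few common ones (length, Shannon entropy, digit ratio, vowel ratio, unique-character ratio). The choice of features, the function names, and the decision to score only the leftmost label are assumptions made for this example.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def string_features(domain):
    """Illustrative string statistics for the leftmost label of a domain."""
    label = domain.lower().split(".")[0]  # simplification: ignore other labels
    n = len(label)
    if n == 0:
        raise ValueError("domain has an empty leftmost label")
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "unique_char_ratio": len(set(label)) / n,
    }


# Two domains taken from the "The Problem" slide.
for domain in ["youtube.com", "txmxbo.info"]:
    print(domain, string_features(domain))
```

Each dictionary could serve as one feature vector for a downstream classifier such as a random forest.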

Feature Usefulness

Random Forests Algorithm

Random Forests
• Pros:
  – Very high accuracy
  – Scalable across many nodes
  – Built-in protection from overfitting
  – Can handle very large data sets with many features
  – Robust with respect to goodness of features
  – Practical for real world use
  – Does not assume a distribution
  – Only two parameters to tune
  – Memory efficient
• Cons:
  – Not the quickest classifier, but plenty fast in practice

Malicious Domain Classifier
• Performance measured by 10-fold Cross Validation
• Training Set
  – 200k Benign
  – 200k Malicious
[Figure: Cross Validation. Out of Bag Error (0 to 0.06) vs. Number of Trees (10 to 100), for K = 3, K = 5, K = 10]
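The 10-fold cross-validation procedure mentioned on this slide can be sketched in plain Python as follows. This is a generic illustration rather than the authors' evaluation code: the function names, the shuffling seed, and the majority-class stand-in classifier are assumptions, and the split is not stratified by class.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal disjoint folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(samples, labels, fit, predict, k=10):
    """Mean held-out accuracy over k folds for any fit/predict pair."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Stand-in classifier (hypothetical): always predicts the majority training label.
def fit_majority(samples, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Toy data, invented for illustration.
domains = [f"site{i}.example" for i in range(40)]
labels = ["benign"] * 25 + ["malicious"] * 15
print("mean accuracy:", cross_validate(domains, labels, fit_majority, predict_majority))
```

Any classifier exposing a fit and a predict function, such as a random forest, could be substituted for the majority baseline.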

Results
[Figure, four panels: Bad Precision, Good Precision, Bad Accuracy, and Good Accuracy vs. Number of Trees (10 to 100), for K = 3, K = 5, K = 10. Plotted values range from roughly 0.9745 to 0.984.]
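For reference, per-class precision and overall accuracy can be computed as in the short Python sketch below. The slides do not define how their per-class "Bad" and "Good" accuracy figures were derived, so only per-class precision and overall accuracy are shown; the function names and the toy labels are assumptions for illustration.

```python
def precision(y_true, y_pred, positive):
    """Fraction of predictions of `positive` that were actually `positive`."""
    hits = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not hits:
        return float("nan")  # no predictions of this class
    return sum(t == positive for t in hits) / len(hits)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration.
y_true = ["bad", "bad", "bad", "good", "good", "good", "good", "good"]
y_pred = ["bad", "bad", "good", "good", "good", "good", "good", "bad"]

print("bad precision: ", precision(y_true, y_pred, "bad"))
print("good precision:", precision(y_true, y_pred, "good"))
print("accuracy:      ", accuracy(y_true, y_pred))
```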

Results
[Figure, two panels: Model Size (MB, 0 to 400) vs. Number of Trees (10 to 100), and Classification Throughput (Classifications/sec, 0 to 60,000) vs. Number of Trees (10 to 100), each for K = 3, K = 5, K = 10]

Results

Realtime Streaming Platform
• Velocity is a platform for processing, analyzing, and visualizing large-scale event data in realtime
• It was designed to be horizontally scalable and is built using Twitter’s Storm
• It was built primarily for internal use with DNS events, IDS alerts, and netflow data, but it is in the process of being commercialized

Velocity Pipeline

Conclusion
• Malicious domain classification
• DGA domain identification using Markov Models
• Summary statistics based on the domain string work well
• Random Forests are very successful at classifying domains as Benign or Malicious
• Real-time, distributed implementation

Future Work
• Include more features: TTL, frequency seen, etc.
• Correlation of bad domains based on ASN, Country, Organization, etc.
• Identify subnets that are infected based on high traffic to bad domains
• Identify Content Delivery Networks
• Self Organizing Maps and other visualizations

Questions

Contact Information
• John Munro
  – Email: jmunro@endgame.com
• Jason Trost
  – Email: jtrost@endgame.com
  – Twitter: @jason_trost
  – Blog: www.covert.io
