Network traffic classification: From theory to practice
Pere Barlet-Ros
Associate Professor at UPC BarcelonaTech
Co-founder and Chairman at Polygraph.io
Joint work with: Valentín Carela-Español, Tomasz Bujlow and Josep Solé-Pareta

Background
• What do we refer to as traffic classification?
  – Identifying the application that generated each flow
• What is traffic classification used for?
  – Network planning and dimensioning
  – Per-application performance evaluation
  – Traffic steering / QoS / SLA validation
  – Charging and billing

State of the Art: Ports
• Port-based
  – Computationally lightweight
  – Payloads not needed
  – Easy to understand and program
  – Low accuracy and completeness
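
As a rough illustration of why the port-based approach is lightweight but incomplete, the sketch below labels a flow from its ports alone. The port table is a small illustrative subset of the IANA well-known ports, and the name `classify_by_port` is an assumption made for this example, not part of the presented work.

```python
# Illustrative subset of IANA well-known ports (assumption: a real table is much larger).
WELL_KNOWN_PORTS = {
    ("tcp", 22): "ssh",
    ("tcp", 25): "smtp",
    ("udp", 53): "dns",
    ("tcp", 53): "dns",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
}


def classify_by_port(protocol, src_port, dst_port):
    """Return an application label from the ports alone, or 'unknown'."""
    # Check the destination port first: the server side usually holds the known port.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((protocol, port))
        if label is not None:
            return label
    return "unknown"
```

A flow to TCP port 443 is labelled `https`, whereas an application using dynamic ports stays `unknown` (low completeness) and anything tunnelled over port 80 is labelled `http` regardless of what it really is (low accuracy).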

State of the Art: DPI
• Deep packet inspection (DPI)
  – High accuracy and completeness
  – Computationally expensive
  – Needs payload access
  – Privacy concerns
  – Cannot work with encrypted traffic
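
To make the DPI trade-offs concrete, here is a minimal sketch of signature matching on the first payload bytes of a flow. The four signatures and the name `classify_by_payload` are assumptions chosen for illustration; production DPI engines rely on far larger rule sets and stateful parsing.

```python
import re

# Illustrative payload signatures (assumption: not taken from any specific DPI tool).
SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n")),
    ("ssh", re.compile(rb"^SSH-\d\.\d+-")),
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
    # TLS record header: content type 0x16 (handshake), then major version 0x03.
    ("tls", re.compile(rb"^\x16\x03[\x00-\x04]")),
]


def classify_by_payload(payload):
    """Return the label of the first signature matching the payload, or 'unknown'."""
    for label, pattern in SIGNATURES:
        if pattern.match(payload):
            return label
    return "unknown"
```

Every flow needs payload access and a pass over all patterns, which is where the cost and privacy concerns come from. For encrypted traffic the sketch can at best report `tls`; it cannot tell which application is inside the tunnel.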

State of the Art: ML
• Machine Learning
  – High accuracy and completeness
  – Computationally viable
  – Payloads not needed
  – Can work with encrypted traffic
  – Needs retraining

Main limitations of ML-TC
• Introduction in real products and operational environments is limited and slow
  – Current proposals suffer from practical problems
  – Actual products rely on simpler methods or DPI
• We identified 3 main real-world problems
  1) The deployment problem
  2) The maintenance problem
  3) The validation problem

1) Deployment problem
• Current solutions are difficult to deploy
  – Need dedicated hardware appliances / probes
  – Need packet-level access (e.g. compute features, …)
• How to address this problem?
  – Work with flow-level data (e.g. NetFlow)
  – Support packet sampling (e.g. Sampled NetFlow)

NetFlow w/o sampling
• Challenge: NetFlow v5 features are very limited
  – IPs, ports, protocol, TCP flags, duration, #pkts, …
• State-of-the-art ML technique: C4.5 decision tree
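
C4.5 chooses each split by gain ratio, that is, information gain normalised by the entropy of the split itself. The sketch below computes that criterion for a threshold split on a single numeric flow feature. The toy flows and their labels are made up for illustration and are not taken from the UPC dataset; this is not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`, the criterion C4.5 uses."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = (len(left) / total, len(right) / total)
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


# Hypothetical toy flows: (dst_port, packets) with made-up labels.
flows = [(80, 12), (80, 30), (53, 1), (53, 2), (80, 25), (53, 1)]
labels = ["web", "web", "dns", "dns", "web", "dns"]
ports = [port for port, _ in flows]
packets = [pkts for _, pkts in flows]
```

In this toy data, splitting on `ports <= 53` or on `packets <= 2` separates the two classes perfectly, while `packets <= 12` leaves one web flow on the wrong side and scores lower. A full tree learner repeats this search over every feature and threshold at each node.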

Results (NetFlow w/o sampling)
• UPC dataset (publicly available)
  – 7 x 15 min traces from UPC access link
  – Collected on different days and at different hours
  – Labelled with L7-filter (strict version with lower FPR)

Results (Sampled NetFlow) • Impact of packet sampling

Sources of inaccuracy
1) Error in the estimation of the traffic features
2) Changes in flow size distribution
3) Changes in flow splitting probability
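
The first two sources can be illustrated with a small worked example. It assumes each packet is sampled independently with probability p, which is a simplification (routers often sample deterministically, 1 in N), and the function names are hypothetical.

```python
def detection_probability(n_packets, p):
    """Probability that at least one packet of an n-packet flow is sampled."""
    return 1.0 - (1.0 - p) ** n_packets


def estimate_packets(sampled_packets, p):
    """Invert the sampling rate to estimate the original packet count."""
    return sampled_packets / p


p = 0.01  # hypothetical 1/100 packet sampling
for n in (1, 10, 100, 1000):
    print(f"{n:>5} packets -> detected with probability {detection_probability(n, p):.3f}")
```

Short flows are mostly missed, so the observed flow size distribution shifts towards large flows. A flow seen through a single sampled packet is estimated at 1/p packets whatever its true size, which is the feature estimation error. Flow splitting is not modelled in this sketch.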

Solution (Sampled NetFlow)

Deployment problem: Summary
• Current proposals are difficult to deploy
• Proposed a simple but effective technique
  – Supports standard NetFlow data
  – Supports packet sampling
• Main limitation: Needs to be frequently retrained
V. Carela-Español, P. Barlet-Ros, A. Cabellos-Aparicio, J. Solé-Pareta. Analysis of the impact of sampling on NetFlow traffic classification. Computer Networks, 55(5), 2011.

2) Maintenance problem
• Difficult to keep classification model updated
  – Traffic changes, application updates, new applications
  – Involves significant human intervention
  – ML models need to be frequently retrained
• Possible solution to the problem
  – Make retraining automatic
  – Computationally viable
  – Without human intervention

Autonomic Traffic Classification
• Lightweight DPI for retraining
  – Small traffic sample (e.g. 1/10000 flow sampling)
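
One possible way to pick such a small flow sample is to hash the flow 5-tuple, so that every packet of a selected flow reaches the DPI labeller. The slides do not say how flows are selected; the sketch below is an assumption for illustration, and `flow_selected` is a hypothetical name.

```python
import zlib


def flow_selected(src_ip, dst_ip, src_port, dst_port, protocol, rate=10000):
    """Deterministically select about 1 in `rate` flows from the 5-tuple hash."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    # The same 5-tuple always hashes to the same value, so the decision is per flow.
    return zlib.crc32(key) % rate == 0
```

Only the selected flows need payload inspection, which keeps the DPI cost low. Their labels can then be paired with the NetFlow features of the same flows to retrain the classifier.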

Evaluation
• 14-day trace collected at CESCA

Temporal/Spatial obsolescence • Comparison without autonomic retraining

Maintenance problem: Summary
• Existing classifiers need periodic retraining
  – Temporal obsolescence: Changes in application traffic
  – Spatial obsolescence: Different networks
• Autonomic traffic classification system
  – Easy to deploy: Works with Sampled NetFlow
  – Easy to maintain: Lightweight DPI for self-training
V. Carela-Español, P. Barlet-Ros, O. Mula-Valls, J. Solé-Pareta. An autonomic traffic classification system for network operation and management. Journal of Network and Systems Management, 23(3):401-419, 2015.

3) Validation problem
• Current proposals are difficult to validate, compare and reproduce
  – Private datasets
  – Different ground-truth generators
• Our contribution
  – Publication of labeled datasets (with payloads)
  – Common benchmark to validate/compare/reproduce
  – Validation of common ground-truth generators

Proposal
• Reliable labeled dataset with full payloads
  – Accurate: VBS (label from the application socket)
  – Avoid privacy issues: Realistic artificial traffic

Methodology
• Manually generate representative traffic
  – Create fake accounts (e.g. Gmail, Facebook, Twitter)
  – Interact with the service simulating human behavior (e.g. posting, chatting, gaming, watching videos, …)

Dataset • > 750K flows, ~55 GB of data

DPI tools compared

Application protocols

Applications

Web services (summary)
• PACE: 16/34 (6 over 80%)
• nDPI: 10/34 (6 over 80%)
• OpenDPI: 2/34
• Libprotoident: 0/34
• L7-filter: 0/44 (high FPR)
• NBAR: 0/34

Validation problem: Summary
• Comparison of most popular ground-truth generators
  – PACE: Best results at all classification levels
  – Libprotoident: Very good results at application/protocol
  – nDPI: Good results, web services level, open source
  – NBAR and L7-filter: Very poor results
• Dataset including payloads is publicly available
  – http://www.cba.upc.edu/monitoring/traffic-classification (Including also all other datasets presented in these slides)
  – Common benchmark to validate, compare and reproduce
T. Bujlow, V. Carela-Español, P. Barlet-Ros. Independent comparison of popular DPI tools for traffic classification. Computer Networks, 76:75-89, 2015.
V. Carela-Español, T. Bujlow, P. Barlet-Ros. Is our ground-truth for traffic classification reliable? In Proc. of Passive and Active Measurement Conf. (PAM), 2014.

Network Polygraph
• Addressed 3 practical problems
  – The deployment problem (Sampled NetFlow)
  – The maintenance problem (Autonomic retraining)
  – The validation problem (Labeled payload traces)
• We identified interest in the market
  – We created a UPC spin-off: https://polygraph.io
  – Several customers worldwide
P. Barlet-Ros, J. Sanjuàs, V. Carela-Español. Network Polygraph: A cloud-based network visibility service. In ACM SIGCOMM Conf., Industrial Demo, 2015.

Why Network Polygraph?
• Other products are expensive and difficult to deploy
  – Can only be afforded by large operators, ISPs, …
  – A large portion of the market consists of SMEs (>90% in EU)
• Our technology based on Sampled NetFlow only needs a small volume of traffic data
  – <0.5% of extra bandwidth usage
  – Can be provided as a service from the cloud (SaaS)

Visibility-to-cost ratio (figure; axes: visibility, cost)

Website + On-Line Demo https://polygraph.io

traffic volume, breakdown by application

HTTP services

top talkers (addresses, ports, autonomous systems)

subnetwork-level bandwidth hogs

traffic geolocation (origins & destinations)

anomaly and attack detection with automatic baselining

indexed traffic database for forensic analysis

Network Polygraph Talaia Networks, S.L. K2M – Parc UPC Campus Nord Jordi Girona, 1-3 Barcelona (08034) Spain Telephone: +34 93 405 45 87 contact@polygraph.io https://polygraph.io
