The Need for Complex Analytics from Forwarding Pipelines
Tom Tofigh, AT&T
Nic Viljoen, Netronome
Bryan Sullivan, AT&T
Agenda
• Problem Statement
• Proposed SDN Based Observability
• Gaps in Real Time Observability
• Importance of Real-time Programmable Analytics
• Data Plane Programmability for Complex Analytics
• Programmable NIC Cards
• Summary
Problem Statement
• Require real-time observability at both the data plane and control plane levels
• Require programmable, granular systems that avoid the unscalable approach of metering all the data all the time
Looking for the Call Drop Reason!
Gaps: Dynamic & Real-Time Programmable Analytics
• Achieve autonomous control through programmable data plane analytics
• Real-time dynamic instrumentation: virtual probes that gather trend data
• Target specific flows, SoC/SmartNICs, VMs or containers for observation
• Enable instant root cause analysis
• Provide scalable solutions for fine-grained observation
Autonomous Control System Concept
[Figure: control loop; recoverable stage labels: Measure, Analyze]
Proposed Evolution for Dynamic Probing
Dynamic Probe & Measurement Examples
Complex analytics, by category:
QoE
• Flow jitter, latency measurement
• Packet drop rate
• Application analysis
Security
• DDoS detection
• Deep packet inspection
• Stateful flow monitor
Customer Care
• Custom statistics
• Flow tracing
• Root cause analysis
Optimization
• Load estimation
• Traffic matrix calculation
• Elephant flow identification
[Figure: dynamic P4 query cycle; recoverable labels: Models, present, compile, configure, collect, analyze, disseminate]
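One of the optimization probes listed on this slide, elephant flow identification, can be sketched as a share-of-traffic threshold over per-flow byte counters. The flow key layout, the 10% threshold, and the function name `elephant_flows` are illustrative assumptions, not part of the platform described in these slides.

```python
from collections import Counter


def elephant_flows(packets, share_threshold=0.10):
    """Return flows carrying at least share_threshold of all observed bytes.

    packets: iterable of (flow_key, packet_length_in_bytes) tuples.
    The threshold is a hypothetical tuning knob.
    """
    bytes_per_flow = Counter()
    for flow_key, length in packets:
        bytes_per_flow[flow_key] += length
    total = sum(bytes_per_flow.values())
    if total == 0:
        return {}
    return {
        flow: count
        for flow, count in bytes_per_flow.items()
        if count / total >= share_threshold
    }


# Hypothetical sample: one bulk transfer among fifty small flows.
bulk = ("10.0.0.1", "10.0.0.2", 443)
sample = [(bulk, 1500)] * 100
sample += [(("10.0.1.%d" % i, "10.0.0.9", 80), 64) for i in range(50)]
heavy = elephant_flows(sample)
```

A real probe would likely bound memory with a sketch or sampling structure rather than an exact counter per flow; the exact counter is used here only to keep the idea visible.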
ACORD Observability @ L0 – L7
[Figure: ACORD architecture; recoverable labels, top to bottom: Apps (Security, Diagnosis, Customer Care); Analytics Platform; ONOS + XOS (XOS + Services); Measurement Abstraction Interface; Leaf-Spine Fabric (2.8Tbps) with Spine and Leaf switches; Routers; ROADM (Core); GPON/PON OLT MACs (Access); OVS, VMs and SmartNIC on the compute nodes]
The SmartNIC
Nic Viljoen, Netronome Systems
The Programmable SmartNIC
Challenges with Fixed-Function NICs
• Networking applications have diverse requirements
• Fixed-function ASICs have “baked-in” functionality and lack flexibility
Programmable NIC Advantages
• Develop custom networking applications
• High performance at network
• Preserve CPU cycles
  • CPU OVS @ 40Gbps: 12 cores
  • Offload OVS @ 40Gbps: 1 core
• Dynamic analytics
• High-level languages: P4/C
Programmable NIC Architecture
• “Sea of Workers” for customized networking workloads
• Support for P4 and Match/Action structures
• Optimized memory architecture
Examples of SmartNICs: Netronome’s Agilio, Cavium LiquidIO
Augmenting Netronome’s Agilio OVS Software for Virtual Probing
• Interpret flow stats and features
• Aggregate info to controllers (more on the vProbe application on the next slide)
• Keep state for more than a million flows
• Programmable state based on vProbe application requirements
• 25G/40G line rate
• Programmable tradeoff between payload size and number of flows
• Self-learning
[Figure: compute node with an Agilio-CX adapter; recoverable labels: Controller; Action Arguments; Flow Stats and Features; vProbe Application; VMs; OVS Userspace Processes (ovsdb-server, ovs-vswitchd); Linux Kernel OVS Datapath (Exact Match Flow Cache, Kernel Flow Table, Actions); Fallback Path; Offload; Agilio-CX OVS Datapath (Packet Rx/Tx, Exact Match Flow Cache, Match Tables, Actions, Tunnels, Update Statistics, Deliver to Host); First Packet of Flow; Remaining Packets of Flow]
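The split between "First Packet of Flow" and "Remaining Packets of Flow" in the diagram suggests a self-learning exact-match cache in front of a slower fallback lookup. The following is a minimal host-side sketch of that idea only; the class names, the `wildcard_lookup` rule, and the statistics fields are hypothetical and do not represent the Agilio OVS implementation.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    action: str
    packets: int = 0
    byte_count: int = 0


class ExactMatchFlowCache:
    """Exact-match cache that learns entries from a slower fallback lookup."""

    def __init__(self, slow_path):
        self.slow_path = slow_path  # callable: flow_key -> action string
        self.entries = {}
        self.fallbacks = 0

    def process(self, flow_key, length):
        entry = self.entries.get(flow_key)
        if entry is None:
            # First packet of a flow: consult the fallback path once,
            # then cache the resulting action for the rest of the flow.
            self.fallbacks += 1
            entry = FlowEntry(action=self.slow_path(flow_key))
            self.entries[flow_key] = entry
        # Every packet, cached or not, updates the per-flow statistics.
        entry.packets += 1
        entry.byte_count += length
        return entry.action


def wildcard_lookup(flow_key):
    """Hypothetical fallback rule: drop telnet, forward everything else."""
    _src, _dst, dst_port = flow_key
    return "drop" if dst_port == 23 else "forward"


cache = ExactMatchFlowCache(wildcard_lookup)
web = ("10.0.0.1", "10.0.0.2", 80)
telnet = ("10.0.0.3", "10.0.0.2", 23)
actions = [cache.process(key, 100) for key in (web, web, telnet, web)]
```

In this run only two of the four packets reach the fallback lookup; the per-flow counters kept in each entry are the kind of "flow stats and features" a vProbe application could read.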
vProbe Application
• Flow-based data and stat aggregation using techniques such as machine learning
• Enables powerful use cases through flow analytics:
  • Dynamic configuration for DDoS at VM level using high-speed clustering/classification algorithms (next slide)
  • Network shaping based on predictive flow characteristics: work with the University of Arizona has shown a 50% improvement in offload utilisation
  • Elastic VM resource provisioning
• Filtering and grouping for analysis at various levels of visibility
  • Rack, Data Center, Metro, Regional, National
[Figure: react cycle between OVS and the vProbe: 1 Classify, 2 Aggregate, 3 Analyze, 4 Configure OVS; react cycle required in < 12s]
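Grouping for analysis at the Rack, Data Center, Metro, Regional and National levels can be read as a rollup of per-flow counters along a location hierarchy. The sketch below assumes each record carries a location tuple ordered from national down to rack; that record layout, the level names, and the `rollup` function are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical ordering of visibility levels, broadest first.
LEVELS = ("national", "regional", "metro", "data_center", "rack")


def rollup(records, level):
    """Sum byte counts grouped by the location prefix down to `level`.

    records: iterable of (location, byte_count), where location is a tuple
    with one element per entry in LEVELS.
    """
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for location, byte_count in records:
        totals[location[:depth]] += byte_count
    return dict(totals)


# Hypothetical measurements from three racks.
records = [
    (("us", "west", "sf", "dc1", "rack1"), 500),
    (("us", "west", "sf", "dc1", "rack2"), 300),
    (("us", "west", "la", "dc7", "rack1"), 200),
]
by_metro = rollup(records, "metro")
by_nation = rollup(records, "national")
```

The same records answer questions at any level without re-measuring, which is the point of aggregating once and disseminating upward.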
East/West DDoS Use Case
• E/W DDoS attacks are prevalent
• Use vProbe to quickly identify infected VMs and react by modifying flow rules or VMs
• Policy dictated by higher-level orchestrator
• Aggregated data can be disseminated to multiple orchestration levels
• Enables distributed response at server/rack/DC/regional levels
[Figure: four-step cycle: 1) Classify, 2) Aggregate, 3) Analyze, 4) Configure; analysis by per-VM egress clustering; possible responses: drop traffic (targeted/all), reduce VM resources, shut down VM]
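The classify, aggregate, analyze, configure steps on this slide can be sketched end to end on toy data. The slide names per-VM egress clustering; the sketch below swaps in a much simpler median-based outlier test as a stand-in, and the flow tuple layout, the `factor` of 5, and the rule dictionary format are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate(flows):
    """Steps 1-2: classify flows by source VM and sum egress packet counts."""
    packets = defaultdict(int)
    for vm, _dst, pkt_count in flows:
        packets[vm] += pkt_count
    return dict(packets)


def analyze(egress, factor=5.0):
    """Step 3: flag VMs whose egress exceeds factor x the median VM.

    A simple stand-in for clustering, not the method used by the authors.
    """
    if not egress:
        return []
    typical = median(egress.values())
    return sorted(vm for vm, count in egress.items() if count > factor * typical)


def configure(suspects, policy="drop"):
    """Step 4: emit one hypothetical flow rule per suspect VM."""
    return [{"match": {"src_vm": vm}, "action": policy} for vm in suspects]


# Hypothetical sample: three quiet VMs and one spraying fifty destinations.
flows = [("vm1", "a", 100), ("vm2", "a", 120), ("vm3", "b", 90)]
flows += [("vm4", "d%d" % i, 1000) for i in range(50)]
rules = configure(analyze(aggregate(flows)))
```

The `policy` argument reflects the slide's point that the response (drop traffic, reduce VM resources, shut down the VM) is dictated by a higher-level orchestrator rather than by the probe itself.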
Observability: Intelligence at the Edge
• An intelligent network would benefit from programmable switches, NICs and CPUs
• NIC-based offload is essential because CPU power is not scaling at the rate at which network traffic is increasing
• AT&T’s John Donovan estimated that our traffic has increased by 150,000% since 2007
• This means offload is essential to contain cost and maintain performance
• Flexible offload opens up analytics use cases that were previously not tenable
Overview: What Do You Need to Find a Needle?
• OBSERVABILITY: the ability to statefully observe connections
• FLEXIBILITY: the ability to create a real-time feedback loop using dynamic data plane and control functions
• COMPUTABILITY: the ability to monitor and aggregate complex data in real time
With Dynamic Programmable vProbe
Call to Action: We Need Your Use Cases!
• We are gathering a list of use cases for a dynamic analytics platform that is currently being developed
• Email: Tom Tofigh (Tofigh@att.com) or Nic Viljoen (nick.viljoen@netronome.com; note the email address is spelled with a "k")
• Join us for the next series of POCs
Thank You!