StarlingX Enhancements for Edge Networking
Kailun Qin, Intel, kailun.qin@intel.com
Dan Chen, China Unicom, chendan49@chinaunicom.cn
A Fully Featured Cloud for the Distributed Edge

Agenda
01 EDGE NETWORKING (4')
  • What is Driving Edge Computing?
  • Edge Computing Challenges
  • Edge Networking Requirements
02 WHAT IS STARLINGX? (4')
  • What Problems is StarlingX Solving?
  • Intent of the StarlingX Project
  • StarlingX – Edge Virtualization Platform
  • StarlingX Scales Small or Large
03 TECHNOLOGY DETAILS (12')
  • Network Performance and Efficiency
  • Remote Management of Complex and Non-homogeneous Networks
  • Reliability and Autonomous Site Operations with Limited Connectivity
  • Enhanced Network Security
04 BUSINESS CASES (10')
  • China Unicom's Full Stack Cloud Network Architecture
  • StarlingX Mapping to China Unicom's Edge-Cloud Platform Requirement
  • Quote
05 STATUS (4')
  • Upstream Scope & Flow
  • OpenStack Networking Upstream Status
  • Downstream Status
06 FUTURE PLAN (3')
  • Networking for Next-Gen Container Architecture

What is Driving Edge Computing?
A. Latency
B. Bandwidth
C. Data Locality
D. Scalability
E. Connectivity
F. Security
"WHERE" Matters!
[Figure: latency by location, labeled ~100ms, ~10-40ms, < 5ms, < 1-2ms]

Edge Computing Challenges
• To improve service capabilities
• And reduce application latency
• Comply with data locality
Sources:
https://virtualrealitypop.com/different-types-of-vr-ar-devices-making-sense-of-the-spatial-computing-landscape-605efe5b9f17
https://datafloq.com/read/how-edge-computing-will-give-new-life-health-care/3715
https://www.autotrader.ca/newsfeatures/20170109/continental-zf-debut-new-autonomous-driving-tech-at-ces-2017/

Edge Networking Requirements
"Networking" Plays a Key Role at the Edge!
1. Network performance and efficiency (Latency, Bandwidth)
2. Remote management of complex and non-homogeneous networks (Data Locality, Scalability)
3. Reliability and autonomous site operations with limited connectivity (Connectivity)
4. Enhanced network security (Security)
5. Capex and Opex, Time To Market

What Problems is StarlingX Solving?
Data growth is massive. Network needs to be smarter.
1. Distributed infrastructure demands a different architecture
2. Managing a massively distributed compute environment is hard
3. The maturity and robustness of Cloud is required everywhere

Intent of the StarlingX Project
Re-Configure Proven Cloud Technologies for Edge Compute
• Orchestrate system-wide
• Simplify deployment to geographically dispersed, remote Edge regions
• Provide a deployment-ready, scalable, highly reliable Edge infrastructure software platform
[Figure labels: Video, Healthcare, Manufacturing, Drones, Energy, Retail, Transportation, Smart Cities, PCs]
*Other names and brands may be claimed as the property of others

StarlingX – Edge Virtualization Platform
• Network performance and efficiency
• Remote management of complex and non-homogeneous networks
• Reliability and autonomous site operations with limited connectivity
• Enhanced network security
[Figure: Upstream Projects*, Integration Project, Upstream Projects*]
A Fully Featured Cloud for the Distributed Edge
*Other names and brands may be claimed as the property of others

StarlingX Scales Small or Large
• Single Server - Runs all functions
• Dual Server - Redundant design
• Multiple Server - Fully resilient and geographically distributable

Network Performance and Efficiency
Mission-ready Network Performance
• High performance Node-to-Node, VM-to-VM networking
• Enabled:
  • OVS-DPDK
  • SR-IOV
  • PCI-passthrough
• WIP for OpenStack Upstream
  • SmartNIC/FPGA
• Real-time and low latency enhancements to KVM
  • Reduced variability of interrupt latency
  • Reduced high resolution timer latency
• "Hardware Acceleration for Edge Networking"
  • Thu 15, 11:40am - 12:20pm, Level 1 - Hall A1
[Figure: Accelerated Data Plane (SR-IOV, OVS-DPDK, PCI-passthrough, SmartNIC/FPGA); Real-Time KVM Extensions; Low Latency]

Configuration Management
Acceleration technology support & Optimized configurations for Edge Cloud
• Manage Installation and Configuration
  • Auto-discover new nodes in an edge site
  • Manage installation and configuration parameters (e.g. Neutron config, agent parameters etc.)
• Nodal Configuration
  • Network Interfaces (DPDK)
• Inventory Discovery
  • Physical NICs (# and bandwidth)
  • H/W acceleration devices for edge networking (SR-IOV, SmartNIC etc.)
[Figure: Node System Configuration and Setup. Labels: CLI, Horizon, Wizard, Automation, REST API, System Inventory (Conductor), System Inventory (Agents), SQL DB, Puppet, Manifests, Hardware Resources, SR-IOV, SmartNIC]

Improved Network Efficiency
• Based on OpenStack Neutron
• L2/L3 scheduling/re-scheduling
  • Bulk operations; remove unnecessary operations
• L2/L3 agent
  • Event driven sync task
  • Stale RPC message handling
  • Concurrency scenario enhancements
• L2POP
  • Registration mechanism for extension of L2POP fdb information
• VLAN transparent support
• QoS, BGP-eVPN, SFC…
[Figure: Network Efficiency, surrounded by L2/L3 schedule, L2/L3 agent, Concurrency, L2POP, VLAN transparent, QoS, BGP-eVPN, SFC]

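The "stale RPC message handling" item can be illustrated with a minimal sketch. It assumes each message carries a per-resource sequence number and that an agent should ignore anything older than what it has already applied. The class name, method name, and the sequence-number scheme are hypothetical and are not taken from the slides or from Neutron.

```python
class StaleMessageFilter:
    """Hypothetical helper: drop messages older than the last one applied."""

    def __init__(self):
        # resource id -> highest sequence number applied so far
        self.latest = {}

    def accept(self, resource_id, sequence):
        """Return True if the message is newer than anything applied so far."""
        if sequence <= self.latest.get(resource_id, -1):
            return False  # stale or duplicate: skip the redundant work
        self.latest[resource_id] = sequence
        return True


msg_filter = StaleMessageFilter()
print(msg_filter.accept("port-1", 5))  # first message for port-1 is applied
print(msg_filter.accept("port-1", 3))  # older update arrives late and is dropped
```

Dropping late messages this way keeps an agent from re-applying state that a newer update has already replaced.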
Remote Management of Complex and Non-homogeneous Networks
Host Management
Improved low touch manageability & Reliability
• Full life-cycle management of the host via REST API
• Detects and automatically handles host failures and initiates recovery
• Support automated and user level cluster connectivity tests
• Improve the way physical network topology is presented to the cloud/edge operator
• Monitoring and alarms for:
  • Critical process failures (e.g. L2/L3 agents)
  • Resource utilization thresholds, interface states
[Figure: Vendor Neutral Host Management. Labels: Configuration Management, Service Management, Infrastructure Orchestration, H/W Inventory, Request, Host Management; Manage/Monitor VMs, Processes, Hosts; provider-net-0, provider-net-1]

Network Segment Management
Improved low touch manageability & Scalability
• Based on OpenStack Neutron
• Manage the underlying network segment ranges via REST API
  • Full network orchestration
  • No direct interaction with host config
• Control the segment ranges globally or on a per-tenant basis
  • Complex and non-homogeneous network infrastructure deployments at the Edge
  • Varied business requirements
  • Dynamic segment range scaling
[Figure: Network Segment Range Management. Labels: External Physical Network Infrastructure; biz-range-0, biz-range-k, biz-range-n, biz-range-p; Scaling; Tenant-0, Tenant-1, Tenant-2; Admin; Host config]

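The idea of global versus per-tenant segment ranges with dynamic scaling can be sketched as a toy in-memory model. This is not the Neutron implementation or its REST API: the class, its methods, and the rule "try the tenant's own ranges first, then shared ranges" are assumptions made for illustration. The range and tenant names are borrowed from the figure labels.

```python
class SegmentRangePool:
    """Toy model of segment ranges: shared (tenant=None) or per-tenant."""

    def __init__(self):
        self.ranges = {}   # range name -> (tenant or None, minimum, maximum)
        self.in_use = {}   # segment ID -> tenant holding it

    def set_range(self, name, minimum, maximum, tenant=None):
        """Create a range, or resize an existing one (dynamic scaling)."""
        if minimum > maximum:
            raise ValueError("minimum must not exceed maximum")
        for other, (_, lo, hi) in self.ranges.items():
            if other != name and minimum <= hi and lo <= maximum:
                raise ValueError(f"{name} overlaps {other}")
        # Sketch only: shrinking a range does not reclaim IDs already in use.
        self.ranges[name] = (tenant, minimum, maximum)

    def allocate(self, tenant):
        """Pick the lowest free ID, trying the tenant's own ranges first."""
        own = sorted((r for r in self.ranges.values() if r[0] == tenant),
                     key=lambda r: r[1])
        shared = sorted((r for r in self.ranges.values() if r[0] is None),
                        key=lambda r: r[1])
        for _, lo, hi in own + shared:
            for segment in range(lo, hi + 1):
                if segment not in self.in_use:
                    self.in_use[segment] = tenant
                    return segment
        raise RuntimeError(f"no free segment for {tenant}")


pool = SegmentRangePool()
pool.set_range("biz-range-0", 100, 101)                     # shared range
pool.set_range("biz-range-k", 200, 200, tenant="Tenant-1")  # tenant-specific
print(pool.allocate("Tenant-1"))  # served from the tenant's own range
print(pool.allocate("Tenant-1"))  # own range exhausted, falls back to shared
pool.set_range("biz-range-0", 100, 110)                     # scale the shared range
```

Because ranges are plain data that an administrator can resize, growing a range needs no change on the hosts themselves, which is the point the slide makes about avoiding direct interaction with host config.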
Reliability and Autonomous Site Operations with Limited Connectivity
L2/L3 Rescheduling
Enhanced high availability & Reliability
• Based on OpenStack Neutron
• Automatic rescheduling of DHCP servers and routers:
  • From offline L2/L3 agents to online L2/L3 agents
  • When new agents become active
  • When agents become overloaded
    • Threshold-based
• Evaluation WIP:
  • Manual rescheduling via:
    • Script
    • API
  • Redistribution based on more sophisticated methodologies with additional info - CPU, memory, etc.
  • Re-configure default settings (L3-HA)
[Figure: DHCP Server Rebalancing. Three compute nodes, each with a DHCP agent and dnsmasq instances; before: empty, overload, unbalanced; after: balanced]

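The threshold-based rebalancing described above can be sketched roughly as follows. This is a simplified illustration, not the StarlingX or Neutron scheduler: the function name, the use of a plain network count as the load measure, and the "least-loaded agent wins" placement rule are all assumptions.

```python
def rebalance(assignments, online_agents, threshold):
    """Return a new {network: agent} placement (hypothetical sketch).

    assignments:   dict of network id -> agent id (current placement)
    online_agents: iterable of agent ids that are currently alive
    threshold:     number of networks above which an agent sheds load
    """
    online = sorted(set(online_agents))
    if not online:
        raise ValueError("no online agents to reschedule onto")

    load = {agent: [] for agent in online}
    displaced = []
    for net, agent in sorted(assignments.items()):
        if agent in load:
            load[agent].append(net)
        else:
            displaced.append(net)  # hosted on an offline agent

    # Overloaded agents give up networks until they are at the threshold.
    for agent in online:
        while len(load[agent]) > threshold:
            displaced.append(load[agent].pop())

    # Each displaced network goes to the currently least-loaded agent.
    # If every agent is already full, this may still exceed the threshold.
    for net in displaced:
        target = min(online, key=lambda a: (len(load[a]), a))
        load[target].append(net)

    return {net: agent for agent, nets in load.items() for net in nets}


current = {"net1": "a", "net2": "a", "net3": "a", "net4": "b", "net5": "c"}
# Agent "c" has gone offline and agent "d" has just become active.
print(rebalance(current, online_agents=["a", "b", "d"], threshold=2))
```

The sketch covers the three triggers listed on the slide (offline agents, newly active agents, overloaded agents) with a single pass. Weighting the choice by CPU or memory, as mentioned under "Evaluation WIP", would replace the simple count in the `min` key.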