Towards Converged SmartNIC Architecture for Bare Metal & Public Clouds Layong (Larry) Luo, Tencent TEG August 8, 2018
Agenda 1 SmartNIC in Bare Metal Cloud 2 SmartNIC in Public Cloud 3 Converged SmartNIC Architecture 4 Tencent SmartNIC Experience 5 Future Challenges
Introduction to Bare Metal Cloud
• What is Bare Metal (BM) Cloud?
  – Data centers in which dedicated physical machines (aka bare metal machines) are provided to customers via the cloud service model (VPC)
• Why BM Cloud?
  – It addresses two big obstacles to cloud adoption
    • Performance degradation: no virtualization overhead on the CPU
    • Migration cost: exactly the same stacks, tools and experience as on-premises
Figure labels: VPC, BM Machines, BM Cloud
Introduction to Bare Metal Cloud
• Who is using BM Cloud at Tencent: typical use cases
  – Hybrid Cloud inside Tencent
    • Frontend services in Public Cloud (VMs)
    • Backend big data services in BM Cloud (PMs)
    • IO intensive, CPU intensive
  – Custom Virtualization Stack
    • Custom Portal & OpenStack: smooth migration
    • Customer has strong technical teams
    • Consistent experience as on-premises
  – Container Cloud
    • Container cloud for serverless computing
    • No virtualization overhead
Learn more: https://cloud.tencent.com/product/cpm
Introduction to Bare Metal Cloud
• How to implement BM Cloud: ToR based Virtualization
  – Requirements: any server from anywhere to any customer, BYO IP addresses
Figure (built up over several slides): Physical Machines 1 to 4, each with a NIC and each assigned to VPC Blue or VPC Yellow, attach to ToR 1 and ToR 2 (VxLAN Overlay), which are connected by a CLOS network. Packets shown in the figure:
  – Between NIC and ToR, on both sides: VLAN packet with fields IP3 | IP1 | Blue
  – At ToR 1 and at ToR 2: VxLAN packet with fields ToR2 | ToR1 | IP3 | IP1 | Blue
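The VLAN-to-VxLAN translation at the ToR can be sketched in a few lines of Python. This is a minimal model, not the switch implementation: the class names, the VLAN IDs (10, 20), the VNI (100) and the reading of IP1 as source and IP3 as destination are all assumptions made for illustration. Only the 8-byte header layout (flags, 24-bit VNI) follows the VxLAN specification.

```python
import struct
from dataclasses import dataclass

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


@dataclass(frozen=True)
class TenantFrame:
    vlan_id: int   # VLAN tag used between the server NIC and its ToR
    src_ip: str    # tenant (BYO) source address
    dst_ip: str    # tenant (BYO) destination address


@dataclass(frozen=True)
class OverlayPacket:
    outer_src: str        # local ToR tunnel endpoint
    outer_dst: str        # remote ToR tunnel endpoint
    vxlan_header: bytes   # 8-byte VxLAN header carrying the VPC's VNI
    inner: TenantFrame


def vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


class ToR:
    """Hypothetical ToR model: maps local VLANs to VNIs and hosts to remote ToRs."""

    def __init__(self, name, vlan_to_vni, host_routes):
        self.name = name
        self.vlan_to_vni = vlan_to_vni                       # VLAN -> VNI
        self.vni_to_vlan = {v: k for k, v in vlan_to_vni.items()}
        self.host_routes = host_routes                       # (VNI, tenant IP) -> remote ToR

    def encapsulate(self, frame: TenantFrame) -> OverlayPacket:
        vni = self.vlan_to_vni[frame.vlan_id]
        remote = self.host_routes[(vni, frame.dst_ip)]       # per-host lookup
        return OverlayPacket(self.name, remote, vxlan_header(vni), frame)

    def decapsulate(self, pkt: OverlayPacket) -> TenantFrame:
        _, word = struct.unpack("!II", pkt.vxlan_header)
        vni = word >> 8                                      # drop the reserved low byte
        return TenantFrame(self.vni_to_vlan[vni], pkt.inner.src_ip, pkt.inner.dst_ip)


BLUE_VNI = 100  # assumed VNI for VPC Blue
tor1 = ToR("ToR1", {10: BLUE_VNI}, {(BLUE_VNI, "IP3"): "ToR2"})
tor2 = ToR("ToR2", {20: BLUE_VNI}, {(BLUE_VNI, "IP1"): "ToR1"})

packet = tor1.encapsulate(TenantFrame(vlan_id=10, src_ip="IP1", dst_ip="IP3"))
delivered = tor2.decapsulate(packet)
```

Note that every tenant host needs its own `host_routes` entry on the ToR, so the table grows with the number of hosts the switch must reach.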
Challenges in Bare Metal Cloud
• Scalability
  – ToR switch table size is limited
    • 32-bit host routing table, VxLAN tunnel table
  – VPC network size is limited
• Flexibility
  – ToR switch has limited programmability
  – Unable to support security groups and more
Figure: Physical Machines 1 and 2, each with a NIC, attached to a ToR (VxLAN Overlay)
SmartNIC in Bare Metal Cloud
1. ToR based Virtualization (figure: physical machines with plain NICs; VxLAN Overlay at the ToR)
• Challenges:
  – Scalability: limited switch table size
  – Flexibility: unable to support security group
2. SmartNIC based Virtualization (figure: physical machines with SmartNICs providing VxLAN, Security and more; plain ToR)
• Solutions:
  – Scalability: ToR (centralized) -> multiple SmartNICs (distributed)
  – Flexibility: programmable chips (ARM & FPGA) to support advanced features (security group, network ACL, QoS…)
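To make the distributed security group idea concrete, here is a minimal sketch of a first-match rule check that a programmable SmartNIC could run for its own host. The rule fields, the default-deny behavior and the example addresses are assumptions for illustration, not Tencent's actual rule format.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    src_cidr: str   # source prefix the rule applies to
    protocol: str   # "tcp" or "udp"
    dst_port: int   # destination port on the local host
    action: str     # "allow" or "deny"


def evaluate(rules: List[Rule], src_ip: str, protocol: str, dst_port: int,
             default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src_cidr)
                and rule.protocol == protocol
                and rule.dst_port == dst_port):
            return rule.action
    return default


# Each SmartNIC stores only the rules of the host it serves (hypothetical data),
# so no single device has to hold the rules of the whole VPC.
rules_for_this_host = [
    Rule("10.0.0.0/24", "tcp", 22, "allow"),   # SSH from the management subnet
    Rule("0.0.0.0/0", "tcp", 443, "allow"),    # HTTPS from anywhere
]

ssh_from_mgmt = evaluate(rules_for_this_host, "10.0.0.5", "tcp", 22)
ssh_from_outside = evaluate(rules_for_this_host, "192.168.1.9", "tcp", 22)
```

Because the rule set is per host, its size depends on that host's policies rather than on the size of the VPC, which is the scalability argument made above.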
Why SmartNIC in Public Cloud?
• Performance Perspective
  – CPU performance increases slowly: it doubles every 2 years, but not for much longer
  – Network speed (1G -> 50G) and host SDN policies increase fast
  Source: https://bertrandmeyer.com/2011/06/20/concurrent-programming-is-easy/intel/
• Specialization (HW acceleration) for efficiency (perf per watt)
Why SmartNIC in Public Cloud?
• Revenue Perspective
  – SmartNIC increases the NIC cost a bit
  – But the CPU savings/revenue increase could be very significant (Azure SmartNIC, NSDI 2018)
• Maximize CPU savings by offloading infra workloads to SmartNIC
SmartNIC Evolution in Public Cloud
1. Software Hypervisor
  – VMs run on a Hypervisor with a Virtual Switch (SDN policies: GRE, Security)
  – Commodity NIC: all packets go through the Virtual Switch
2. Network Acceleration
  – Hypervisor with Virtual Switch (Slow Path): only the 1st packet
  – SmartNIC (Fast Path): 2nd+ packets
3. Hypervisor Offload
  – Light Hypervisor: CPU and Memory Virtualization Only
  – SmartNIC, the new “Hypervisor”: all packets
• 1 -> 2: Performance Boost
• 2 -> 3: CPU Savings/Revenue Increase
Push Performance Boost and CPU Savings to the limit!
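The slow path / fast path split of the Network Acceleration stage can be sketched as a flow cache. This is a minimal illustration under assumptions: the 5-tuple flow key, the class names and the example policy are hypothetical, and a real SmartNIC would also age out and invalidate entries.

```python
from typing import Callable, Dict, NamedTuple


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


class VirtualSwitchSlowPath:
    """Software virtual switch on the host CPU: evaluates SDN policies per flow."""

    def __init__(self, policy: Callable[[FlowKey], str]):
        self.policy = policy
        self.packets_seen = 0

    def handle_first_packet(self, key: FlowKey) -> str:
        self.packets_seen += 1
        return self.policy(key)


class SmartNicFastPath:
    """Flow table on the SmartNIC: exact-match lookup, miss goes to the slow path."""

    def __init__(self, slow_path: VirtualSwitchSlowPath):
        self.slow_path = slow_path
        self.flow_table: Dict[FlowKey, str] = {}
        self.fast_path_hits = 0

    def process(self, key: FlowKey) -> str:
        action = self.flow_table.get(key)
        if action is None:
            # 1st packet of the flow: ask the slow path, then cache its decision.
            action = self.slow_path.handle_first_packet(key)
            self.flow_table[key] = action
            return action
        # 2nd+ packets: handled entirely on the SmartNIC.
        self.fast_path_hits += 1
        return action


def allow_web_only(key: FlowKey) -> str:
    """Hypothetical SDN policy: forward web traffic, drop everything else."""
    return "forward" if key.dst_port in (80, 443) else "drop"


slow_path = VirtualSwitchSlowPath(allow_web_only)
nic = SmartNicFastPath(slow_path)
flow = FlowKey("10.0.0.1", "10.0.0.2", "tcp", 40000, 443)
actions = [nic.process(flow) for _ in range(3)]
```

After three packets of the same flow, the host CPU has handled one packet and the SmartNIC two, which is where the performance boost comes from.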
Converged SmartNIC for Bare Metal and Public Cloud
• SmartNIC Evolution in Bare Metal Cloud
• SmartNIC Evolution in Public Cloud
• Convergence: Converged “Hypervisor” in SmartNIC, Converged SmartNIC Platform
Tencent SmartNIC Experience
• Hardware Selection: SoC vs. discrete chips, FPGA vs. ASIC/NP/ARM
  – No simple right answer
  – Requirements and constraints vary across companies and over time: time to market, feature set, requirement stability, chip availability, cost, power …
• Agility: Tencent Speed
  – Build a SmartNIC team (~10) in less than a year
  – Finish FPGA pipeline in 3 months (FPGA hard to program? Yes and No)
  – Build a SmartNIC board in 4 months, in just one iteration
  – Ship a SW-HW co-design project (from planning to deployment) in about 1 year
Future Challenges on Hardware
• Today: separate HW Accel (fast path), Basic NIC and CPU (slow path)
  – Power, area and cost challenges
• Target: SoC (all in one)
• Partner SoC options (figure legend: Ready, Partial Ready, Not Ready)
  – Partner 1: HW Accel: ASIC, ARM CPU, Basic NIC (RoCEv2?)
  – Partner 2: HW Accel: FPGA, ARM CPU, Basic NIC (RoCEv2?)
  – Partner 3: HW Accel: ASIC, ARM CPU, Basic NIC
  – Partner 4: HW Accel: FPGA, ARM CPU, Basic NIC
  – Open questions marked in the figure: (Programmability?) on the HW accelerators, (?) on a Basic NIC in the Partner 3/4 row
• Redefine SmartNIC SoC by Cloud Providers!
Future Challenges on Architecture
• Task partition on heterogeneous platform
  – Architectural boundaries between x86, FPGA and ARM for different workloads: host SDN, storage and NFV (IPSec VPN, LB, etc.)
• Hitless upgrade and reboot
  – Collaborative process between x86, FPGA and ARM
• Live migration with hypervisor offload
  – How to log dirty pages if the hypervisor is totally bypassed?
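To illustrate what the dirty-page question involves, the sketch below models the bookkeeping that live migration relies on: a bitmap with one bit per guest page, set on every write and drained on each migration round. It is only an assumption that the component writing guest memory (for example the SmartNIC's DMA path) could call something like `mark_write`; the slides leave open how this would be done. The names and the 4 KiB page size are hypothetical.

```python
from typing import List

PAGE_SIZE = 4096  # assumed guest page size in bytes


class DirtyLog:
    """One bit per guest page; set on write, cleared when migration collects it."""

    def __init__(self, num_pages: int):
        self.num_pages = num_pages
        self.bitmap = bytearray((num_pages + 7) // 8)

    def mark_write(self, addr: int, length: int) -> None:
        """Record every page touched by a write of `length` bytes at `addr`."""
        if length <= 0:
            return
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        if addr < 0 or last >= self.num_pages:
            raise ValueError("write outside guest memory")
        for page in range(first, last + 1):
            self.bitmap[page // 8] |= 1 << (page % 8)

    def collect_and_clear(self) -> List[int]:
        """Return the pages to re-send in this migration round, then reset."""
        dirty = [p for p in range(self.num_pages)
                 if (self.bitmap[p // 8] >> (p % 8)) & 1]
        self.bitmap = bytearray(len(self.bitmap))
        return dirty


log = DirtyLog(num_pages=16)
log.mark_write(addr=4090, length=10)      # straddles pages 0 and 1
log.mark_write(addr=5 * PAGE_SIZE, length=1)
first_round = log.collect_and_clear()
second_round = log.collect_and_clear()
```

If writes that bypass the hypervisor never reach such a log, the pages they touch are not re-sent and the migrated VM sees stale memory, which is why the question matters.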
Thanks!