

  1. IO Virtualization with InfiniBand [InfiniBand as a Hypervisor Accelerator]
     Michael Kagan, Vice President, Architecture, Mellanox Technologies
     michael@mellanox.co.il

  2. Key messages
     • InfiniBand enables efficient server virtualization
       – Cross-domain isolation
       – Efficient IO sharing
       – Protection enforcement
     • Existing HW fully supports virtualization
       – The most cost-effective path for single-node virtual servers
       – SW-transparent scale-out
     • VMM support in the OpenIB SW stack by fall ’05
       – Alpha version of FW and driver in June

  3. InfiniBand scope in server virtualization
     • CPU virtualization – NO
       – Compute power
     • Memory virtualization – Partial
       – Memory allocation – No
       – Address translation – Yes, for IO accesses
       – Protection – Yes, for IO accesses
     • IO virtualization – YES
     [Diagram: virtualized server – Domain0/DomainX/DomainY kernels with IO drivers over virtual switch(es) in the Hypervisor, on top of CPU, memory, and IO]

  4. InfiniBand – Overview
     Link data rate – today: 2.5, 10, 20, 30, 60 Gb/s; spec: up to 120 Gb/s, Cu & optical
     • Performance
       – Bandwidth – up to 120 Gbit/sec per link
       – Latency – under 3 usec (today)
     • Kernel bypass for IO access
       – Cross-process protection and isolation
     • Quality of Service
       – End node
       – Fabric
     • Scalability/flexibility
       – Up to 48K local nodes, up to 2^128 total
       – Multiple link widths and media (Cu, fiber)
     • Multiple transport services in HW
       – Reliable and unreliable
       – Connected and datagram
       – Automatic path migration in HW
     • Memory exposure to remote node
       – RDMA-read and RDMA-write
     • Multiple networks on a single wire
       – Network partitioning in HW (“VLAN”)
       – Multiple independent virtual networks on a wire
     [Diagram: end nodes connected through switches in a fabric]
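The RDMA-read/RDMA-write capability listed above is what the later slides build on. As a point of reference (not part of the original deck), a minimal sketch of posting an RDMA write with the OpenIB/libibverbs verbs API could look like the following; the connected QP, the registered memory region, and the peer's buffer address and rkey are assumed to have been set up and exchanged out of band:

    /* Sketch only: post one RDMA write on an already-connected QP. */
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                        void *local_buf, size_t len,
                        uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge;
        struct ibv_send_wr wr, *bad_wr = NULL;

        memset(&sge, 0, sizeof(sge));
        sge.addr   = (uintptr_t)local_buf;    /* local virtual address */
        sge.length = (uint32_t)len;
        sge.lkey   = mr->lkey;                /* local key from ibv_reg_mr() */

        memset(&wr, 0, sizeof(wr));
        wr.wr_id      = 1;
        wr.opcode     = IBV_WR_RDMA_WRITE;    /* memory exposure: RDMA-write */
        wr.sg_list    = &sge;
        wr.num_sge    = 1;
        wr.send_flags = IBV_SEND_SIGNALED;    /* request a completion on the CQ */
        wr.wr.rdma.remote_addr = remote_addr; /* peer's virtual address */
        wr.wr.rdma.rkey        = rkey;        /* peer's remote key */

        /* The HCA validates the keys, translates the addresses and moves
         * the data; no kernel transition is needed on this path. */
        return ibv_post_send(qp, &wr, &bad_wr);
    }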

  5. InfiniBand communication
     • Consumer channel interface
     • Network (fabric) interface

  6. InfiniBand Channel Interface
     Host Channel Adapter (HCA) – Consumer Queue Model
     • Consumers connected via queues
       – Local or remote node
     • Asynchronous execution
       – In-order execution on a queue
       – Flexible completion report
     • 16M independent queues
       – 16M IO channels
       – 16M QoS levels (transport, priority)
     • Memory access through virtual address
       – Remote and local
       – 2G address spaces, 64-bit each
       – Access rights and isolation enforced by HW
     [Diagram: consumer above the HCA, PCI-Express on the host side, InfiniBand ports toward the fabric]
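For readers unfamiliar with the queue model, the following minimal sketch (assumed libibverbs code, not from the deck) creates one such "IO channel": a completion queue plus a reliable-connected queue pair on the first HCA found. Connection establishment and error handling are omitted:

    #include <infiniband/verbs.h>
    #include <stdio.h>

    int main(void)
    {
        struct ibv_device **devs = ibv_get_device_list(NULL);
        if (!devs || !devs[0]) { fprintf(stderr, "no HCA found\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);  /* open the HCA */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);               /* protection domain */
        struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0); /* completions */

        /* One queue pair = one IO channel; the transport type selects the HW
         * service (reliable connected here; datagram/unreliable also exist). */
        struct ibv_qp_init_attr attr = {
            .send_cq = cq,
            .recv_cq = cq,
            .cap     = { .max_send_wr = 64, .max_recv_wr = 64,
                         .max_send_sge = 1, .max_recv_sge = 1 },
            .qp_type = IBV_QPT_RC,
        };
        struct ibv_qp *qp = ibv_create_qp(pd, &attr);
        printf("created QP number 0x%x\n", qp->qp_num);

        ibv_destroy_qp(qp);
        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }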

  7. InfiniBand Host Channel Adapter
     • HCA configuration via Command Queue
       – Initialization
       – Run-time resource assignment and setup
     • HCA resources (queues) allocated for applications
       – Resource protection through User Access Region (UAR)
     • IO access through HCA QPs (“IO channels”)
       – QP properties match IO requirements
       – Cross-QP resource isolation
     • Memory protection – via Protection Domains
       – Many-to-one association
         • Address space to Protection Domain
         • QP to Protection Domain
       – Memory access using Key and virtual address
         • Boundary and access-right validation
         • Protection Domain validation
         • Virtual-to-physical (HW) address translation
     • Interrupt delivery – Event Queues
     [Diagram: userland apps and kernel driver above the HCA; command queue (CCQ), up to 16M work queues (WQ) and completion queues (CQ)]
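As a hedged illustration of the Protection Domain and memory-key mechanism above, the sketch below registers a buffer through libibverbs; the function name and buffer handling are this sketch's own, not the deck's. The protection domain ties the region to the QPs allowed to use it, and ibv_reg_mr() returns the lkey/rkey the HCA uses for boundary and access-right validation on every access:

    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len)
    {
        void *buf = malloc(len);
        if (!buf)
            return NULL;

        /* Access flags are enforced by the HCA HW: local write plus remote
         * RDMA read/write into this region only, nothing else. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE  |
                                       IBV_ACCESS_REMOTE_READ  |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (mr)
            printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
                   len, mr->lkey, mr->rkey);
        return mr;  /* hand mr->rkey and the buffer address to the peer */
    }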

  8. InfiniBand Host Channel Adapter – virtualized (the slide 7 model mapped onto a VMM)
     • HCA configuration via Command Queue → HCA initialization by VMM
       – Assign a command queue per guest domain
       – HCA resources partitioned and exported to guest OSes
     • HCA resources (queues) allocated to guests and their applications
       – Resource protection through UAR
     • Each VM has direct IO access through HCA QPs (“IO channels”)
       – “Hypervisor offload”
     • Memory protection – via Protection Domains
       – Address translation step generates the HW address
         • Guest Physical Address to HW address translation
         • Access-right validation
     [Diagram: Domain0/DomainX/DomainY/DomainZ, each with its own kernel, apps, command queue (CCQ) and up to 16M work/completion queues on the shared HCA]

  9. InfiniBand Host Channel Adapter – virtualized (continued)
     • HCA initialization by VMM
       – Assign a command queue per guest domain (up to 128 command queues)
       – HCA resources partitioned and exported to guest OSes
     • HCA resources allocated to guests and their applications
       – Resource protection through UAR
     • Each VM has direct IO access
       – “Hypervisor offload”
     • Memory protection – via Protection Domains
       – Guest Physical Address to HW address translation
       – Access-right validation
     • Guest driver manages HCA resources at run time
       – Each OS sees “its own HCA”
       – HCA HW keeps the guest OS honest
       – Connection manager – see later
     [Diagram: per-domain drivers, each with its own command queue (CCQ) and up to 16M work/completion queues on the shared HCA]

  10. Address translation and protection
      Non-virtual server:
      • HCA TPT set by driver
        – Boundaries, access rights
        – vir2phys table
      • Run-time address translation
        – Access-right validation
        – Translation-table walk
      Virtual server:
      • VMM sets guest HW address tables
        – Address space per guest domain
        – Managed and updated by VMM
      • Guest driver sets HCA TPT
        – Guest PA in vir2phys table
      • Run-time address translation
        1. Virtual to Guest Physical Address
        2. Guest Physical to HW address
      [Diagram: MKey + virtual address resolved through the MKey table and translation tables to the HW physical address; in the virtual case a second, per-VM GPA step follows]
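The two translation steps for the virtual server can be written out as a small illustrative sketch. The structures and function below are hypothetical, chosen only to mirror the slide's flow (boundary/rights check, then the guest-programmed virtual-to-GPA lookup, then the VMM-programmed GPA-to-HW lookup); they do not correspond to any real driver API:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

    struct mkey_entry {            /* set by the guest driver (TPT), page-aligned region */
        uint64_t start_va, length; /* boundaries */
        unsigned access_rights;    /* allowed access bits */
        uint64_t *gpa_pages;       /* virtual -> guest-physical, one entry per page */
    };

    struct vm_gpa_table {          /* set and updated by the VMM */
        uint64_t *hw_pages;        /* guest-physical -> HW physical, one entry per page */
        size_t    npages;
    };

    /* Returns true and fills *hw_addr on success; false on a protection fault. */
    bool translate(const struct mkey_entry *mk, const struct vm_gpa_table *vm,
                   uint64_t va, unsigned requested_rights, uint64_t *hw_addr)
    {
        /* boundary and access-right validation */
        if (va < mk->start_va || va >= mk->start_va + mk->length)
            return false;
        if ((requested_rights & mk->access_rights) != requested_rights)
            return false;

        /* step 1: virtual -> guest physical (guest-programmed table) */
        uint64_t off = va - mk->start_va;
        uint64_t gpa = mk->gpa_pages[off >> PAGE_SHIFT] | (off & PAGE_MASK);

        /* step 2: guest physical -> HW physical (VMM-programmed table) */
        uint64_t gfn = gpa >> PAGE_SHIFT;
        if (gfn >= vm->npages)
            return false;
        *hw_addr = vm->hw_pages[gfn] | (gpa & PAGE_MASK);
        return true;
    }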

  11. IO virtualization with InfiniBand – single node, local IO
      • Full offload for local cross-domain access
        – Eliminate the Hypervisor kernel transition on the data path
          • Reduce cross-domain access latency
          • Reduce CPU utilization
      • Kernel bypass on IO access to the guest application
      • Shared [local] IO
        – Shared by guest domains
      [Diagram: SW virtual switch(es) in the Hypervisor vs. HW IO switch(es) offloaded into the HCA]
