  1. A Light-Weight Virtual Machine Monitor for Blue Gene/P
     Jan Stoess, System Architecture Group, Department of Computer Science
     KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association (www.kit.edu)
     May 31, 2011

  2. BG/P Programming Model
     - Traditional BG/P supercomputer: an IO node running Linux with CIOD, plus compute nodes 0–63 running the Compute-Node Kernel (CNK) with a parallel programming run-time (MPI)
     - MPI programming model: applications run on top of CNK, with function-shipping to the IO nodes
     - CNK: an OS for massively parallel applications
       - Light-weight kernel, "POSIX-ish"
       - Function-shipping to IO nodes
     - Perfect choice for current HPC apps:
       - MPI programming model
       - Low OS noise
       - Performance, scalability, customizability

  3. Application Scale-Out
     - Standard server / commercial workloads are scaling out:
       - Big data (Hadoop, stream processing, caching)
       - Clouds (EC2, vCloud)
       - Commodity OSes, runtimes, (HW): Linux, Java, Ethernet
     - CNK is not truly compatible with these workloads:
       - No full Linux/POSIX compatibility
       - No compatibility with standardized networking protocols

  4. HPC Readiness vs. Compatibility
     - Commodity OSes not designed for supercomputers:
       - OS footprint and complexity?
       - Network protocol overhead?
       - A problem for standard scale-out software as well
     - BG could run such workloads, in principle: "cores, memory, interconnect"; reference HW for future data centers
     - Can we have the HPC strength of CNK and the compatibility of a commodity OS and network stack?

  5. A Light-Weight VMM for Supercomputers
     - Idea: use a light-weight kernel and a VMM
     - The VMM gives HW compatibility:
       - Can run Linux in a VM
       - Can run Linux applications
       - Can communicate via (virtualized) Ethernet
     - The light-weight kernel (LWK) preserves the short path to HW:
       - Run HPC apps "natively"
       - Direct access to HPC interconnects
     - Kernel small and customizable: low kernel footprint
     - A development path for converging platforms and workloads

  6. Prototype
     - L4-based prototype:
       - Small, privileged micro-kernel
       - User-level VMM component
     - Current focus: the VMM layer (this talk)
       - Virtual BG cores, memory, interconnects
       - Support for standard OSes (Linux 2.6 guest, L4 apps alongside)
     - Future work: native HPC app support
       - L4 has a native API
       - Leverage existing research on L4 OS frameworks and native HPC app layers [Kitten/Palacios]

  7. BG Overview and VMM Agenda
     - Compute nodes:
       - 4 PowerPC 450 cores
       - 2 GB physical memory
       - MMU/TLB
       - BIC interrupt controller
       - Torus and collective interconnects
     - Virtualized for each guest VM, on top of L4: vPPC, vTLB, vBIC, vCollective, vTorus
     - Other HW (mailboxes, JTAG) not considered
     - IO nodes: not virtualized; they run a special Linux for booting

  8. L4 and VMM Architecture
     - L4 offers generic OS abstractions:
       - Threads, address spaces, synchronous IPC
       - IPC-based exception / IRQ handling
     - The VMM is just a user-level program:
       - Receives a "VM exit" message from the VM
       - Emulates it and replies with a state-update message (see the loop sketched below)
     - L4 virtualization enhancements:
       - Empty address spaces
       - Extended VM/thread state handling
       - In-kernel VM TLB handling
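To make the exit/reply protocol concrete, here is a minimal sketch of such an exit-handling loop in C. All names here (ipc_recv, ipc_reply, the vm_exit_msg layout) are illustrative placeholders, not the actual L4 API.

```c
/* Minimal sketch of the user-level VMM's exit-handling loop.
 * Names are illustrative placeholders, not the real L4 API. */

struct vm_exit_msg {
    unsigned long pc;        /* faulting guest program counter */
    unsigned long gprs[32];  /* guest general-purpose registers */
    unsigned long reason;    /* trap cause (privileged insn, TLB miss, ...) */
};

typedef unsigned long thread_id_t;

/* Hypothetical IPC and emulation primitives. */
void ipc_recv(thread_id_t from, struct vm_exit_msg *msg);
void ipc_reply(thread_id_t to, const struct vm_exit_msg *msg);
struct vm_exit_msg emulate(const struct vm_exit_msg *exit);

void vmm_loop(thread_id_t vm_thread)
{
    for (;;) {
        struct vm_exit_msg exit, update;

        /* Block until the kernel synthesizes a "VM exit" IPC. */
        ipc_recv(vm_thread, &exit);

        /* Emulate the trapping event, computing the new guest state. */
        update = emulate(&exit);

        /* The reply IPC carries the state update; the kernel installs
         * it into the VM thread and resumes the guest. */
        ipc_reply(vm_thread, &update);
    }
}
```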

  9. Virtual PowerPC Processor
     - The VM runs in user mode; privileged PPC instructions trap (VM exit)
     - L4 propagates traps to the user-level VMM via a kernel-synthesized exit IPC; the VM/thread state (faulting PC, GPRs, ...) is included
     - The VMM receives the trap IPC, decodes the message (e.g. the faulting PC points at an mtdcr instruction), emulates the instruction (e.g. device IO), and sends back a reply IPC carrying the updated state (PC+4, new GPR values); see the sketch below
     - Upon reception, the kernel installs the new state and resumes the guest (VM entry)
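As an illustration of the decode/emulate/resume step, the following C sketch handles a trapped mtdcr ("move to device control register") instruction. guest_fetch32, decode_dcrn, and vdev_dcr_write are hypothetical helpers, and the instruction-field decoding is simplified.

```c
#include <stdint.h>

/* Same layout as in the exit-loop sketch above. */
struct vm_exit_msg {
    unsigned long pc;
    unsigned long gprs[32];
    unsigned long reason;
};

/* Hypothetical helpers: fetch a guest instruction word, decode the
 * (split) DCR-number field, forward a write to the device model. */
uint32_t guest_fetch32(unsigned long pc);
unsigned decode_dcrn(uint32_t insn);
void     vdev_dcr_write(unsigned dcrn, unsigned long val);

void emulate_mtdcr(const struct vm_exit_msg *exit, struct vm_exit_msg *update)
{
    uint32_t insn = guest_fetch32(exit->pc);  /* insn at the faulting PC */
    unsigned rs   = (insn >> 21) & 0x1f;      /* source GPR field */
    unsigned dcrn = decode_dcrn(insn);        /* device control register */

    /* The privileged device write goes to the virtual device model,
     * never to the real hardware. */
    vdev_dcr_write(dcrn, exit->gprs[rs]);

    *update = *exit;
    update->pc = exit->pc + 4;                /* step past the instruction */
}
```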

  10. Virtual MMU/TLB
     - PowerPC 450: 64-entry software-managed TLB (entry: AS, vaddr, paddr, size, rwx, attributes); no HW-walked page tables
     - Need to virtualize the MMU translation at two levels:
       - Guest virtual to guest physical (GV → GP), guest-managed
       - Guest physical to host physical (GP → HP), L4/VMM-managed
     - Both levels are compressed into the HW TLB (GV → HP); see the composition sketch below
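A minimal sketch of this compression step, assuming an illustrative tlb_entry layout and hypothetical gp_to_hp()/host_perms() lookups: the entry installed in hardware keeps the guest-virtual address but carries the host-physical frame. The permission intersection is an assumption beyond the slide, added so a guest entry can never exceed the rights L4 granted.

```c
/* Illustrative TLB entry; fields mirror the slide
 * (AS / vaddr / paddr / size / rwx / attr). */
struct tlb_entry {
    unsigned asid;
    unsigned long vaddr, paddr;
    unsigned size, rwx, attr;
};

/* Hypothetical lookups into the L4/VMM-managed GP -> HP mappings. */
unsigned long gp_to_hp(unsigned long gpaddr);
unsigned      host_perms(unsigned long gpaddr);

/* Compose the guest-managed GV -> GP entry with the GP -> HP mapping
 * into the single GV -> HP entry the 64-entry HW TLB actually holds. */
struct tlb_entry compose(struct tlb_entry guest)
{
    struct tlb_entry hw = guest;         /* keep GV, size, attributes */
    hw.paddr = gp_to_hp(guest.paddr);    /* second level: GP -> HP */
    hw.rwx  &= host_perms(guest.paddr);  /* never exceed host rights */
    return hw;
}
```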

  11. Virtual MMU/TLB (continued)
     - L4 keeps a per-VM "shadow TLB":
       - Intercepts guest TLB accesses (tlbwe, tlbre, ...)
       - Fills the shadow TLB accordingly
       - Stores GV → GP mappings
     - L4 keeps per-address-space memory mappings:
       - Standard L4 memory management
       - Stores GP → HP mappings
       - User-directed; the VMM carries it out
     - TLB miss handling (sketched below):
       - Guest-virtual miss (no entry in the shadow TLB): deliver the TLB miss to the guest
       - Guest-physical miss (entry in the shadow TLB, but none in the L4 MM structures): deliver to the VMM via IPC
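The two-way miss dispatch can be sketched as follows; the lookup and delivery functions are hypothetical names standing in for the shadow TLB, the L4 mapping database, and exception/IPC delivery the slide describes.

```c
struct vm;
struct mapping;

/* Same layout as in the composition sketch above. */
struct tlb_entry {
    unsigned asid;
    unsigned long vaddr, paddr;
    unsigned size, rwx, attr;
};

/* Hypothetical lookups and delivery mechanisms. */
struct tlb_entry *shadow_tlb_lookup(struct vm *vm, unsigned long gva);
struct mapping   *l4_mapping_lookup(struct vm *vm, unsigned long gpa);
void inject_guest_tlb_miss(struct vm *vm, unsigned long gva);
void deliver_to_vmm(struct vm *vm, unsigned long gpa);
void install_hw_tlb(struct tlb_entry *gv_gp, struct mapping *gp_hp);

void handle_tlb_miss(struct vm *vm, unsigned long gva)
{
    struct tlb_entry *gv_gp = shadow_tlb_lookup(vm, gva);
    if (!gv_gp) {
        /* Guest-virtual miss: the guest never mapped this address;
         * reflect the TLB miss into the guest kernel. */
        inject_guest_tlb_miss(vm, gva);
        return;
    }

    struct mapping *gp_hp = l4_mapping_lookup(vm, gv_gp->paddr);
    if (!gp_hp) {
        /* Guest-physical miss: the VMM must establish the GP -> HP
         * mapping via standard L4 memory management. */
        deliver_to_vmm(vm, gv_gp->paddr);
        return;
    }

    install_hw_tlb(gv_gp, gp_hp);  /* compress into a GV -> HP entry */
}
```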

  12. Virtual MMU/TLB Protection
     - PowerPC TLB protection features:
       - User/kernel (U/K) bits in TLB entries
       - Address space IDs (256 ASIDs); entries are tagged (TS, ASID, vaddr → paddr, size, rwx, U/K)
     - Standard Linux behavior: U/K bits for kernel separation, ASIDs for process separation
     - Requirements:
       - Must virtualize protection: guest code runs at user level
       - Must support shared mappings
       - Compressed into the HW TLB, as for translation
       - Minimize the number of TLB flushes

  13. Virtual MMU/TLB Protection (continued)
     - Use U/K bits (translation spaces, TS) and ASIDs (constants sketched below):
       - TS=1: L4 and the VMM
       - TS=0: the virtualized guest, all user-level (no U/K separation)
       - ASID 1: guest kernel; ASID 2: guest user; ASID 0: shared mappings
     - Analysis:
       - No TLB flush on a guest syscall
       - No TLB flush on a VM exit
       - Flushes only on guest process switches or world switches
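The scheme fits in a few constants; the TS/ASID values below are the ones named on the slide, while the helper around them is an illustrative sketch.

```c
/* TS/ASID assignment from the slide. */
#define TS_HOST     1  /* L4 and the VMM run in translation space 1 */
#define TS_GUEST    0  /* all guest code (kernel and user) in space 0 */

#define ASID_SHARED 0  /* mappings visible to guest kernel and user */
#define ASID_GKERN  1  /* guest kernel mappings */
#define ASID_GUSER  2  /* guest user mappings */

/* Pick the ASID to tag a guest TLB entry with. Because guest kernel
 * and guest user get distinct ASIDs (and shared pages ASID 0), a guest
 * syscall or a VM exit needs no TLB flush; only a guest process switch
 * or a world switch does. */
unsigned guest_asid(int guest_kernel_mode, int shared)
{
    if (shared)
        return ASID_SHARED;
    return guest_kernel_mode ? ASID_GKERN : ASID_GUSER;
}
```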

  14. Virtual Collective Interconnect
     - Collective: tree network connecting the IO node and compute nodes 0–63
       - 7.8 Gbit/s, < 6 µs latency
       - Packet-based, two virtual channels
       - Packet: header + 16 × 128-bit FPU words of payload
       - RX/TX FIFOs per node

  15. Virtual Collective Interconnect (continued)
     - Virtualized collective (guest vCOLL, VMM, physical pCOLL); see the sketch below:
       - TX: trap guest channel accesses, issue the packet on the physical collective link
       - RX: the VMM copies the GPR/FPU packet data into a private buffer, notifies the guest, then traps the guest's vCOLL accesses to deliver it
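A sketch of both directions of the virtualized collective path, with illustrative packet and FIFO helpers (on the real hardware, packets move via 128-bit FPU loads and stores):

```c
#include <stdint.h>

/* Illustrative packet layout: header plus 16 x 128-bit FPU words. */
struct coll_packet {
    uint32_t header;             /* routing / virtual channel */
    uint8_t  payload[16 * 16];
};

struct vm;

/* Hypothetical physical-FIFO and guest-notification helpers. */
void pcoll_tx(const struct coll_packet *pkt);
void pcoll_rx(struct coll_packet *pkt);
void rxbuf_push(struct vm *vm, const struct coll_packet *pkt);
void inject_virq(struct vm *vm, int irq);
#define VIRQ_COLLECTIVE 1        /* illustrative virtual IRQ number */

/* TX: the guest's trapped vCOLL channel accesses assembled this packet;
 * the VMM re-issues it on the physical collective link. */
void vcoll_tx(struct vm *vm, const struct coll_packet *pkt)
{
    (void)vm;
    pcoll_tx(pkt);
}

/* RX: drain the physical FIFO into a per-VM private buffer, then notify
 * the guest; its subsequent vCOLL reads trap and are served from it. */
void vcoll_rx_irq(struct vm *vm)
{
    struct coll_packet pkt;
    pcoll_rx(&pkt);
    rxbuf_push(vm, &pkt);
    inject_virq(vm, VIRQ_COLLECTIVE);
}
```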

  16. Virtual Torus Interconnect
     - Torus: 3D network connecting the compute nodes
       - 40.8 Gbit/s, 5 µs latency
       - Packet-based, 4 RX/TX groups (buffer-based) and rDMA
     - rDMA:
       - Direct access by (user) software
       - Memory descriptors (SND/RCV buffers)
       - put/get interface (direct-put, remote-get), e.g. dput(2,3), rget(0,0)

  17. Virtual Torus Interconnect (continued)
     - Virtualized torus model (guest vTORUS, VMM, physical pTORUS); sketched below:
       - Trap guest descriptor accesses
       - Translate guest-physical to host-physical (GP → HP)
       - Then issue the descriptor on the HW torus
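The descriptor-translation step can be sketched as follows; the descriptor layout and helper functions are illustrative, not the real torus DMA programming interface.

```c
#include <stddef.h>

struct vm;

/* Illustrative rDMA descriptor: the guest programs a guest-physical
 * buffer address that the real DMA engine cannot use directly. */
struct torus_desc {
    unsigned long buf_addr;  /* guest-physical in the guest's view */
    size_t        len;
    int           x, y, z;   /* 3D destination coordinates */
};

/* Hypothetical helpers: GP -> HP translation and physical injection. */
unsigned long gp_to_hp(struct vm *vm, unsigned long gpaddr);
void ptorus_issue(const struct torus_desc *desc);

/* Called when a guest write to a vTORUS descriptor traps. */
void vtorus_desc_trap(struct vm *vm, const struct torus_desc *gdesc)
{
    struct torus_desc hdesc = *gdesc;

    /* Rewrite the buffer address to host-physical before the HW torus
     * DMA engine ever sees the descriptor. */
    hdesc.buf_addr = gp_to_hp(vm, gdesc->buf_addr);

    ptorus_issue(&hdesc);
}
```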

  18. Status & Initial Evaluation
     - Functionally complete:
       - Virtual PPC core and MMU
       - Virtual torus and tree
       - UP Linux 2.6 guests
       - Virtualized Ethernet (within the guest, mapped onto the torus)
     - Initial benchmarks: Ethernet performance between Linux 2.6 guests on VMM/L4, compared against native Linux 2.6
     - Collective performance is still much worse; testing/setup problems remain
