Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances
Anton Burtsev†, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, Garth R. Goodson
NetApp, Inc.    †University of Utah, School of Computing
Enterprise Appliances
Network-attached storage, routers, etc.
• High performance
• Scalable and highly available access
Example Appliance
• Monolithic kernel
• Kernel components
Problems:
• Fault isolation
• Performance isolation
• Resource provisioning
Split Architecture
Benefits of Virtualization
• High availability
  • Fault isolation
  • Micro-reboots
  • Partial functionality in case of failure
• Performance isolation
• Resource allocation
  • Consolidation and load balancing, VM migration
• Non-disruptive updates
  • Hardware upgrades via VM migration
  • Software updates as micro-reboots
• Computation-to-data migration
Main Problem: Performance
Is it possible to match the performance of a monolithic environment?
• Large amount of data movement between components
• Mostly cross-core
• Connection-oriented (established once)
• Throughput-optimized (asynchronous)
• Coarse-grained (no one-word messages)
• Multi-stage data processing
• Main cost contributors
  • Transitions to the hypervisor
  • Memory map/copy operations
  • Not VM context switches (multi-core)
  • Not IPC marshaling
Main Insight: Relaxed Trust Model
• The appliance is built by a single organization
• Components are:
  • Pre-tested and qualified
  • Collaborative and non-malicious
• Share memory read-only across VMs!
• Fast inter-VM communication
  • Exchange only pointers to data
  • No hypervisor calls (only cross-core notification)
  • No memory map/copy operations
  • Zero-copy across the entire appliance
Contributions
• Fast inter-VM communication mechanism
• Abstraction of a single address space for traditional systems
• Case study
  • Realistic microkernelized network-attached storage system
Design
Design Goals
• Performance
  • High throughput
• Practicality
  • Minimal guest system and hypervisor dependencies
  • No intrusive guest kernel changes
• Generality
  • Support for different communication mechanisms in the guest system
Transitive Zero-Copy
• Goal
  • Zero-copy across the entire appliance
  • No changes to the guest kernel
• Observation
  • Multi-stage data processing
Pseudo Global Virtual Address Space
Insight:
• CPUs support a 64-bit (2^64-byte) virtual address space
• Individual VMs have no need for all of it
• Lay out the VMs' memory in disjoint slices of that range, so data pointers stay valid across VMs (sketched below)
[Figure: the 0 to 2^64 virtual address range laid out across VMs]
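A minimal sketch of how such a pseudo-global layout could work, assuming each VM is assigned a fixed, disjoint slice of the 64-bit range and maps its peers' memory at the peers' own addresses. All names and constants below (VM_REGION_BITS, vm_region_base, etc.) are illustrative assumptions, not Fido's actual code.

/*
 * Hypothetical pseudo-global virtual address space layout.
 * Each VM owns one disjoint slice of the 64-bit range; peers map that
 * slice read-only at the same addresses, so a pointer produced in one
 * VM dereferences to the same data in any other VM.
 */
#include <stdint.h>

#define VM_REGION_BITS  40                       /* 1 TB slice per VM (assumption) */
#define VM_REGION_SIZE  (1ULL << VM_REGION_BITS)

/* Base of the address slice reserved for a given VM id. */
static inline uint64_t vm_region_base(unsigned vm_id)
{
    return (uint64_t)vm_id * VM_REGION_SIZE;
}

/* Which VM produced a given pseudo-global address? */
static inline unsigned addr_owner(uint64_t addr)
{
    return (unsigned)(addr >> VM_REGION_BITS);
}

/* Sanity check before dereferencing: the address must lie in the sender's slice. */
static inline int addr_valid_for(uint64_t addr, unsigned sender_vm)
{
    return addr_owner(addr) == sender_vm;
}

/*
 * If, instead, a peer's memory were mapped at some other local base, pointer
 * translation would be a single offset addition.  With the pseudo-global
 * layout the local base equals the peer's own base, so this is the identity.
 */
static inline void *translate_ptr(uint64_t peer_addr, unsigned peer_vm,
                                  uint64_t local_base_of_peer)
{
    return (void *)(local_base_of_peer + (peer_addr - vm_region_base(peer_vm)));
}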
Transitive Zero-Copy
Fido: High-Level View
• "c" – connection management
• "m" – memory mapping
• "s" – cross-VM signaling
IPC Organization
• Shared memory ring (sketched below)
  • Pointers to data
• For complex data structures
  • Scatter-gather array
  • Translate pointers
• Signaling
  • Cross-core interrupts (event channels)
  • Batching and in-ring polling
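A minimal sketch of the kind of shared-memory descriptor ring the slide describes: entries carry scatter-gather lists of pointers into memory that the peer VM already maps, so only descriptors move through the ring and notifications are batched. The structure layout, field names, and constants are illustrative assumptions, not Fido's API.

/* Descriptor ring shared between a producer VM and a consumer VM. */
#include <stdint.h>

#define RING_SLOTS  256           /* power of two (assumption) */
#define SG_MAX      16            /* max fragments per message (assumption) */

struct sg_entry {
    uint64_t addr;                /* pseudo-global address of a data fragment */
    uint32_t len;
};

struct ring_entry {
    uint32_t        nr_frags;
    struct sg_entry frags[SG_MAX];
};

struct ipc_ring {
    volatile uint32_t prod;       /* written only by the producer VM */
    volatile uint32_t cons;       /* written only by the consumer VM */
    struct ring_entry entries[RING_SLOTS];
};

/* Enqueue a descriptor; returns 0 on success, -1 if the ring is full. */
static int ring_send(struct ipc_ring *r, const struct ring_entry *e)
{
    uint32_t prod = r->prod;
    if (prod - r->cons == RING_SLOTS)
        return -1;                            /* full */
    r->entries[prod % RING_SLOTS] = *e;
    __sync_synchronize();                     /* publish the entry before the index */
    r->prod = prod + 1;
    return 0;
}

/*
 * Signaling is decoupled from enqueueing: several descriptors are batched and
 * one cross-core notification (e.g. a Xen event channel) is raised only if
 * the consumer is not already polling the ring.
 */
static void ring_flush(struct ipc_ring *r, int consumer_polling,
                       void (*notify_peer)(void))
{
    (void)r;
    if (!consumer_polling)
        notify_peer();
}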
Fast Device-Level Communication
• MMNet (transmit path sketched below)
  • Link-level
  • Standard network device interface
  • Supports full transitive zero-copy
• MMBlk
  • Block-level
  • Standard block device interface
  • Zero-copy on write
  • Incurs one copy on read
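To show how a device layer could sit on top of such a ring, here is a hypothetical MMNet-style transmit routine. It reuses the ipc_ring types from the previous sketch; the packet and fragment types, and the function itself, are stand-ins for illustration, not the Linux kernel's or Fido's real interfaces. The point is that transmission only records the addresses of a packet's fragments in a descriptor; the payload is never copied or remapped.

/* Hypothetical packet representation handed down by the guest network stack. */
struct pkt_frag { uint64_t addr; uint32_t len; };
struct packet   { uint32_t nr_frags; struct pkt_frag frag[SG_MAX]; };

static int mmnet_xmit(struct ipc_ring *tx_ring, const struct packet *pkt,
                      int peer_polling, void (*notify_peer)(void))
{
    struct ring_entry e;
    uint32_t i;

    if (pkt->nr_frags > SG_MAX)
        return -1;                 /* would not fit in one descriptor */

    e.nr_frags = pkt->nr_frags;
    for (i = 0; i < pkt->nr_frags; i++) {
        /* Fragments already live in memory the peer maps read-only, so the
         * descriptor just names them by pseudo-global address. */
        e.frags[i].addr = pkt->frag[i].addr;
        e.frags[i].len  = pkt->frag[i].len;
    }

    if (ring_send(tx_ring, &e) < 0)
        return -1;                 /* back-pressure: ring full */

    ring_flush(tx_ring, peer_polling, notify_peer);
    return 0;
}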
Evaluation
MMNet Evaluation
Compared configurations: Loop, NetFront, XenLoop, MMNet
• AMD Opteron host with 2 × 2.1 GHz quad-core CPUs (8 cores total)
• 16 GB RAM
• NVIDIA 1 Gbps NICs
• 64-bit Xen (3.2), 64-bit Linux (2.6.18.8)
• Netperf benchmark (2.4.4)
MMNet: TCP Throughput
[Figure: TCP throughput (Mbps) vs. message size (0.5–256 KB) for Monolithic, Netfront, XenLoop, and MMNet]
MMBlk Evaluation
Compared configurations: Monolithic, XenBlk, MMBlk
• Same hardware
  • AMD Opteron host with 2 × 2.1 GHz quad-core CPUs (8 cores total)
  • 16 GB RAM
  • NVIDIA 1 Gbps NICs
• VMs are configured with 4 GB and 1 GB RAM
• 3 GB in-memory file system (tmpfs)
• IOzone benchmark
MMBlk Sequential Writes
[Figure: write throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, XenBlk, and MMBlk]
Case Study
Network-Attached Storage
Network-Attached Storage
• RAM
  • VMs have 1 GB each, except the FS VM (4 GB)
  • Monolithic system has 7 GB RAM
• Disks
  • RAID-5 over 3 disks (64 MB/s each)
• Benchmark
  • IOzone reads/writes an 8 GB file over NFS (async)
Sequential Writes
[Figure: NFS write throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, Native-Xen, and MM-Xen]
Sequential Reads
[Figure: NFS read throughput (MB/s) vs. record size (4 KB–4 MB) for Monolithic, Native-Xen, and MM-Xen]
TPC-C (On-Line Transaction Processing)
[Figure: TPC-C throughput in transactions per minute (tpmC) for Monolithic, MM-Xen, and Native-Xen]
Conclusions
• We match monolithic performance
  • "Microkernelization" of traditional systems is possible!
• Fast inter-VM communication
  • The search for VM communication mechanisms is not over
• Important aspects of design
  • Trust model
  • VM as a library (for example, FSVA)
  • End-to-end zero-copy
  • Pseudo global virtual address space
• There are still problems to solve
  • Full end-to-end zero-copy
  • Cross-VM memory management
  • Full utilization of pipelined parallelism
Thank you.
aburtsev@flux.utah.edu
Backup Slides
Related Work
• Traditional microkernels [L4, EROS, Coyotos]
  • Synchronous (effectively thread migration)
  • Optimized for single-CPU: fast context switch, small messages (often in registers), efficient marshaling (IDL)
• Buffer management [Fbufs, IO-Lite, Beltway Buffers]
  • Shared buffer is a unit of protection
  • FastForward – fast cache-to-cache data transfer
• VMs [Xen split drivers, XWay, XenSocket, XenLoop]
  • Page flipping, later buffer sharing
  • IVC, VMCI
• Language-based protection [Singularity]
  • Shared heap, zero-copy (only pointer transfer)
• Hardware acceleration [Solarflare]
• Multi-core OSes [Barrelfish, Corey, FOS]