Vhost: Sharing is better
Eyal Moscovici, Bandan Das
Partly sponsored by:
What's it about?
● Paravirtualization: Shared Responsibilities
● Vhost: How much can we stretch?
● Design Ideas: Parallelization
● Design Ideas: Consolidation
● Vhost: ELVIS
● Upstreaming
● Results
● Wrap up and Questions
Shared Responsibilities
● From Virtualization to Paravirtualization
● Virtio
  – Host/Guest co-ordination
  – Standardized backend/frontend drivers
● Advantages
  – Host still has ultimate control (compared to hardware device assignment)
  – Security, fault tolerance, SDN, file-based images, replication, snapshots, VM migration
● Disadvantages
  – Scalability limitations
Shared Responsibilities
● Vhost kernel
  – Let's move things into the kernel (almost!)
  – Better userspace/kernel API
  – Avoids system calls, improves performance
  – And comes with all the advantages of virtio
[Diagram: guest vCPU reads/writes virtio buffers; KVM forwards guest kicks to the vhost worker thread via ioeventfd and injects completions back via irqfd; the worker feeds the host network stack]
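The ioeventfd/irqfd pair in the diagram is built on eventfd file descriptors. The minimal userspace sketch below (hypothetical names, not the actual vhost/KVM wiring) only illustrates that mechanism: a write on one side wakes a waiting worker on the other without a syscall round trip into userspace.

```c
/* Minimal sketch of the eventfd mechanism underlying ioeventfd/irqfd.
 * Illustrative only -- not the actual vhost/KVM code paths. */
#include <sys/eventfd.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int kick_fd;                 /* stands in for the ioeventfd */

/* "worker": blocks until the other side signals the eventfd */
static void *worker(void *arg)
{
    uint64_t n;
    if (read(kick_fd, &n, sizeof(n)) == (ssize_t)sizeof(n))
        printf("worker: got %llu kick(s), would process the virtqueue\n",
               (unsigned long long)n);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    uint64_t one = 1;

    kick_fd = eventfd(0, 0);        /* counter-based notification fd */
    pthread_create(&tid, NULL, worker, NULL);

    /* "guest/KVM side": writing the eventfd wakes the worker directly,
     * with no trip through userspace to deliver the notification. */
    write(kick_fd, &one, sizeof(one));

    pthread_join(tid, NULL);
    close(kick_fd);
    return 0;
}
```

Compile with `-pthread`; in the real stack, QEMU hands the two eventfds to KVM and vhost with ioctls so the kernel sides talk to each other directly.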
How much can we stretch?
● One worker thread per virtqueue pair
● More guests = more worker threads
  – But is it necessary?
  – Can a worker share responsibilities?
● Performance will improve (or at least stay the same)
  – Main objective: scalable performance
● No userspace modifications should be necessary
Parallelization (Pronunciation Challenge)
● A worker thread running on every CPU core
● Guest/thread mapping is decoupled
● Guest serviced by a free worker thread with NUMA locality
● Presented by Shirley Ma at LPC 2012
[Diagram: guest Tx/Rx queues dispatched via NUMA-aware scheduling to workers Vhost-1..Vhost-4 running on CPU0..CPU3]
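As a rough illustration of the "one worker per core" idea, here is a small userspace sketch (illustrative names only, not Shirley Ma's patches) that pins one worker thread to each online CPU; the real series would dispatch virtqueue work to whichever pinned worker is free and NUMA-local.

```c
/* Sketch: one worker thread pinned per CPU core. Names are made up. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *io_worker(void *arg)
{
    long core = (long)arg;
    /* Real code would pull work items for any guest queue with NUMA
     * locality to this core; here we just report thread placement. */
    printf("worker pinned to core %ld (running on CPU %d)\n",
           core, sched_getcpu());
    return NULL;
}

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    pthread_t *tids = calloc(ncpus, sizeof(*tids));

    for (long c = 0; c < ncpus; c++) {
        pthread_attr_t attr;
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(c, &set);           /* bind worker c to core c */
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&tids[c], &attr, io_worker, (void *)c);
        pthread_attr_destroy(&attr);
    }
    for (long c = 0; c < ncpus; c++)
        pthread_join(tids[c], NULL);
    free(tids);
    return 0;
}
```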
Parallelization
● But...
  – Do we really need "always-on" threads?
  – Is it enough to create threads on demand?
  – Does scheduling get more complicated as the number of guests increases?
  – Why not share a thread among multiple devices?
Consolidation - ELVIS (Not the singer)
Presented by Abel Gordon at KVM Forum 2013
● Divide the cores in the system into two groups: VM cores and I/O cores
● A vhost thread servicing multiple devices from different guests has a dedicated CPU core
● A user-configurable parameter determines how many
● A dedicated I/O scheduler on the vhost thread
● Posted interrupts and polling included!
[Diagram: thread-based scheduling vs. fine-grained I/O scheduling – vCPUs and per-VM I/O time-sliced across cores 1..N vs. I/O work for all VMs consolidated onto dedicated I/O cores]
ELVIS Polling Thread
● A single thread on a dedicated core monitors the activity of each queue (VMs' I/O)
● Balances between queues based on the I/O activity
● Decides which queue should be processed and for how long
● Balances between throughput and latency
● No process/thread context switches for I/O
● Exitless communication (in the next slides)
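The scheduling idea can be sketched with a toy loop: scan every queue without blocking and grant busier queues a larger per-visit processing budget. Everything below (the queue structure, the budget policy, the constants) is an assumption for illustration, not the ELVIS scheduler itself.

```c
/* Toy sketch of the polling/scheduling idea: one thread scans all
 * virtqueues and gives busier queues a larger processing budget.
 * The policy and all names here are illustrative. */
#include <stdio.h>

#define NQUEUES     4
#define BASE_BUDGET 8          /* descriptors per visit for an idle queue */

struct queue {
    int  pending;              /* descriptors waiting (stand-in for the avail ring) */
    long serviced;             /* total descriptors handled so far */
};

/* Budget grows with observed backlog: trades latency for throughput. */
static int budget_for(const struct queue *q)
{
    return BASE_BUDGET + q->pending / 2;
}

static void poll_round(struct queue *qs, int n)
{
    for (int i = 0; i < n; i++) {
        int budget = budget_for(&qs[i]);
        int done   = qs[i].pending < budget ? qs[i].pending : budget;

        qs[i].pending  -= done;    /* "process" up to budget descriptors */
        qs[i].serviced += done;
    }
}

int main(void)
{
    struct queue qs[NQUEUES] = {
        { .pending = 100 }, { .pending = 5 }, { .pending = 40 }, { .pending = 0 },
    };

    for (int round = 0; round < 10; round++)
        poll_round(qs, NQUEUES);

    for (int i = 0; i < NQUEUES; i++)
        printf("queue %d: serviced %ld, still pending %d\n",
               i, qs[i].serviced, qs[i].pending);
    return 0;
}
```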
ELVIS Polling Thread
[Timeline diagram: Traditional paravirtual I/O – the guest vCPU thread (core X) exits to the hypervisor to notify the I/O thread (core Y), which processes the request and signals completion back through the hypervisor. ELVIS – polling removes the guest-to-host notification, and exitless virtual interrupt injection (via ELI) removes the host-to-guest notification, so the I/O thread processes and completes requests without exits.]
ELVIS Exitless Communication
● Implemented software posted interrupts based on ELI (ExitLess Interrupts)
  – ELI will be very hard to upstream
● Possible replacements
  – KVM PV EOI, introduced by Michael S. Tsirkin
  – Intel VT-d posted interrupts (PI), which may be leveraged
Upstreaming...
● A lot of new ideas!
● First step
  – Stabilize a next-generation vhost design
● The plan:
  – Introduce a shared vhost design and run benchmarks with different configurations
  – RFC posted upstream
  – Initial test results favorable
● Later enhancements can be introduced gradually...
Cgroups (Buzzwords, JK ;))
● Initial approach
  – Add a function to search all cgroups in all hierarchies for the new process
  – Even a single mismatch => create a new vhost worker
● But..
  – What happens when a VM process is migrated to a different cgroup?
  – Can we optimize the cgroup search?
  – What happens if we use polling?
  – Rethink cgroups integration?
[Diagram: per-device vhost workers, one worker per guest cgroup (WG1/WG2/WG3 for CG1/CG2/CG3), vs. a shared vhost worker serving guests whose cgroups match]
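The membership check behind "even a single mismatch => create a new vhost worker" can be illustrated from userspace by comparing /proc/<pid>/cgroup of the two processes. The in-kernel implementation would use the cgroup APIs instead, so treat this purely as a sketch of the idea.

```c
/* Sketch of the cgroup-matching idea: a shared worker is only reused if
 * the new VM process sits in the same cgroup in every hierarchy.
 * Userspace illustration via /proc/<pid>/cgroup; the kernel check would
 * use the cgroup APIs directly. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if both processes list identical cgroup paths, else 0. */
static int same_cgroups(pid_t a, pid_t b)
{
    char path_a[64], path_b[64], line_a[512], line_b[512];
    FILE *fa, *fb;
    int match = 1;

    snprintf(path_a, sizeof(path_a), "/proc/%d/cgroup", (int)a);
    snprintf(path_b, sizeof(path_b), "/proc/%d/cgroup", (int)b);
    fa = fopen(path_a, "r");
    fb = fopen(path_b, "r");
    if (!fa || !fb) { match = 0; goto out; }

    for (;;) {
        char *ra = fgets(line_a, sizeof(line_a), fa);
        char *rb = fgets(line_b, sizeof(line_b), fb);

        if (!ra && !rb)                /* both files ended together: all matched */
            break;
        if (!ra || !rb || strcmp(line_a, line_b)) {
            match = 0;                 /* any mismatch => new worker needed */
            break;
        }
    }
out:
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return match;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid1> <pid2>\n", argv[0]);
        return 1;
    }
    printf("cgroups %s\n",
           same_cgroups(atoi(argv[1]), atoi(argv[2])) ? "match" : "differ");
    return 0;
}
```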
Cgroups and Polling
● Can a vhost polling thread poll guests with mismatching cgroups?
  – Yes, but it requires the polling thread to take the guests' cgroup state into account
● Probably requires a deeper integration of vhost and cgroups
Workqueues (cmwq) (Even more sharing!)
● Can we use concurrency managed workqueues?
● NUMA awareness comes for free!
● But wait, what about cgroups?
  – No cgroups support (at least not yet, WIP)
● Less code to manage, fewer bugs
● Cons
  – Minimal control once work enters the workqueue
  – Again, no cgroups support :(
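For reference, a minimal kernel-module sketch of the cmwq calls a shared vhost could sit on: an unbound workqueue plus queue_work() in place of waking a private worker thread. This is not the RFC code, only the shape of the API.

```c
/* Minimal kernel-module sketch of the cmwq API under discussion.
 * Not vhost code -- it only shows queuing work onto an unbound
 * (NUMA-aware) workqueue instead of a private vhost thread. */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *vhost_wq;

static void fake_vhost_work(struct work_struct *work)
{
    pr_info("cmwq sketch: handling one unit of virtqueue work\n");
}

static DECLARE_WORK(vhost_work, fake_vhost_work);

static int __init cmwq_sketch_init(void)
{
    /* WQ_UNBOUND lets the workqueue core pick worker pools per NUMA node */
    vhost_wq = alloc_workqueue("vhost_sketch", WQ_UNBOUND, 0);
    if (!vhost_wq)
        return -ENOMEM;
    queue_work(vhost_wq, &vhost_work);   /* replaces waking a private thread */
    return 0;
}

static void __exit cmwq_sketch_exit(void)
{
    destroy_workqueue(vhost_wq);         /* drains pending work first */
}

module_init(cmwq_sketch_init);
module_exit(cmwq_sketch_exit);
MODULE_LICENSE("GPL");
```

The "minimal control" con above follows directly from this model: once queue_work() is called, scheduling of the work item belongs to the workqueue core, not to vhost.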
Results
● ELVIS results
  – A little old but significant
  – Include testing for exitless interrupts and polling
  – Valuable data for future work
● Setup
  – Linux kernel 3.1
  – IBM System x3550 M4, two 8-core sockets of Intel Xeon E5-2660, 2.2 GHz, 56 GB RAM, with an Intel x520 dual-port 10 Gbps NIC
  – QEMU 0.14
● Results show the performance impact of the different components of ELVIS
  – Throughput: netperf TCP stream with 64-byte messages
  – Latency: netperf UDP RR
Results – Performance (Netperf)
[Charts: netperf TCP stream throughput (Gbps) and netperf UDP RR latency (msec) vs. number of VMs (1-7), comparing baseline, baseline-affinity, elvis, elvis-poll, and elvis-poll-pi]
Results – Components of ELVIS
[Charts: relative netperf TCP stream throughput and relative netperf UDP RR latency vs. number of VMs (1-7) for elvis, elvis-poll, and elvis-poll-pi]
Even More Results
● New results with the RFC patches
  – Two systems with Xeon E5-2640 v3
  – Point-to-point network connection
  – Netperf TCP throughput (STREAM & MAERTS)
  – Netperf TCP request-response
Results [chart]
Results [chart]
So, ship it?!
● Not yet :)
● Slowly making progress towards an acceptable solution
● Scope for a lot of interesting work
Questions/Comments/Suggestions?
Backup
ELVIS Missing Piece
● Polling on the physical NIC
  – It may be possible to use the low-latency Ethernet device polling introduced in kernel 3.11
  – * I have an ELVIS version polling the physical NIC that does not use this patch
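The kernel 3.11 low-latency polling mentioned above is exposed to sockets as SO_BUSY_POLL; the sketch below only demonstrates that interface (an in-vhost NIC poller would hook in at a lower level, and setting the option may require CAP_NET_ADMIN).

```c
/* Sketch of the kernel >= 3.11 low-latency ("busy poll") socket knob.
 * Shown only to illustrate the interface; an in-vhost NIC poller
 * would not go through the socket layer like this. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46        /* asm-generic value; missing from old headers */
#endif

int main(void)
{
    int usecs = 50;            /* busy-poll the device queue for up to 50 us */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
        perror("setsockopt(SO_BUSY_POLL)");   /* needs kernel >= 3.11 */
    else
        printf("busy polling enabled for %d us on fd %d\n", usecs, fd);

    close(fd);
    return 0;
}
```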
Results – Performance (Netperf)
[Charts repeated from the main section: netperf TCP stream throughput (Gbps) and netperf UDP RR latency (msec) vs. number of VMs (1-7), comparing baseline, baseline-affinity, elvis, elvis-poll, and elvis-poll-pi]
Results – Performance (Netperf)
● Different message sizes require a different number of I/O cores
● Using sidecores is beneficial across a wide range of message sizes
● The number of VMs "doesn't matter" for throughput
Results – Performance (Netperf TCP RR)
● One I/O sidecore is not enough; two are needed
● The sidecore configuration performs up to 1.5x better than baseline
Results – Performance (memcached)
● One I/O sidecore is not enough; two are needed
● The sidecore configuration performs more than 2x better than baseline
Results – Performance (apachebench)
● One I/O sidecore is not enough; two are needed
● The sidecore configuration performs up to 2x better than baseline