

  1. Vhost and VIOMMU
     Jason Wang <jasowang@redhat.com> (Wei Xu <wexu@redhat.com>)
     Peter Xu <peterx@redhat.com>

  2. Agenda
     ● IOMMU & Qemu vIOMMU background
     ● Motivation of secure virtio
     ● DMAR (DMA Remapping)
       – Design overview
       – Implementation illustration
       – Performance optimization
       – Vhost device IOTLB
     ● IR (Interrupt Remapping)
     ● Performance results & status

  3. IOMMU & Qemu vIOMMU Revisit
     ● What is an IOMMU?
       – A hardware component that provides two main functions: IO translation and device isolation.
     ● How are IO translation and device isolation supported by the IOMMU?
       – DMA Remapping (DMAR): IO-space addresses presented by devices are translated to physical addresses, coupled with access permissions, on the fly, so devices can only access specific regions of memory.
       – Interrupt Remapping (IR): some architectures also support interrupt remapping, in a manner similar to memory remapping.
     ● What is a Qemu vIOMMU?
       – An emulated IOMMU which behaves like a real one.
       – Its functionality is always a subset of the physical unit, depending on the implementation.
       – Only the Intel, ppc and sun4m IOMMUs are currently supported in Qemu.

  4. IOMMU and vIOMMU
     [Diagram: on the host, hardware devices reach host memory through the IOMMU while the CPU goes through the MMU; inside each VM, emulated devices reach guest memory through the vIOMMU while the vCPUs go through the vMMU.]

  5. Motivation
     ● Security, security and security.
     ● DPDK: userspace polling-mode drivers for virtio-net devices are widely used in NFV.
     ● Vhost is the popular backend for most of these use cases.
     ● Vhost is still out of IOMMU scope.

  6. DMA Remapping (DMAR)

  7. Virtio-Net Device Address Space Overview
     [Diagram: the guest virtio-net driver hands guest physical addresses (gpa) to Qemu; the memory API and the virtio-net backend service perform the gpa-to-hva translation over the guest pages; the vring is then accessed by the virtio-net backends (vhost-net, vhost-user and other backends) for tx/rx.]

  8. Design of Secure Virtio-Net
     [Diagram: the guest device driver allocates DMA memory through the DMA API and the guest IOMMU driver, so the device only sees IO virtual addresses (iova); the vIOMMU in Qemu exposes an IOTLB API for iotlb entry lookup; the virtio-net backend service translates iova to hva before the vring is accessed by the backends (vhost-net, vhost-user and others) for tx/rx.]

  9. Implementation: Guest
     ● Guest
       – Boot the guest with a vIOMMU assigned.
       – VIRTIO_F_IOMMU_PLATFORM: if this feature bit is offered by the device, the guest virtio driver is forced to use the DMA API to manage all of its DMA memory accesses; otherwise the device will be forcibly disabled by the system (see the sketch below).
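      A minimal user-space sketch (not actual guest kernel code) of how a driver might act on VIRTIO_F_IOMMU_PLATFORM: when the bit is negotiated, the address placed into a vring descriptor must come from the DMA API (i.e. an iova translated by the vIOMMU); otherwise the driver can use the guest physical address directly. The helpers dma_map() and virt_to_gpa() are illustrative stand-ins, not real kernel APIs.

      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      #define VIRTIO_F_IOMMU_PLATFORM 33   /* virtio feature bit number */

      /* Illustrative stand-ins for the real guest DMA API and gpa lookup. */
      static uint64_t dma_map(void *buf, size_t len)
      {
          (void)buf; (void)len;
          return 0x1000;                   /* an iova remapped by the vIOMMU */
      }

      static uint64_t virt_to_gpa(void *buf)
      {
          (void)buf;
          return 0x40000;                  /* a guest physical address */
      }

      /* Decide which address the driver places into a vring descriptor. */
      static uint64_t vring_buf_addr(uint64_t features, void *buf, size_t len)
      {
          if (features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
              return dma_map(buf, len);    /* must go through the DMA API */
          return virt_to_gpa(buf);         /* legacy path: gpa used directly */
      }

      int main(void)
      {
          char buf[64];
          uint64_t features = 1ULL << VIRTIO_F_IOMMU_PLATFORM;
          printf("descriptor address: 0x%llx\n",
                 (unsigned long long)vring_buf_addr(features, buf, sizeof(buf)));
          return 0;
      }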

  10. Implementation: Qemu and Backends
      ● Qemu
        – DMA address translation for the vIOMMU is already fully supported; unfortunately, virtio-pci devices still use the memory address space and never use iovas at all, so they are switched to use the DMA address space (iova).
      ● Backends
        – Every address used to access the vring must be translated from a guest iova to an hva; this is done via an iotlb lookup that goes through the vIOMMU (see the sketch below).
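      A toy model of that translation chain, using made-up one-entry tables: the vIOMMU maps iova to gpa, and Qemu's memory map turns gpa into hva. Real Qemu does this through its memory API and IOMMU memory regions; the structures and values below are purely illustrative.

      #include <stdint.h>
      #include <stdio.h>

      /* One vIOMMU mapping (iova -> gpa) and one RAM block (gpa -> hva), toy-sized. */
      struct iommu_map { uint64_t iova, gpa, size; };
      struct ram_block { uint64_t gpa; void *hva; uint64_t size; };

      static struct iommu_map viommu = { .iova = 0x1000, .gpa = 0x40000, .size = 0x1000 };
      static char guest_ram[0x1000];
      static struct ram_block ram   = { .gpa = 0x40000, .hva = guest_ram, .size = 0x1000 };

      /* iova -> gpa via the (toy) vIOMMU page table. */
      static int iova_to_gpa(uint64_t iova, uint64_t *gpa)
      {
          if (iova < viommu.iova || iova >= viommu.iova + viommu.size)
              return -1;                        /* would be an IOMMU fault */
          *gpa = viommu.gpa + (iova - viommu.iova);
          return 0;
      }

      /* gpa -> hva via the (toy) memory map. */
      static void *gpa_to_hva(uint64_t gpa)
      {
          if (gpa < ram.gpa || gpa >= ram.gpa + ram.size)
              return NULL;
          return (char *)ram.hva + (gpa - ram.gpa);
      }

      int main(void)
      {
          uint64_t gpa;
          if (iova_to_gpa(0x1010, &gpa) == 0)
              printf("iova 0x1010 -> gpa 0x%lx -> hva %p\n",
                     (unsigned long)gpa, gpa_to_hva(gpa));
          return 0;
      }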

  11. More Optimization: Vhost Device IOTLB Cache
      ● Why does it come to vhost?
        – Vhost-net is the most powerful and reliable in-kernel network backend, and is widely used as the preferred backend.
      ● What problem does vhost encounter?
        – The IOTLB API of the vIOMMU is implemented in Qemu, while vhost works in the kernel; a high frequency of iotlb translations crossing between kernel and userspace would hurt performance dramatically.
      ● How does vhost survive?
        – A kernel-side device iotlb cache (ATS).

  12. Address Translation Services (ATS) Overview
      [Diagram: PCIe devices sit below a Root Complex that contains a Translation Agent (TA) in front of memory; a device with a device iotlb cache sends an ATS request to the TA and caches the translation returned in the ATS completion.]

  13. Why Address Translation Services (ATS)?
      ● Alternative
        – An individual VT-d implementation inside vhost; drawbacks:
          ● Code duplication.
          ● Vendor and architecture specific.
          ● Needs a new API for error reporting.
      ● Benefits of ATS
        – Part of the PCIe spec.
        – Platform independent.
        – Easily achieved on top of the current IOMMU infrastructure.

  14. Vhost Device IOTLB Cache Workflow
      [Diagram: Qemu implements the IOTLB API while vhost keeps device iotlb cache entries, e.g. (a, size, ro), (b, size, wo), (c, size, rw), (d, size, wo), in an interval tree. When vhost tx/rx needs to translate iova 'd' and misses, it sends an iotlb-miss for 'd'; Qemu looks it up and replies with an iotlb-update for 'd'. When the guest unmaps 'c', Qemu sends an iotlb-invalidate for 'c'. Accesses to illegal address ranges trigger a new error report.]
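      Below is a minimal user-space model of that workflow. It uses a flat toy cache rather than the kernel's interval tree, and the function names (translate, iotlb_update, iotlb_invalidate, send_miss_to_qemu) are illustrative, not the vhost API.

      #include <stdint.h>
      #include <stdio.h>

      #define CACHE_SLOTS 16

      struct iotlb_entry { uint64_t iova, size, uaddr; int used; };
      static struct iotlb_entry cache[CACHE_SLOTS];

      /* Stand-in for sending an iotlb-miss message to Qemu over the vhost fd. */
      static void send_miss_to_qemu(uint64_t iova)
      {
          printf("iotlb miss: asking Qemu to translate iova 0x%lx\n", (unsigned long)iova);
      }

      /* Qemu replied: insert a cache entry. */
      static void iotlb_update(uint64_t iova, uint64_t size, uint64_t uaddr)
      {
          for (int i = 0; i < CACHE_SLOTS; i++) {
              if (!cache[i].used) {
                  cache[i] = (struct iotlb_entry){ iova, size, uaddr, 1 };
                  return;
              }
          }
      }

      /* Guest unmapped a range: drop any overlapping entries. */
      static void iotlb_invalidate(uint64_t iova, uint64_t size)
      {
          for (int i = 0; i < CACHE_SLOTS; i++)
              if (cache[i].used && iova < cache[i].iova + cache[i].size &&
                  cache[i].iova < iova + size)
                  cache[i].used = 0;
      }

      /* Translate an iova; returns 0 and fills *uaddr on a hit, -1 on a miss. */
      static int translate(uint64_t iova, uint64_t *uaddr)
      {
          for (int i = 0; i < CACHE_SLOTS; i++) {
              if (cache[i].used && iova >= cache[i].iova &&
                  iova < cache[i].iova + cache[i].size) {
                  *uaddr = cache[i].uaddr + (iova - cache[i].iova);
                  return 0;
              }
          }
          send_miss_to_qemu(iova);
          return -1;                /* caller retries once the update arrives */
      }

      int main(void)
      {
          uint64_t uaddr;
          if (translate(0x1000, &uaddr) < 0)                    /* miss */
              iotlb_update(0x1000, 0x1000, 0x7f0000000000ULL);  /* Qemu's reply */
          if (translate(0x1000, &uaddr) == 0)                   /* hit  */
              printf("iova 0x1000 -> uaddr 0x%lx\n", (unsigned long)uaddr);
          iotlb_invalidate(0x1000, 0x1000);                     /* guest unmap */
          return 0;
      }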

  15. Vhost Device IOTLB Implementation Summary
      ● Implementation
        – Save device iotlb cache entries in the kernel.
        – Look up entries in the cache when accessing virtio buffers.
        – Request Qemu to translate on demand for any tlb miss.
        – Process update/invalidate messages from Qemu and keep the kernel cache consistent.
      ● Data Structure and Userspace/Kernel Interface
        – An interval tree is chosen to store the dynamic device iotlb cache entries.
        – A message mechanism over vhost 'fd' read/write is used to pass the vATS requests and replies (sketched below).
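      The sketch below shows the shape of such a message: a miss carries the iova and requested permission from vhost to Qemu, and the update carries back the mapping (iova, size, hva, permission). The enum values and field layout here are illustrative and do not claim to match the kernel's vhost UAPI exactly.

      #include <stdint.h>
      #include <stdio.h>

      /* Illustrative message, mirroring the shape of the iotlb messages that
       * Qemu and vhost exchange over the vhost fd. */
      enum iotlb_msg_type { IOTLB_MISS = 1, IOTLB_UPDATE = 2, IOTLB_INVALIDATE = 3 };
      enum iotlb_perm     { PERM_RO = 1, PERM_WO = 2, PERM_RW = 3 };

      struct iotlb_msg {
          uint64_t iova;     /* guest IO virtual address                 */
          uint64_t size;     /* length of the mapping                    */
          uint64_t uaddr;    /* Qemu virtual address (hva), for updates  */
          uint8_t  perm;     /* access permission of the mapping         */
          uint8_t  type;     /* miss / update / invalidate               */
      };

      /* vhost -> Qemu: "please translate this iova for me". */
      static struct iotlb_msg make_miss(uint64_t iova, uint8_t perm)
      {
          return (struct iotlb_msg){ .iova = iova, .perm = perm, .type = IOTLB_MISS };
      }

      /* Qemu -> vhost: "here is the mapping you asked for". */
      static struct iotlb_msg make_update(uint64_t iova, uint64_t size,
                                          uint64_t uaddr, uint8_t perm)
      {
          return (struct iotlb_msg){ .iova = iova, .size = size, .uaddr = uaddr,
                                     .perm = perm, .type = IOTLB_UPDATE };
      }

      int main(void)
      {
          struct iotlb_msg miss = make_miss(0x1000, PERM_RW);
          struct iotlb_msg upd  = make_update(0x1000, 0x1000, 0x7f0000000000ULL, PERM_RW);
          printf("miss type=%u iova=0x%lx, update type=%u uaddr=0x%lx\n",
                 (unsigned)miss.type, (unsigned long)miss.iova,
                 (unsigned)upd.type,  (unsigned long)upd.uaddr);
          return 0;
      }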

  16. Interrupt Remapping (IR)

  17. X86 System Interrupts
      [Diagram: processors with their Local APICs sit on the system bus; a bridge connects the PCI bus, from which line-based interrupts go through the IOAPIC while signal-based interrupts (MSI/MSI-X) go directly to the Local APICs.]
      ● Kinds of interrupts:
        – Line-based (edge/level)
        – Signal-based (MSI/MSI-X)
      ● IRQ chips:
        – IOAPIC
        – Local APICs (LAPICs)

  18. IR Challenges for Vhost
      ● Interrupt remapping (IR) is still not supported by the x86 vIOMMU
        – MSI and IOAPIC interrupts
      ● Kernel irqchip support:
        – How to define the interface between user and kernel space?
        – How to enable the vhost fast irq path (irqfd)?
      ● Performance impact?
      ● Interrupt caching

  19. IOAPIC Interrupt Delivery
      ● Workflow before IR:
        – Fill in the IOAPIC entry with the interrupt information (trigger mode, destination ID, destination mode, etc.).
        – When the line is triggered, the interrupt is sent to the CPU with the information stored in the IOAPIC entry.
      ● Workflow after IR (IRTE: Interrupt Remapping Table Entry):
        – Fill in the IRTE with the interrupt information (in system memory).
        – Fill in the IOAPIC entry with the IRTE index.
        – When the line is triggered, fetch the IRTE index from the IOAPIC entry and send the interrupt with the information stored in that IRTE (see the sketch below).
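      A toy model of the two workflows, with heavily simplified entry layouts (the real IOAPIC redirection entry and IRTE formats carry many more fields): without IR the IOAPIC entry holds the final interrupt information itself; with IR it only holds an index into the interrupt remapping table, which is consulted at trigger time.

      #include <stdint.h>
      #include <stdio.h>

      /* Simplified interrupt info and tables; real formats have many more fields. */
      struct irq_info   { uint8_t vector; uint8_t dest_id; };
      struct ioapic_ent { int remapped; struct irq_info info; uint16_t irte_index; };

      #define IRT_SIZE 256
      static struct irq_info irt[IRT_SIZE];     /* Interrupt Remapping Table (in memory) */
      static struct ioapic_ent ioapic[24];      /* IOAPIC redirection entries            */

      /* Setup without IR: the IOAPIC entry holds the final interrupt info. */
      static void setup_no_ir(int pin, struct irq_info info)
      {
          ioapic[pin] = (struct ioapic_ent){ .remapped = 0, .info = info };
      }

      /* Setup with IR: program an IRTE, then point the IOAPIC entry at its index. */
      static void setup_ir(int pin, uint16_t irte_index, struct irq_info info)
      {
          irt[irte_index] = info;
          ioapic[pin] = (struct ioapic_ent){ .remapped = 1, .irte_index = irte_index };
      }

      /* Line triggered: deliver using either the entry itself or the indexed IRTE. */
      static void trigger(int pin)
      {
          struct ioapic_ent *e = &ioapic[pin];
          struct irq_info info = e->remapped ? irt[e->irte_index] : e->info;
          printf("pin %d -> vector 0x%x to CPU %u\n",
                 pin, (unsigned)info.vector, (unsigned)info.dest_id);
      }

      int main(void)
      {
          setup_no_ir(1, (struct irq_info){ .vector = 0x30, .dest_id = 0 });
          setup_ir(2, 5, (struct irq_info){ .vector = 0x31, .dest_id = 1 });
          trigger(1);
          trigger(2);
          return 0;
      }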

  20. MSI/MSI-X Delivery
      [Diagram: without IR, an MSI interrupt request is delivered directly. With IR, the MSI request is used to index the Interrupt Remapping Table, the matching Interrupt Remapping Table Entry (IRTE) is looked up and parsed, and the interrupt is then delivered with the information from that IRTE.]

  21. IR with Kernel-Irqchip
      ● We want interrupts "as fast as before".
      ● Current implementation:
        – Leverage the existing GSI routing table in KVM.
        – Instead of translating "on the fly", translate during setup (see the sketch below).
        – Easy to implement (no KVM change required).
        – Little performance impact (slow setup, fast delivery).
        – Only supports "split|off" kernel irqchip, not "on".
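      A rough illustration of the translate-during-setup idea, with a hypothetical ir_translate() standing in for the vIOMMU's interrupt-remapping lookup: the MSI message is remapped once when the GSI route is programmed, so each later irqfd injection uses the already-translated message with no per-interrupt work.

      #include <stdint.h>
      #include <stdio.h>

      struct msi_msg { uint64_t addr; uint32_t data; };

      /* Hypothetical stand-in for the vIOMMU interrupt-remapping lookup. */
      static struct msi_msg ir_translate(struct msi_msg m)
      {
          /* In reality this would walk the IRTE selected by the original message. */
          return (struct msi_msg){ .addr = 0xfee00000, .data = m.data & 0xff };
      }

      /* Toy GSI routing table: one translated MSI message per GSI. */
      static struct msi_msg gsi_route[64];

      /* Route setup (slow path, done once): translate and store the result. */
      static void setup_gsi_route(int gsi, struct msi_msg original)
      {
          gsi_route[gsi] = ir_translate(original);
      }

      /* irqfd injection (fast path): no translation, just use the stored message. */
      static void inject(int gsi)
      {
          struct msi_msg m = gsi_route[gsi];
          printf("inject gsi %d: addr=0x%lx data=0x%x\n",
                 gsi, (unsigned long)m.addr, m.data);
      }

      int main(void)
      {
          setup_gsi_route(30, (struct msi_msg){ .addr = 0xfee01000, .data = 0x4031 });
          inject(30);   /* vhost signals the irqfd; delivery needs no remapping */
          return 0;
      }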

  22. Remap irqfd Interrupts
      ● Fast IRQ path for vhost devices: without remapping
      [Diagram: Qemu sets up the GSI routing table in KVM with the original MSI messages (MSI Message 1-4); vhost signals the event notifier, and KVM injects the guest IRQ using the routed message.]

  23. Remap irqfd Interrupts (cont.)
      ● Fast IRQ path for vhost devices: with remapping
      [Diagram: the same flow, except Qemu programs the GSI routing table with already-translated MSI messages (Translated MSI Message 1-4), so the event-notifier-to-guest-IRQ injection path through KVM needs no per-interrupt translation.]

  24. All in all...
      ● To boot a guest with DMAR and IR enabled (possibly with one extra flag to enable DMAR for the guest virtio driver):

      qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
          -device intel-iommu,intremap=on \
          -netdev tap,id=tap1,script=no,downscript=no,vhost=on \
          -device virtio-net-pci,netdev=tap1,disable-modern=off,ats=on

  25. Vhost + vIOMMU Performance
      ● For dynamic DMA mapping (e.g., using generic Linux kernel drivers):
        – Performance dropped drastically
        – TCP_STREAM: 24500 Mbps → 600 Mbps
        – TCP_RR: 25000 trans/s → 11600 trans/s
      ● For static DMA mapping (e.g., a DPDK-based application like l2fwd):
        – Around 5% performance drop for throughput (pktgen)
        – Still more work TBD...

  26. Current Status & TBDs
      ● DMAR/IR upstream status:
        – Qemu: IR merged (Peter Xu), DMAR still RFC (Jason Wang will post a formal patch soon)
        – Vhost & virtio driver: merged (Michael S. Tsirkin / Jason Wang)
        – DPDK: vhost-user IOTLB is being developed (Victor Kaplansky)
      ● TBDs
        – Performance tuning for DMAR
        – Quite a few enhancements for IR: explicit cache invalidations, better error handling, etc.

  27. Thanks!

  28. Appendix
