Fall 2014 :: CSE 506 :: Section 2 (PhD)
Virtual Machines
Heyi Li and Zhen Cao
(Some of the figures are from the Internet)
Outline
• Basic concepts
• When virtual is better
• Implementation
• When virtual is harder
Basic Concepts
• What is a virtual machine?
  – An emulation of a particular computer system
• System VM vs. Process VM
  – System VM: supports the execution of a complete OS (Xen)
  – Process VM: supports the execution of a single process (JVM)
• Hypervisor (VMM)
  – Computer software that creates and runs VMs
• Type 1 & Type 2 Hypervisor
  – Type 1 (bare-metal): guests (VM1, VM2) run on a hypervisor that sits directly on the host hardware. Examples: VMware ESX, Microsoft Hyper-V, Xen
  – Type 2 (hosted): guests (VM1, VM2) run on a hypervisor that is a process on a hosting OS, which sits on the host hardware. Examples: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
Applications and Benefits
[Figure: two panels, "Server Consolidation" and "Test and Development", each showing VMs (App over OS) running on a VMM over hardware]
• Energy efficiency
• Rapid deployment
• Security
• Reducing maintenance costs
Virtualization Requirements
• Fidelity
  – Software on the VM executes identically to its execution on hardware, barring timing effects
• Performance
  – Performance overhead must be small
• Safety
  – The VMM manages all hardware resources
Obstacles for x86
• Trap-and-emulate
  – Requires that all virtualization-sensitive instructions are also privileged instructions
• x86 architecture once thought to be not fully virtualizable
  – Certain sensitive instructions behave differently, without trapping, when run in unprivileged mode (POPF)
  – Certain unprivileged instructions can access privileged state (SGDT)
• Techniques to address inability to virtualize x86
  – Full virtualization w/o hardware support
  – Binary Translation (VMware ESX)
  – Paravirtualization (Xen)
  – Hardware-assisted virtualization
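The trap-and-emulate condition can be written as a set check: every sensitive instruction must also be privileged. The sketch below is illustrative only. The two instruction sets are a small hand-picked sample (an assumption for this example), not a complete classification of x86, and the function names are hypothetical.

```python
# Illustrative sample only: a hand-picked subset of x86 instructions.
# PRIVILEGED: instructions that trap when executed outside ring 0.
PRIVILEGED = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
# SENSITIVE: instructions that read or change privileged machine state.
SENSITIVE = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3", "POPF", "SGDT"}


def non_trapping_sensitive(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in unprivileged mode."""
    return sorted(sensitive - privileged)


def is_trap_and_emulate_virtualizable(sensitive, privileged):
    """Trap-and-emulate works only if the sensitive set is a subset of the privileged set."""
    return sensitive <= privileged


if __name__ == "__main__":
    print(non_trapping_sensitive(SENSITIVE, PRIVILEGED))
    print(is_trap_and_emulate_virtualizable(SENSITIVE, PRIVILEGED))
```

With this sample, POPF and SGDT are the instructions that escape the VMM, which is why the techniques listed above are needed.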
Binary Translation
Binary Translation
• Binary: input is binary x86 code, not source code
• On-the-fly: dynamic and on demand
• Only need to translate kernel mode code
  – User mode: direct execution
• Even for kernel mode, most instruction sequences don’t change
• Instructions that do change:
  – Indirect control flow: call/ret, jmp
  – PC-relative addressing
  – Privileged instructions
[Figure: binary translation loop linking the PC ([x]), the binary translator, the translation unit (TU), the compiled code fragment (CCF), the translation cache ([y]), the hash table of ([x], [y]) pairs, and execution; arrows numbered 1 to 5 match the steps below]
1. A translation unit stops at 12 instructions or a control-flow instruction
2. Translated into Compiled Code Fragments (CCF) and cached
3. Track the translation cache with a hash table
4. Execute the CCF
5. Continuation (either fall-through or taken-branch)
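The five steps can be sketched on a toy instruction set. This is a minimal sketch under stated assumptions, not VMware's translator: instructions are tuples, `jmp` is the only control-flow instruction, `cli`/`sti` stand in for privileged instructions, and all names (`build_tu`, `translate`, `Translator`) are hypothetical. Only the 12-instruction limit comes from the slide.

```python
MAX_TU_LEN = 12              # from the slide: a TU stops at 12 instructions
PRIVILEGED = {"cli", "sti"}  # toy stand-ins for instructions that must be rewritten


def build_tu(code, pc):
    """Step 1: collect instructions until the length limit or a control-flow instruction."""
    tu = []
    while pc < len(code) and len(tu) < MAX_TU_LEN:
        insn = code[pc]
        tu.append(insn)
        pc += 1
        if insn[0] == "jmp":   # the only control-flow instruction in this toy ISA
            break
    return tu, pc              # pc is now the fall-through address


def translate(tu):
    """Step 2: copy most instructions unchanged; rewrite privileged ones."""
    return [("emulate", insn[0]) if insn[0] in PRIVILEGED else insn for insn in tu]


class Translator:
    def __init__(self, code):
        self.code = code
        self.cache = {}        # step 3: hash table, guest PC -> (CCF, fall-through PC)
        self.translations = 0  # counts cache misses

    def lookup(self, pc):
        if pc not in self.cache:
            tu, fall_through = build_tu(self.code, pc)
            self.cache[pc] = (translate(tu), fall_through)
            self.translations += 1
        return self.cache[pc]

    def run(self, pc=0, max_fragments=100):
        """Steps 4 and 5: 'execute' each CCF by recording it, then pick the continuation."""
        trace = []
        for _ in range(max_fragments):
            if pc >= len(self.code):
                break
            ccf, fall_through = self.lookup(pc)
            trace.extend(ccf)
            last = ccf[-1]
            # Continuation: taken branch target, or the fall-through address.
            pc = last[1] if last[0] == "jmp" else fall_through
        return trace
```

Running the same guest code twice translates each fragment only once; the second run is served entirely from the cache.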
Memory
[Figure: three address spaces, each spanning 0 to 4GB, and the mappings between them]
• Guest Virtual Address (gVA) space to Guest Physical Address (gPA) space: Guest Page Table (visible to guest OS)
• Guest Physical Address (gPA) space to Host Physical Address (hPA) space: PhysMap (Pmap) (maintained by VMM)
• Guest Virtual Address (gVA) space to Host Physical Address (hPA) space: Shadow Page Table (resides in hardware and maintained by VMM)
Shadow Page Tables
• Translation from gVA to hPA directly by hardware
• If not present, page fault generated by hardware
• Hidden page fault: the mapping is present in the guest page table
  – VMM walks the guest page table to determine the gPA backing that gVA
  – VMM allocates a physical page, and adds the mapping to Pmap
  – Updates the shadow page table
• True page fault: the mapping is not present in the guest page table
  – VMM generates an exception on the virtual CPU
  – Resume executing on the first instruction of the guest exception handler
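The hidden vs. true page fault distinction can be sketched with three dictionaries standing in for the guest page table, Pmap, and the shadow page table. This is a simplified model under assumptions: pages are identified by plain numbers, tables are single-level, and the class and exception names are hypothetical.

```python
class TruePageFault(Exception):
    """Models the VMM injecting a page fault into the virtual CPU."""


class ShadowMMU:
    def __init__(self, guest_page_table, free_host_pages):
        self.guest_pt = guest_page_table   # gVA page -> gPA page (owned by the guest OS)
        self.pmap = {}                     # gPA page -> hPA page (maintained by the VMM)
        self.shadow_pt = {}                # gVA page -> hPA page (what the hardware walks)
        self.free = list(free_host_pages)  # host pages the VMM may hand out
        self.hidden_faults = 0

    def translate(self, gva_page):
        # Hit in the shadow page table: hardware translates with no VMM involvement.
        if gva_page in self.shadow_pt:
            return self.shadow_pt[gva_page]
        return self.handle_fault(gva_page)

    def handle_fault(self, gva_page):
        # True page fault: the guest itself has no mapping, so the guest must handle it.
        if gva_page not in self.guest_pt:
            raise TruePageFault(gva_page)
        # Hidden page fault: the guest mapping exists, only the shadow entry is missing.
        self.hidden_faults += 1
        gpa_page = self.guest_pt[gva_page]
        if gpa_page not in self.pmap:
            self.pmap[gpa_page] = self.free.pop(0)   # allocate a host page
        self.shadow_pt[gva_page] = self.pmap[gpa_page]
        return self.shadow_pt[gva_page]
```

A second access to the same page hits the shadow page table, so the hidden-fault cost is paid once per mapping in this model.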
I/O Virtualization – Direct I/O Model
[Figure: Full Virtualization. VM 0 to VM n (guest OS and apps) run on a hypervisor that contains the I/O services and device drivers, on top of shared devices]
• Place drivers for high-performance I/O devices directly into the hypervisor
• Does not attempt to have the virtual hardware match the specific underlying hardware
• Virtualize selected, canonical I/O devices
• Problems
  – Larger hypervisor
  – Need to protect the hypervisor from driver faults
Paravirtualization
CPU Virtualization
• Privilege levels in x86
  – Ring 0: Xen
  – Ring 1: guest OS
  – Ring 3: user apps
• Isolation
  – Guest user mode and guest kernel mode
    • Page table “user/supervisor” bit: PTE_U
  – Guest OS and VMM
    • Segmentation
      – Problem with x86-64
CPU Virtualization (cont.)
• Privileged instructions
  – Hypercalls
  – Require modifying the guest OS source code
  – Validated and executed by Xen (e.g., installing a new PT)
• Exceptions
  – Handlers registered with Xen once. Accepted (validated) only if they do not need to execute in ring 0.
  – Called directly without Xen intervention
  – All syscalls from apps to guest OS handled this way (and executed in ring 1)
• Page fault handlers are special
  – Faulting address can be read only in ring 0
  – Xen reads the faulting address and passes it via the stack to the OS handler in ring 1
Memory Virtualization
• Physical memory
  – At domain creation, hardware pages “reserved”
  – Domain can increase/decrease its quota
  – Xen does not guarantee that the hardware pages are contiguous
• Virtual memory
  – Register guest OS page tables directly with MMU
  – Guest OS allocates and initializes a page from its own memory reservation and registers it with Xen
    • Every guest OS has its own address space
    • Xen occupies the top 64MB of every address space, to save switching costs between address spaces (hypervisor calls)
  – Xen involved only in memory updates
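One way to picture "Xen involved only in memory updates" is a hypercall that validates each page-table update against the domain's reservation before applying it. The sketch below is a toy model, not Xen's interface: the class `ToyHypervisor`, the methods `reserve` and `mmu_update`, and the ownership rule shown are assumptions for illustration.

```python
class ToyHypervisor:
    def __init__(self):
        self.owner = {}        # hardware frame number -> owning domain id
        self.page_tables = {}  # domain id -> {virtual page -> hardware frame}

    def reserve(self, domain, frames):
        """Reserve hardware frames for a domain; they need not be contiguous."""
        for frame in frames:
            if frame in self.owner:
                raise ValueError(f"frame {frame} already reserved")
            self.owner[frame] = domain
        self.page_tables.setdefault(domain, {})

    def mmu_update(self, domain, vpage, frame):
        """Hypercall: validate the update, then apply it on the guest's behalf."""
        if self.owner.get(frame) != domain:
            return False       # rejected: the guest does not own this frame
        self.page_tables[domain][vpage] = frame
        return True
```

Reads of the page table need no hypercall in this model; only updates pass through the hypervisor.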
I/O Virtualization – Indirect I/O Model
[Figure: Paravirtualization. Guest VMs (VM 0 to VM n, guest OS and apps) and service VMs (I/O services and device drivers) run on the hypervisor, on top of shared devices]
• Uses a privileged virtual machine (Domain0) for all device drivers
• Simple interfaces for guest OSes
• Pros
  – Higher security
• Cons
  – Lower performance
Hardware-Assisted Virtualization (HVM)
Intel’s VT-x
[Figure: virtual machines (VMs) with apps in ring 3 and the OS in ring 0; VM exit and VM entry transitions connect them to the VM Monitor (VMM) running in VMX root mode]
• More-privileged mode for VMM
• Less-privileged mode for guest OS
• Eliminates ring de-privileging for the guest OS
VM Control Structure (VMCS)
• Execution controls determine when exits occur
  – Access to privileged state, occurrence of exceptions, etc.
  – Flexibility provided to avoid unwanted exits
• Guest-state area
  – Processor state saved into the guest-state area on VM exits and loaded on VM entries
• Host-state area
  – Processor state loaded from the host-state area on VM exits
• Other
Extended Page Table (EPT)
[Figure: a guest linear address is translated by the guest page tables (rooted at CR3) to a guest physical address, which the extended page tables (rooted at the EPT base pointer, EPTP) translate to a host physical address]
• A new page-table structure, under the control of the VMM
  – Defines mapping between GPA & HPA
  – EPT base pointer (new VMCS field) points to the EPT page tables
  – EPT (optionally) activated on VM entry, deactivated on VM exit
• Guest has full control over its own IA-32 page tables
  – No VM exits due to guest page faults, INVLPG, or CR3 changes
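The two-stage translation can be sketched with two lookups. This is a deliberately flattened model: each table is a single-level dictionary keyed by page number, whereas real hardware walks multi-level tables and also sends each guest page-table access through the EPT. The 4KB page size, function name, and exception names are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed 4KB pages


class GuestPageFault(Exception):
    """Missing guest mapping: delivered to the guest OS, no VM exit needed."""


class EptViolation(Exception):
    """Missing EPT mapping: causes a VM exit so the VMM can handle it."""


def translate(guest_pt, ept, gva):
    """Translate a guest linear address to a host physical address.

    guest_pt: guest page number -> guest physical page number (guest-controlled)
    ept:      guest physical page number -> host physical page number (VMM-controlled)
    """
    page, offset = divmod(gva, PAGE_SIZE)
    if page not in guest_pt:
        raise GuestPageFault(hex(gva))
    gpa_page = guest_pt[page]
    if gpa_page not in ept:
        raise EptViolation(hex(gpa_page * PAGE_SIZE + offset))
    return ept[gpa_page] * PAGE_SIZE + offset
```

The two exception types mirror the slide's point: a fault in the guest's own tables stays inside the guest, and only a missing EPT mapping involves the VMM.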
I/O Virtualization
[Figure: three I/O models side by side]
• Full Virtualization: VM 0 to VM n (guest OS and apps); I/O services and device drivers inside the hypervisor; shared devices
• Paravirtualization: guest VMs (VM 0 to VM n, guest OS and apps) and service VMs (I/O services and device drivers) on the hypervisor; shared devices
• Pass-through Model: VM 0 to VM n (guest OS and apps, each with its own device drivers) on the hypervisor; assigned devices
IOMMU
• Device pass-through
  – Directly assign a physical device to a particular guest OS
  – Address space translation handled transparently
• Device isolation
  – Safely map a device to a particular guest without risking the integrity of other guests
IOMMU
• Translation Control Entry
  – Translation from a DMA address to a host memory address
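DMA translation with per-device isolation can be sketched as one translation table per device. This is a toy model, not a real IOMMU programming interface: the class `ToyIommu`, its methods, the device names, and the 4KB page size are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed 4KB pages


class DmaFault(Exception):
    """Raised when a device uses a DMA address it has no translation entry for."""


class ToyIommu:
    def __init__(self):
        # One table per device: DMA page number -> host memory page number.
        self.tables = {}

    def map(self, device, dma_page, host_page):
        """Install a translation entry for one device."""
        self.tables.setdefault(device, {})[dma_page] = host_page

    def translate(self, device, dma_addr):
        """Translate a device's DMA address to a host memory address."""
        page, offset = divmod(dma_addr, PAGE_SIZE)
        host_page = self.tables.get(device, {}).get(page)
        if host_page is None:
            raise DmaFault(f"{device}: no entry for DMA page {page}")
        return host_page * PAGE_SIZE + offset
```

Because each device has its own table, a device assigned to one guest cannot reach host pages that were mapped only for another device.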
Security Problems
• Transience
  – Large numbers of machines appear and disappear from the network sporadically
• Diversity
  – Long and painful upgrade cycles
• Identity
  – Difficult to establish who owns a VM running on a particular physical host
• Mobility
  – VMs can be easily copied over a network or carried on portable storage media
Discussion
Thanks!