Live Migration @Alibaba Cloud: issues settled & challenges remain
Chao Zhang
Email: zhuoxi.zc@alibaba-inc.com

1. Challenges of Live Migration @Alibaba Cloud
2. Performance Tuning & Robust Improvements
3. LM Application @Alibaba Cloud
4. Future challenges

Traditional Live Migration in Virtualization
[Figure: migration timeline for SRC and DST. SRC: init, MEM pre-copy, last-copy, VM State Save, Cleanup, Storage Shutdown, Network Shutdown. DST: Reservation, MEM pre-copy, last-copy, VM State Restore, Storage Reopen, Network reconnect, VM Start. Timeline phases: VM Running on SRC Host, VM Downtime, VM Breaktime, VM Running on DST Host.]
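
The MEM pre-copy and last-copy phases in the figure follow the usual iterative pattern: copy all memory while the VM keeps running, re-send whatever was dirtied in the meantime, and pause the VM only once the remainder is small. Below is a minimal sketch of that loop, assuming a constant dirty rate and a constant link bandwidth. The function name, parameters, and numbers are illustrative and are not measurements from this talk.

```python
def simulate_precopy(mem_mb, dirty_rate_mb_s, bandwidth_mb_s,
                     max_downtime_s=0.1, max_rounds=30):
    """Idealized pre-copy model; returns (rounds, total_time_s, downtime_s).

    Assumption: dirty rate and bandwidth are constant for the whole migration.
    """
    to_send = float(mem_mb)   # the first round copies all guest memory
    elapsed = 0.0
    rounds = 0
    # Keep copying while the remainder would exceed the downtime budget.
    while rounds < max_rounds and to_send / bandwidth_mb_s > max_downtime_s:
        round_time = to_send / bandwidth_mb_s
        elapsed += round_time
        rounds += 1
        # Memory dirtied during this round has to be sent again.
        to_send = min(float(mem_mb), dirty_rate_mb_s * round_time)
    # Last copy: the VM is paused while the remaining dirty memory is sent.
    downtime = to_send / bandwidth_mb_s
    return rounds, elapsed + downtime, downtime


# 4 GiB guest, 100 MB/s dirty rate, 1000 MB/s link (hypothetical values).
rounds, total, downtime = simulate_precopy(4096, 100, 1000)
print(rounds, round(total, 3), round(downtime, 5))
```

When the dirty rate reaches the bandwidth, the loop never converges and only stops at `max_rounds`, which is why the downtime figures later in the deck depend on the stress type.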

Challenges of Live Migration @Alibaba Cloud
• Require transparent migration across the whole cloud system
• Hardware & software backward compatibility
• Robustness of live migration
• Why/When/Which?
Not just migrating a virtualized instance
[Figure: a VM surrounded by SLB, Cloud Security Services, Control System, Virtualization, VPC, Cloud Disk.]

Migration Operations Required @Alibaba Cloud
[Figure: operations grouped under Control Plane, Virtualization Plane, and Other Cloud Services. Operations shown: Start Migration, VM Pause, Storage RD_ONLY, Migration Notify, Storage Reopen, Storage Pause, Last Copy, Preparation, Relay Forwarding, Install Flow Rules, Device Relocation, VM Status Manager, Session Copy, Network Switch, VM-NC Switch, SRC VM destroy, Status Notify, VM Start, VM configuration.]

Decoupling Migration by Defining a Status Entrance Standard
[Figure: columns Control System, Storage, Network, Virtualization. Steps shown: Start Migration, Migration Preparation, Migration Prepare, Pre MEM COPY, Flow Rules Install, RD_only Open, Pre Migration Status Notify, SESSION copy, VM Pause, Post Migration, SRC PAUSE, Relay Forwarding, Last Copy, SESSION Last Copy, VM network Switch, VM Start, Status Notify, Reopen, VM-NC switch, Session ReCreate, Resource Cleanup, SRC VM destroy.]
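
One way to read this slide is that the control system only announces well-defined statuses, and storage, network, and virtualization each attach their own work to those statuses. The sketch below illustrates that idea with a simple publish/subscribe coordinator. The class name, the status order, and the mapping of steps to statuses are all assumptions made for illustration, not the actual design.

```python
from collections import defaultdict

# Status names are borrowed from the slide; their exact order is an assumption.
STATUSES = ["MIGRATION_PREPARE", "PRE_MIGRATION", "VM_PAUSE",
            "POST_MIGRATION", "RESOURCE_CLEANUP"]


class MigrationCoordinator:
    """Hypothetical control-system core: it only announces statuses."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, status, subsystem, handler):
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self._handlers[status].append((subsystem, handler))

    def run(self):
        """Walk the statuses in order and record what each subsystem did."""
        log = []
        for status in STATUSES:
            for subsystem, handler in self._handlers[status]:
                log.append((status, subsystem, handler()))
        return log


coord = MigrationCoordinator()
# Which step belongs to which status is an illustrative guess.
coord.subscribe("MIGRATION_PREPARE", "storage", lambda: "RD_only open")
coord.subscribe("MIGRATION_PREPARE", "network", lambda: "flow rules install")
coord.subscribe("PRE_MIGRATION", "network", lambda: "session copy")
coord.subscribe("VM_PAUSE", "virtualization", lambda: "last copy")
coord.subscribe("POST_MIGRATION", "storage", lambda: "reopen")
coord.subscribe("RESOURCE_CLEANUP", "virtualization", lambda: "src vm destroy")
log = coord.run()
for entry in log:
    print(entry)
```

With this shape, adding another cloud service to the migration means registering one more handler rather than changing the coordinator.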

Optimization of Live Migration in Virtualization
• Critical path parallelism
• Dismantling heavy operations
• Rearrangement: Lazy/Pre
• Push time-sensitive operations down from the control system to the virtualization plane
[Figure: Pre Heavy Operation, Critical Path, and Lazy Heavy Operation stages. Labels: SESSION Copy, BDRV Flush, Add Pre Last Copy, Storage Reopen, Relay Forwarding, VM Last Copy, Compression.]
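
The effect of critical path parallelism and of moving a heavy operation ahead of the pause can be estimated by computing the longest dependency chain among the operations that run while the VM is paused. The sketch below does this for a small dependency graph. The operation names echo the slide, but the durations and the dependency edges are hypothetical.

```python
def critical_path_ms(durations, deps):
    """Length of the longest dependency chain.

    This equals the finish time when all independent operations run in parallel.
    """
    finish = {}

    def finish_time(op):
        if op not in finish:
            start = max((finish_time(d) for d in deps.get(op, ())), default=0)
            finish[op] = start + durations[op]
        return finish[op]

    return max(finish_time(op) for op in durations)


# Hypothetical durations in milliseconds.
durations = {"bdrv_flush": 50, "vm_last_copy": 40, "session_last_copy": 15,
             "storage_reopen": 30, "vm_start": 5}

# Before: every step waits for the previous one while the VM is paused.
serial_deps = {"vm_last_copy": ["bdrv_flush"],
               "session_last_copy": ["vm_last_copy"],
               "storage_reopen": ["session_last_copy"],
               "vm_start": ["storage_reopen"]}
serial = critical_path_ms(durations, serial_deps)

# After: the flush is done before the pause, and the remaining steps overlap.
paused_ops = {k: v for k, v in durations.items() if k != "bdrv_flush"}
parallel_deps = {"vm_start": ["vm_last_copy", "session_last_copy",
                              "storage_reopen"]}
parallel = critical_path_ms(paused_ops, parallel_deps)

print(serial, parallel)  # prints: 140 45
```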

Cloud Disk Optimization
• Critical path optimization
• Lightweight pause operation
[Figure: two sequences, labeled Pre Optimization and After Optimization. One sequence: Open Fd by RD_ONLY; SRC: Close Fd; DST: ReOpen Fd. The other: Open Fd by RD_ONLY; SRC: (1) Pause Fd; DST: ReOpen Fd; SRC: (2) Destroy Fd. The SRC close/pause step and the DST reopen step are marked Critical Path.]

VPC/SDN Live Migration
• Copy SESSION table
• Relay Forwarding
• SESSION table update
[Figure: Network Manager, Install Flow Rules; VM1, VM2, and VM1` each attached to a switch; SESSION table; Relay Forwarding.]
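
The three bullets can be pictured as operations on per-host virtual switches: copy the VM's session entries to the destination, have the source relay packets that still arrive there, and drop the relay once the session tables are updated. The sketch below is a toy model of that sequence. The `VSwitch` class and its methods are hypothetical and do not represent the actual VPC data plane.

```python
class VSwitch:
    """Toy virtual switch (hypothetical model, not a real SDN API)."""

    def __init__(self, name):
        self.name = name
        self.local_vms = set()
        self.sessions = {}   # vm -> {flow tuple: connection state}
        self.relay = {}      # vm -> VSwitch that now hosts the VM

    def deliver(self, vm):
        """Return the name of the switch that hands a packet to the VM."""
        if vm in self.local_vms:
            return self.name
        if vm in self.relay:
            # Relay forwarding: pass late packets on to the new host.
            return self.relay[vm].deliver(vm)
        raise LookupError(f"{vm} is unknown on {self.name}")


def migrate_network(vm, src, dst):
    # 1. Copy the session table so established connections survive.
    dst.sessions[vm] = dict(src.sessions.get(vm, {}))
    # 2. Switch the VM over and relay traffic that still reaches the source.
    src.local_vms.discard(vm)
    dst.local_vms.add(vm)
    src.relay[vm] = dst


def finish_network(vm, src):
    # 3. After the session table update, the relay and old entries can go.
    src.relay.pop(vm, None)
    src.sessions.pop(vm, None)


src, dst = VSwitch("src"), VSwitch("dst")
src.local_vms.add("vm1")
src.sessions["vm1"] = {("10.0.0.2", "10.0.0.9", 443): "ESTABLISHED"}
migrate_network("vm1", src, dst)
print(src.deliver("vm1"))  # prints: dst
finish_network("vm1", src)
```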

Add-on Cloud Services Stay Intact
• Indirect VM-Host relationship
• Direct VM-Host relationship
• Live Migration friendly cloud ecosystem
[Figure: Cloud Service, DPDK, NIC; VMs on SRC Host and DST Host with paths (1) and (2); Add-on Cloud Services.]

Control System
• Migration trigger point
• Query migration status
• Cancel migration
• Cluster/Host Configuration
• Control Policy
• Push time-critical operations down from the control system to the virtualization plane
• Migration procedure control
[Figure: Manager, Configuration, Migration CORE; Virtualization, Storage, Network, Cloud Service.]

Migration Test Data
VM Stress Type | VM Type | Total Migration Time | VM Downtime
idle           | 4u4g    | ~1 min               | 70~80 ms
idle           | 16c32g  | 1~2 min              | 70~90 ms
mem_stress     | 4u4g    | 1~2 min              | 90~120 ms
fio            | 4u4g    | 1~2 min              | 90~120 ms
Environment: Generation III instance
mem_stress: 512M dirty memory
fio: iodepth=32, bs=512, randread
Downtime may vary with VM, hardware, software, and stress type.

Application of Live Migration: Server Maintenance
[Figure: HOST Maintenance Procedure and HOST Fault-Migration. Labels: Fault, Can migrate?, Cold/Live Online Migration, Repair Offline; VMs on two Hypervisor/Host boxes, each with CPU, MEM, IO.]

Application of Live Migration: Kernel/Firmware Upgrading
[Figure: Alibaba Maintenance System, Upgrading Entrance, Rolling Migration System, Kernel/Firmware Upgrading, Manager, VM Live Migration, NC Upgrading.]
Improvements of the Whole Cluster
Metric                   | Before | After  | Improvement
Memory Bandwidth (MB/s)  | 30179  | 27873  | 8.27%
SPECjbb                  | 128655 | 120552 | 6.72%
Packet Forwarding (MB/s) | 610    | 570    | 7.02%
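
The Improvement column can be reproduced from the two measured values on each row as a relative difference taken against the smaller value. A quick check using only the numbers from the table (the function name is illustrative):

```python
rows = {
    "Memory Bandwidth": (30179, 27873),
    "SPECjbb": (128655, 120552),
    "Packet Forwarding": (610, 570),
}


def relative_gain_pct(a, b):
    """Relative difference between two measurements, against the smaller one."""
    hi, lo = max(a, b), min(a, b)
    return round((hi - lo) / lo * 100, 2)


gains = {name: relative_gain_pct(*pair) for name, pair in rows.items()}
print(gains)
# prints: {'Memory Bandwidth': 8.27, 'SPECjbb': 6.72, 'Packet Forwarding': 7.02}
```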

Application of Live Migration: Cloud Scheduling
• Doing
  a) Resource defragmentation
  b) Resource balancing
• To Do
  a) Power management
  b) Other
[Figure: (a) Resource Fragments and (b) Power & Resource Management, showing hosts that carry instances labeled 16C 32G and 32C 32G.]
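
Resource defragmentation can be illustrated with a small consolidation pass: try to drain the least-loaded hosts by live-migrating their VMs into the free space of hosts that are already in use, so that whole hosts become free for large instances. The sketch below uses a greedy best-fit rule on CPU cores only. It is an illustrative heuristic with hypothetical host names and sizes, not the scheduler described in the talk.

```python
def defragment(hosts, capacity):
    """Greedy consolidation sketch.

    hosts maps host name -> list of VM sizes in cores.
    Returns (new placement, list of (vm_cores, src, dst) migrations).
    """
    placement = {h: list(vms) for h, vms in hosts.items()}
    moves = []
    # Try to drain the least-loaded hosts first.
    for src in sorted(placement, key=lambda h: sum(placement[h])):
        # Free cores on the other hosts that already run something.
        free = {h: capacity - sum(v)
                for h, v in placement.items() if h != src and v}
        plan = []
        for vm in sorted(placement[src], reverse=True):
            fits = [h for h in free if free[h] >= vm]
            if not fits:
                plan = None      # the host cannot be fully drained; skip it
                break
            dst = min(fits, key=lambda h: free[h])   # best fit
            free[dst] -= vm
            plan.append((vm, src, dst))
        if plan:
            for vm, s, d in plan:
                placement[s].remove(vm)
                placement[d].append(vm)
            moves.extend(plan)
    return placement, moves


# Two half-used 32-core hosts and one full host (hypothetical layout).
placement, moves = defragment({"A": [16], "B": [16], "C": [32]}, capacity=32)
print(placement)  # prints: {'A': [], 'B': [16, 16], 'C': [32]}
print(moves)      # prints: [(16, 'A', 'B')]
```

After the single migration, host A is empty and can take a 32-core instance or be powered down, which connects to the power management item in the To Do list.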

Future Challenges

SR-IOV/PassThrough Live Migration
Challenges:
• IO register migration
• In-flight IO
• Guest awareness
[Figure: three I/O models, Traditional, PassThrough, and SR-IOV, each showing a VM above an IO Device. Labels: Hypervisor emulate, VF, Hardware.]

Ways to Start a Live Migration
• A variety of instance types
• Navigate through heterogeneous architecture
• Enable more application practices
[Figure: Performance, Price, Robustness. Instance types: General instance, Compute enhanced instance, Credit instance. Platforms: KVM, XEN, ……; GPU, FPGA; VIRT 2.0, PASS-Through, SR-IOV.]

FAQ