Characterization & Analysis of a Server Consolidation Benchmark on Xen



  1. Characterization & Analysis of a Server Consolidation Benchmark on Xen
     Padma Apparao, Ravi Iyer, Don Newell, Xiaomin Zhang, Tom Adelmeyer
     Intel Corporation
     Nov 16th, 2007

  2. Background
     • Virtualization and consolidation are a growing trend in datacenters
       – > 40% of servers expected to run a consolidated workload by 2010
     • Problem is there is no analysis methodology or performance studies in place for understanding consolidated workloads
     [Figure: Workloads 1-3, each on its own Guest OS, running over a VMM or Hypervisor on one server, next to a single Workload on a Single O/S on one server]

  3. Motivation
     • Performance characterization is useful for
       – Providing feedback to IT administrators
         – Deployment with a fair share of resources to end-users is challenging with virtualization
       – Providing feedback to Platform Architects
         – To project future platform performance
           – How do apps scale for future platforms
         – To optimize future architectures for consolidation
           – Architectural effects on consolidation
           – Cache and other resource sharing effects
           – Virtualization overhead effects
       – Providing feedback to VMM developers
         – How are the platform's resources (cores/IO devices) scheduled
         – Scheduling heuristics may be suboptimal without an execution profile

  4. A Consolidation Benchmark
     • vConsolidate (vCon) is one of the proposed benchmarks for virtualization consolidation
       – Developed by Intel
       – VMM agnostic
     • VMmark is another consolidation benchmark developed by VMware
     • vSPEC is a virtualization benchmark being defined by the SPEC committee
     • Our focus is vCon

  5. vConsolidate Benchmark Configuration
     • Various profiles for vCon.
     • We chose profile 3

     Profile | Workload (benchmark) | vCPUs | vMemory | OS | App
     1 | Web (Webbench) | 1 | 1.0 GB | Windows 32-bit | IIS
     1 | Mail (Loadsim) | 1 | 1.0 GB | Windows 32-bit | Exchange
     1 | Database (Sysbench) | 1 | 1.0 GB | Windows 32-bit | MS SQL
     1 | Java (SPECjbb) | 1 | 1.7 GB | Windows 32-bit | BEA JVM
     1 | Idle | 1 | 0.4 GB | Windows 32-bit | -
     2 | Web (Webbench) | 2 | 1.5 GB | Windows 32-bit | IIS
     2 | Mail (Loadsim) | 1 | 1.5 GB | Windows 32-bit | Exchange
     2 | Database (Sysbench) | 2 | 1.5 GB | Windows 64-bit | MS SQL
     2 | Java (SPECjbb) | 2 | 2.0 GB | Windows 64-bit | BEA JVM
     2 | Idle | 1 | 0.4 GB | Windows 32-bit | -
     3 | Web (Webbench) | 2 | 1.5 GB | Linux 32-bit | Apache
     3 | Mail (Loadsim) | 1 | 1.5 GB | Windows 32-bit | Exchange
     3 | Database (Sysbench) | 2 | 1.5 GB | Linux 64-bit | MySQL
     3 | Java (SPECjbb) | 2 | 2.0 GB | Linux 64-bit | BEA JVM
     3 | Idle | 1 | 0.4 GB | Windows 32-bit | -
     4 | Web (Webbench) | 2 | 2.0 GB | Windows 32-bit | IIS
     4 | Mail (Loadsim) | 2 | 2.0 GB | Windows 32-bit | Exchange
     4 | Database (Sysbench) | 4 | 2.0 GB | Windows 64-bit | MS SQL
     4 | Java (SPECjbb) | 4 | 2.0 GB | Windows 64-bit | BEA JVM
     4 | Idle | 1 | 0.4 GB | Windows 32-bit | -

     • System config:
       – Intel dual socket core2-duo machine
       – Core2-duo processors at 3GHz/4MB second level cache
       – 16GB system memory
       – Intel VT technology
     • Tools:
       – Xentop/sar
       – Virtual Emon developed by Intel
       – Xen code instrumentation

  6. Results and Analysis: Performance Impact
     • Workloads are run alone within a single VM for the dedicated measurements.
     • SPECjbb loses 37% in consolidation
     • Sysbench loses 58%
     • Webbench loses 20%
     • and Mail loses 32%
     • Degradation is likely due to contention for resources like core/cache/memory/IO/network and due to virtualization overheads
     • Cpu utilization reduction is due to core contention
     [Figure: Dedicated vs. Consolidated Throughput; performance normalized to a dedicated run; Alone vs. vCon for JBB, Sysbench, WebBench, Mail]
     [Figure: CPU utilization for dedicated vs. consolidated workloads; Cpu% normalized to when running in dedicated mode; Alone vs. vCon for JBB, Sysbench, WebBench, Mail]
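The loss figures on this slide map directly onto the bar heights of a chart normalized to the dedicated run. A minimal sketch of that conversion, assuming each loss is a fraction of the dedicated throughput (the dictionary and function names are illustrative, not from the slides):

```python
# Throughput lost in consolidation, as quoted on the slide.
losses = {"SPECjbb": 0.37, "Sysbench": 0.58, "Webbench": 0.20, "Mail": 0.32}


def normalized_throughput(loss):
    """Consolidated throughput relative to the dedicated run (dedicated = 1.0)."""
    return 1.0 - loss


# Round to two decimals, matching the precision of the quoted percentages.
normalized = {name: round(normalized_throughput(loss), 2) for name, loss in losses.items()}

for name, value in normalized.items():
    print(f"{name}: {value:.2f} of dedicated throughput")
```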

  7. Results and Analysis: Architectural characterization
     • Understand where the performance loss is coming from
       – jbb CPI (cycles per instruction) increases 37%
       – Most of CPI increase is due to L2 MPI (misses per instruction) increase
       – Due to cache pollution when running with other workloads
       – Cache interference
       – Similar behavior observed with other workloads too.
     [Figure: Contribution to CPI for SPECjbb; metric normalized to single; JBB Alone vs. JBB vCon for Score Delta, CPI Delta, L2 MPI Delta]

  8. Results and Analysis: Cache Scaling
     • Useful for understanding the benefit of larger caches to the workloads.
     • Jbb and Sysbench do well with larger caches.
     • Helpful for platform architects in making decisions about cache sizes for future platforms

     Cache scaling in vCon (values relative to the 1MB configuration):
     Workload | Metric | 1MB | 2MB | 4MB
     JBB | Score | 1 | 1.31 | 1.78
     JBB | CPI | 1 | 0.77 | 0.57
     JBB | L2 MPI | 1 | 0.75 | 0.49
     Sysbench | Score | 1 | 1.41 | 1.60
     Sysbench | CPI | 1 | 0.83 | 0.76
     Sysbench | L2 MPI | 1 | 0.70 | 0.57
     Webbench | Score | 1 | 1.08 | 1.18
     Webbench | CPI | 1 | 0.92 | 0.88
     Webbench | L2 MPI | 1 | 0.84 | 0.69
     Mail | Score | 1 | 1.15 | 1.09
     Mail | CPI | 1 | 1.09 | 0.71
     Mail | L2 MPI | 1 | 0.67 | 1.05

     [Figure: Performance Comparison of workloads in a Dedicated vs. Consolidated environment; raw performance normalized to dedicated environment; Alone vs. vCon at 1MB, 2MB, 4MB for JBB, SysBench, WebBench, Mail; bar labels as extracted, mapping to bars not recoverable: 0.74 0.69 0.64 0.67 0.64 0.67 0.76 0.56 0.41 0.43 0.39 0.35]
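One way to read the cache scaling numbers is to check how much of each score change is explained by CPI alone. The sketch below assumes that, with a fixed instruction count per operation and fixed CPU time, throughput would scale as 1/CPI, so the product score × CPI would stay near 1.0. That assumption is ours, not a claim from the slides, and the names are illustrative.

```python
# Score and CPI relative to the 1MB configuration, as listed on the slide
# (columns: 1MB, 2MB, 4MB).
cache_scaling = {
    "JBB":  {"score": [1.0, 1.31, 1.78], "cpi": [1.0, 0.77, 0.57]},
    "Sys":  {"score": [1.0, 1.41, 1.60], "cpi": [1.0, 0.83, 0.76]},
    "Web":  {"score": [1.0, 1.08, 1.18], "cpi": [1.0, 0.92, 0.88]},
    "Mail": {"score": [1.0, 1.15, 1.09], "cpi": [1.0, 1.09, 0.71]},
}


def score_times_cpi(entry):
    """Product of relative score and relative CPI per cache size.

    Under the stated assumption (throughput proportional to 1/CPI),
    this product would be close to 1.0.
    """
    return [s * c for s, c in zip(entry["score"], entry["cpi"])]


products = {name: score_times_cpi(entry) for name, entry in cache_scaling.items()}

for name, values in products.items():
    print(name, [f"{v:.2f}" for v in values])
```

With these inputs the JBB products stay within about 2% of 1.0, while the Sysbench products drift to roughly 1.2. That gap suggests something other than CPI (for example, CPU time received) also changes with cache size, though the slides do not say so.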

  9. Results and Analysis: vCon Execution Profile – Life of a VM
     • Understand how a VM behaves over time
     • Instrumented the scheduler to report which pcpu a VM is running on, for how long, where it migrated to, when it came back, and who ran while it had migrated
       – Helps understand cache interference
       – Helps understand the behavior of the scheduler
     • We measured cpu utilization with xentop and using our instrumentation
     • The data are pretty close, validating our methodology

     CPU% per VM:
     VM | Measured with Xentop | Computed from Scheduler Profile
     dom0 | 30% | 36%
     JBB | 122% | 120%
     Sys | 116% | 118%
     Web | 114% | 112%
     Mail | 6% | 8%
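The slides do not show the format of the scheduler instrumentation output. As a rough sketch of how CPU% could be computed from such a profile, assume each record is a hypothetical tuple `(vm, pcpu, start, end)` describing one scheduling interval. The trace below is made up for illustration and is not measured data.

```python
from collections import defaultdict


def cpu_percent(intervals, wall_time):
    """Sum each VM's run time over all pcpus and express it as a percentage
    of wall-clock time. A VM with two busy vcpus can exceed 100%."""
    busy = defaultdict(float)
    for vm, _pcpu, start, end in intervals:
        busy[vm] += end - start
    return {vm: 100.0 * total / wall_time for vm, total in busy.items()}


# Hypothetical trace: (vm, pcpu, start, end) in arbitrary time units.
trace = [
    ("JBB", 0, 0, 60),
    ("JBB", 1, 0, 60),
    ("Mail", 2, 10, 16),
    ("dom0", 3, 0, 30),
]

usage = cpu_percent(trace, wall_time=100)
for vm, pct in usage.items():
    print(f"{vm}: {pct:.0f}%")
```

Comparing per-VM values computed this way against xentop's readings is the kind of cross-check the table above reports.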

  10. Results and Analysis: vCon Execution Profile – Life of a VM (continued)
     • A VM runs on all pcpus, no particular affinitization
       – Shows good dynamic load balancing by the scheduler.
     • A VM comes back to the same pcpu most of the time
       – Helps in reducing cache misses, if the cache was not polluted
       – What is missing is: what VMs ran during the interim that the VM had migrated → extent of cache pollution/quantification of cache misses

     Cpu% of each VM per physical cpu:
     VM | Across all cpus | pCPU0 | pCPU1 | pCPU2 | pCPU3
     Dom0 | 100% | 19% | 33% | 27% | 8%
     Jbb | 100% | 32% | 28% | 20% | 25%
     Sys | 100% | 26% | 25% | 28% | 24%
     Web | 100% | 18% | 20% | 27% | 40%
     Mail | 100% | 37% | 23% | 18% | 2%

     Migration behavior:
     VM | % time came back to same cpu | % time went to another cpu
     Dom0 | 95% | 5%
     JBB | 87% | 13%
     SYS | 92% | 8%
     WEB | 92% | 8%
     MAIL | 97% | 3%

     [Figure: Time profile of a vcpu across physical cpus; x-axis: Time (thousands); y-axis: Physical cpus; series JBB-0, JBB-1, Sys-0, Sys-1, Web-0, Web-1]
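The slides do not define exactly how "came back to same cpu" was counted. One plausible reading, shown here only as a sketch, is the fraction of consecutive scheduling slices in which a vcpu lands on the pcpu it last ran on. The placement sequence below is hypothetical, not taken from the measurements.

```python
def same_pcpu_fraction(placements):
    """Fraction of consecutive scheduling slices that stayed on the same pcpu.

    `placements` is the ordered list of pcpu ids a vcpu ran on.
    Returns None when there are fewer than two slices to compare.
    """
    if len(placements) < 2:
        return None
    pairs = zip(placements, placements[1:])
    same = sum(1 for prev, cur in pairs if prev == cur)
    return same / (len(placements) - 1)


# Hypothetical placement history for one vcpu (pcpu ids 0..3).
placements = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]

fraction = same_pcpu_fraction(placements)
print(f"returned to same pcpu: {fraction:.0%}")
```

A time-weighted variant (weighting each slice by its duration) would match the "% time" wording more closely, but it needs slice lengths that are not given here.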

  11. Results and Analysis: vCon Execution Profile – Cache Interference
     • Cache interference impacts the performance of the workload
     • Find out which VM/vcpu shares the second level cache with another VM/vcpu and for how much time
     • Of the 30.3% spent in jbb, 10.5% is spent with sysbench, 10.2% with webbench, 7.5% with jbb (other vcpu), 2% with Mail
     • Knowing the L2 MPI and CPI impact of a VM with another VM we can determine the cache interference
     [Figure: % Time a VM ran with another VM; % Cpu Time for Dom0, Jbb, Sys, Web, Mail; bar labels as extracted: 30.3%, 28.8%, 28.4%, 9%, 1.8%]

  12. Results and Analysis: vCon Execution Profile – Cache Interference (continued)
     • Affinitize the vcpus to different cores
     • Get cache MPI with each of the workloads
     • Jbb loses 16% running with JBB, 14% with Sysbench, 11% with Webbench and 3% with Mail
     [Figure: vcpu-to-core placement diagrams; Dom0-0, Dom0-1, JBB-0, JBB-1, Sys-0, Sys-1 pinned to Core0-Core3, with cores sharing an L2]
     [Figure: Impact to SPECjbb L2 MPI due to running with other workloads; L2MPI normalized to when running alone; JBB, JBB with JBB, JBB with Sys, JBB with Web, JBB with Mail, JBB in vCon]
     [Figure: Impact to SPECjbb CPI due to running with other workloads; CPI normalized to when running alone; same categories]
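The per-pair losses above can be combined with the cache-sharing times from the previous slide (10.5% with sysbench, 10.2% with webbench, 7.5% with the other jbb vcpu, 2% with Mail) into one rough interference estimate. The time-weighted average below is our assumption about how the two sets of numbers might be combined. It is not presented in the slides as the authors' model.

```python
# % of cpu time jbb shared the L2 with each co-runner (previous slide).
co_run_share = {"Sysbench": 10.5, "Webbench": 10.2, "JBB": 7.5, "Mail": 2.0}

# % loss for jbb when pinned next to each co-runner (this slide).
loss_with = {"JBB": 16.0, "Sysbench": 14.0, "Webbench": 11.0, "Mail": 3.0}

# Assumed combination: weight each pairwise loss by the time spent co-running.
total_share = sum(co_run_share.values())
weighted_loss = sum(
    co_run_share[name] * loss_with[name] for name in co_run_share
) / total_share

print(f"time-weighted cache interference loss for jbb: {weighted_loss:.1f}%")
```

The shares sum to 30.2%, close to the 30.3% quoted for jbb, so the weights cover essentially all of jbb's run time.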

  13. From Characterization to Modeling
     • How do we build a performance projection model?
     [Diagram labels: Virtualized Workload; Native Workload Performance; VT events and costs (Virtualization Overheads); cpu% (Core interference); L2 MPI (Cache interference); combined (+) into Projected performance of Workload in consolidation]
