IC2E 2017 – Wes J. Lloyd 4/6/2017

Mitigating Resource Contention and Heterogeneity in Public Clouds for Scientific Modeling Services
Wes Lloyd, Shrideep Pallickara, Olaf David, Mazdak Arabi, Ken Rojas
Institute of Technology, University of Washington, Tacoma, Washington USA
IC2E 2017: IEEE International Conference on Cloud Engineering
April 6, 2017

Outline
- Background
- Research Questions
- Experimental Workloads
- Experiments/Evaluation
- Conclusions

Rosetta Protein Folding
- Computational methods for accurate design of new hyperstable constrained peptides
- In 53 hours, using 5,904 EC2 compute cores:
  - Generated 5.2 million peptide structures
  - Cost: $3,400 in spot instances
- Up-front cost of a physical cluster to achieve the same result in ~53 hours: $857,752
- The cloud enables ad hoc large-scale experimentation

VM-Type Heterogeneity: Amazon EC2
[Figure from: "Is the Same Instance Type Created Equal?", 2013 IEEE Transactions on Cloud Computing]

Research Challenges
- How can we improve performance and costs for hosting scientific application workloads on the cloud?
  - Resource heterogeneity
  - Resource contention
- Relative to: HPC compute clusters
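The Rosetta figures above can be turned into a per-core-hour price with simple arithmetic. This is a back-of-envelope sketch using only the numbers quoted on the slide; the constant and function names are illustrative. Note that the cluster figure is an up-front purchase price rather than an hourly rate, so the ratio overstates the gap if the cluster would be reused after the 53-hour run.

```python
# Back-of-envelope cost comparison for the Rosetta protein-folding run.
# All inputs are the figures quoted on the slide; names are illustrative.
CORES = 5_904                # EC2 compute cores used
HOURS = 53                   # wall-clock duration of the run
SPOT_COST_USD = 3_400        # spot-instance cost of the run
CLUSTER_COST_USD = 857_752   # up-front cost of an equivalent physical cluster


def core_hours(cores: int, hours: float) -> float:
    """Total core-hours consumed by the run."""
    return cores * hours


def cost_per_core_hour(total_cost: float, cores: int, hours: float) -> float:
    """Effective price paid per core-hour over the run."""
    return total_cost / core_hours(cores, hours)


if __name__ == "__main__":
    print(f"core-hours: {core_hours(CORES, HOURS):,.0f}")
    print(f"spot $/core-hour: {cost_per_core_hour(SPOT_COST_USD, CORES, HOURS):.4f}")
    # Coarse: compares a one-off rental bill with a capital purchase.
    print(f"cluster/spot cost ratio: {CLUSTER_COST_USD / SPOT_COST_USD:.0f}x")
```

The run works out to 312,912 core-hours, or roughly one cent per core-hour on spot instances.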
Trial-and-Better Resource Provisioning
Z. Ou et al., 2013 IEEE Trans. on Cloud Computing
Using Amazon EC2:
1. Provision instances
2. Perform trial(s): VM testing
3. Keep desired instances
4. Replace undesirable instances
Test: underlying CPU type

Trial and Better – VM-Scaler
- Harness this approach for VM pools
- Ensure every VM has the same backing CPU
- Provide more consistent test results

VM-Scaler
- Web services application
- REST-based/JSON
- Harnesses the EC2 API
- Manages virtual cloud infrastructure
- Supports scientific modeling-as-a-service
- Supports Amazon and Eucalyptus clouds

Resource Utilization Data Collection
- Profile resource utilization for scientific workloads running across many VMs
- Sensor on every VM
- Transmits data to VM-Scaler

CPU
- CPU time
- cpu_usr: CPU time in user mode
- cpu_krn: CPU time in kernel mode
- cpu_idle: CPU idle time
- contextsw: # of context switches
- cpu_io_wait: CPU time waiting for I/O
- cpu_sint_time: CPU time serving soft interrupts
- loadavg: load average (# of processes, averaged over 60 secs)
- cpuSteal: VM CPU ready, physical CPU unavailable

Disk
- dsr: disk sector reads
- dsreads: disk sector reads completed
- drm: merged adjacent disk reads
- readtime: time spent reading from disk
- dsw: disk sector writes
- dswrites: disk sector writes completed
- dwm: merged adjacent disk writes
- writetime: time spent writing to disk

Network
- nbr: network bytes received
- nbs: network bytes sent

CpuSteal
- CpuSteal: the VM's CPU core is ready to execute, but the physical CPU core is busy
- Symptom of over-provisioning physical servers
- Factors that cause CpuSteal:
  1. Processors shared by too many busy VMs
  2. Hypervisor kernel (Xen dom0) is occupying the CPU
  3. The VM's CPU time share is <100% for one or more cores, and 100% is needed for a CPU-intensive workload
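To make the cpuSteal metric concrete, here is a minimal sketch of how a per-VM sensor could compute it on a Linux guest, assuming it reads the aggregate `cpu` line of `/proc/stat` twice and reports stolen ticks as a share of all ticks in the interval. This is an illustration, not the VM-Scaler sensor's actual code; the function names and sampling interval are hypothetical.

```python
import time
from typing import List

# Counter order on the aggregate "cpu" line of Linux /proc/stat
# (the steal column exists on kernels 2.6.11 and later).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
STEAL = FIELDS.index("steal")


def parse_cpu_line(line: str) -> List[int]:
    """Return the first eight tick counters of the aggregate 'cpu' line.

    guest/guest_nice are skipped: the kernel already folds them into
    user/nice, so adding them would double-count.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 1 + len(FIELDS):
        raise ValueError("expected the aggregate 'cpu' line with a steal column")
    return [int(v) for v in parts[1:1 + len(FIELDS)]]


def steal_percent(before: str, after: str) -> float:
    """Percent of CPU ticks stolen by the hypervisor between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas)
    return 100.0 * deltas[STEAL] / total if total > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()


def sample_steal(interval_s: float = 1.0) -> float:
    """Hypothetical sensor step: two snapshots taken interval_s apart."""
    before = read_cpu_line()
    time.sleep(interval_s)
    return steal_percent(before, read_cpu_line())
```

For example, `steal_percent("cpu 100 0 50 800 10 0 0 40 0 0", "cpu 160 0 70 900 10 0 0 100 0 0")` returns 25.0, because 60 of the 240 ticks in the interval were stolen. Working from deltas rather than raw counters matters because the counters accumulate from boot.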
Research Questions
RQ1: How common is public cloud VM-type implementation heterogeneity?
RQ2: What performance implications result from VM-type heterogeneity for hosting scientific application workloads?

Research Questions - 2
RQ3: How effective is cpuSteal at identifying VMs with high resource contention due to multi-tenancy (e.g., noisy neighbor VMs) in a public cloud?
RQ4: What are the performance implications of hosting scientific modeling workloads on worker VMs with consistently high cpuSteal measurements in a public cloud? Is there a pattern to cpuSteal behavior across worker VMs over time?

CSIP Model Services
Cloud Services Innovation Platform
- Java-based framework to support development of scientific model services (modeling-as-a-service)
- Increase availability and throughput of models
- Harness scalable cloud infrastructure
- Cloud virtualization supports a variety of legacy software required for scientific applications (e.g., FORTRAN, Visual C++ 6.0, etc.)

Scientific Application Workloads
RUSLE2
- Soil erosion from water
- Median runtime ~1.89 s
WEPS
- Soil erosion from wind
- Median runtime ~55 s
- Years of weather data * years of crop rotation
Scientific Modeling Workloads - 2
[Figure: WEPS / RUSLE2 CPU utilization]

Testing for VM Type Heterogeneity
- Identified the backing CPU by checking /proc/cpuinfo
- Launched 50 VMs of a given type
- If there was heterogeneity, launched 50 more
- Tested 12 VM types across 3 generations:
  - 1st: m1.medium, m1.large, m1.xlarge, c1.medium, c1.xlarge
  - 2nd: m2.xlarge, m2.2xlarge, m2.4xlarge
  - 3rd: c3.large, c3.xlarge, c3.2xlarge, m3.large

Amazon EC2 VM Type Heterogeneity
Backing CPU entries: cores (c), thermal design power in watts (w), share of sampled VMs backed by that CPU (%)

VM type    Region      Backing CPU                            Backing CPU
m1.medium  us-east-1c  Intel Xeon E5-2650 v0 (8c, 95w, 96%)   Intel Xeon E5645 (6c, 80w, 4%)
m2.xlarge  us-east-1c  Intel Xeon X5550 (4c, 95w, 48%)        Intel Xeon E5-2665 v0 (8c, 115w, 42%)
m1.large   us-east-1d  Intel Xeon E5-2650 v0 (8c, 95w, 74%)   Intel Xeon E5-2651 v2 (12c, 105w, 19%)
m1.large   us-east-1d  Intel Xeon E5645 (6c, 80w, 7%)         --
m2.xlarge  us-east-1d  Intel Xeon E5-2665 v0 (8c, 115w, 78%)  Intel Xeon X5550 (4c, 95w, 22%)

VM Type Heterogeneity: Performance Implications
- Tested small pools of 5 VMs
- Compared the two most abundant hardware implementations:
  - m1.large: Intel Xeon E5-2650 v0 (8 cores, 95 W) vs. E5-2651 v2 (12 cores, 105 W)
  - m2.xlarge: Intel Xeon E5-2665 v0 (8 cores, 115 W) vs. X5550 (4 cores, 95 W)
- Workloads:
  - WEPS: 10 x 100 runs
  - RUSLE2: 10 x 660 runs

VM Type Heterogeneity: Performance Variation
[Figure: performance variation]
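The heterogeneity test and the trial-and-better "keep or replace" step can be sketched together: read each VM's `/proc/cpuinfo`, take its `model name`, keep VMs whose CPU matches the desired one, and tally the CPU mix per VM type as the table reports. This is a minimal sketch under stated assumptions: the cpuinfo text has already been collected from each VM, matching is a plain substring test, and all function names are hypothetical. Launching and terminating instances (steps 1 and 4) would go through the EC2 API and is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple


def backing_cpu(cpuinfo_text: str) -> str:
    """Return the first 'model name' value found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return "unknown"


def triage_pool(cpuinfo_by_vm: Dict[str, str],
                desired_cpu: str) -> Tuple[List[str], List[str]]:
    """Split VM ids into (keep, replace) by backing-CPU substring match.

    Hypothetical helper: the substring test is an assumption, e.g.
    desired_cpu="E5-2650" for the m1.large pools.
    """
    keep: List[str] = []
    replace: List[str] = []
    for vm_id, text in cpuinfo_by_vm.items():
        if desired_cpu in backing_cpu(text):
            keep.append(vm_id)
        else:
            replace.append(vm_id)
    return keep, replace


def cpu_mix(cpuinfo_by_vm: Dict[str, str]) -> Dict[str, float]:
    """Percent of VMs per backing CPU, the quantity reported in the table."""
    counts = Counter(backing_cpu(t) for t in cpuinfo_by_vm.values())
    n = sum(counts.values())
    return {cpu: 100.0 * c / n for cpu, c in counts.items()} if n else {}
```

In use, `triage_pool(cpuinfo_by_vm, "E5-2650")` returns the VMs to keep and the VMs to replace, and `cpu_mix` gives the per-CPU percentages for one VM type and zone. A replacement loop would repeat the trial on newly launched VMs until the pool is homogeneous.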