Cloud Benchmarking: Estimating Cloud Application Performance Based on Micro Benchmark Profiling
Joel Scheuner – Master Thesis Defense, 2017-06-15
Software Evolution & Architecture Lab (s.e.a.l.), Department of Informatics
Problem
[Figure: Number of instance types over time, Aug 2006 – Aug 2016, with two example offerings: t2.nano (0.05–1 vCPU, 0.5 GB RAM, $0.006 hourly) and x1.32xlarge (128 vCPUs, 1952 GB RAM, $16.006 hourly).]
→ Impractical to test all instance types
Motivation
Micro benchmarks: Memory, CPU, I/O, Network – generic, artificial, resource-specific
Cloud applications: overall performance (e.g., response time) – specific, real-world, resource-heterogeneous
How relevant are micro benchmark results for cloud applications?
Research Questions
RQ1 – Performance Variability within Instance Types: Does the performance of equally configured cloud instances vary relevantly?
RQ2 – Application Performance Estimation across Instance Types: Can a set of micro benchmarks estimate application performance for cloud instances of different configurations?
RQ2.1 – Estimation Accuracy: How accurately can a set of micro benchmarks estimate application performance?
RQ2.2 – Micro Benchmark Selection: Which subset of micro benchmarks estimates application performance most accurately?
Methodology
Benchmark Design → Benchmark Execution → Data Pre-Processing → Data Analyses
[Figure: Example analysis output – boxplots of Relative Standard Deviation (RSD) [%] per configuration (instance type, region); see the RQ1 results.]
Performance Data Set
Instance Type   vCPU   ECU*   RAM [GiB]   Virtualization   Network Performance
m1.small        1      1      1.7         PV               Low
m1.medium       1      2      3.75        PV               Moderate
m3.medium       1      3      3.75        PV / HVM         Moderate
m1.large        2      4      7.5         PV               Moderate
m3.large        2      6.5    7.5         HVM              Moderate
m4.large        2      6.5    8.0         HVM              Moderate
c3.large        2      7      3.75        HVM              Moderate
c4.large        2      8      3.75        HVM              Moderate
c3.xlarge       4      14     7.5         HVM              Moderate
c4.xlarge       4      16     7.5         HVM               High
c1.xlarge       8      20     7           PV               High
* ECU := Elastic Compute Unit (i.e., Amazon's metric for CPU performance)
RQ1: m1.small (eu + us), m3.medium (eu + us), m3.large (eu)
RQ2: all instance types listed above; m3.medium is measured with both PV and HVM, yielding 12 configurations
>240 Virtual Machines (VMs), 3 iterations each, ~750 VM hours, >60,000 measurements
RQ1 – Approach
RQ1 – Performance Variability within Instance Types: Does the performance of equally configured cloud instances vary relevantly?
Acquire multiple VMs of the same instance type (VM 1, VM 2, VM 3, ..., VM 33) and run 3 iterations (iter 1, iter 2, iter 3) on each. For each of the 38 selected metrics, average the iterations per VM – Av(VM_1), Av(VM_2), Av(VM_3), ..., Av(VM_33) – and compute the variability across VMs:
Relative Standard Deviation (RSD) = 100 · σ / m̄
σ := absolute standard deviation
m̄ := mean of metric m
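A minimal sketch of this computation (Python; the measurement values are hypothetical, and the sample standard deviation is used here, which the slide does not specify):

```python
import statistics

# Hypothetical measurements: one list per VM, one value per iteration
# (e.g., a benchmark duration in seconds for 4 VMs x 3 iterations).
measurements = {
    "vm01": [61.2, 60.8, 61.5],
    "vm02": [59.9, 60.1, 60.4],
    "vm03": [63.0, 62.7, 62.9],
    "vm04": [60.5, 60.9, 60.2],
}

# Step 1: average the iterations per VM -> Av(VM_i)
vm_averages = [statistics.mean(values) for values in measurements.values()]

# Step 2: relative standard deviation across VMs of the same instance type
mean = statistics.mean(vm_averages)    # mean of the metric (m-bar)
stdev = statistics.stdev(vm_averages)  # absolute (sample) standard deviation
rsd = 100 * stdev / mean

print(f"RSD = {rsd:.2f}%")
```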
RQ1 – Results
[Figure: Boxplots of Relative Standard Deviation (RSD) [%] for the five configurations (instance type, region): m1.small (eu), m1.small (us), m3.medium (eu), m3.medium (us), m3.large (eu). The mean RSD (⧫) per configuration ranges from 3.16% to 6.83%. Annotated metrics include thread latency and random file I/O toward the high end of the plot, and network and sequential file I/O toward the low end.]
RQ1 – Implications
- Approaches that exploit hardware heterogeneity within an instance type [OZL+13, OZN+12, FJV+12] are no longer worthwhile.
- A smaller sample size suffices to confidently assess instance type performance.
- The offering is fair [OZL+13]: equally configured instances deliver comparable performance.

[OZL+13] Z. Ou, H. Zhuang, A. Lukyanenko, J. K. Nurminen, P. Hui, V. Mazalov, and A. Ylä-Jääski. Is the same instance type created equal? Exploiting heterogeneity of public clouds. IEEE Transactions on Cloud Computing, 1(2):201–214, 2013.
[OZN+12] Zhonghong Ou, Hao Zhuang, Jukka K. Nurminen, Antti Ylä-Jääski, and Pan Hui. Exploiting hardware heterogeneity within the same instance type of Amazon EC2. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing (HotCloud '12), 2012.
[FJV+12] Benjamin Farley, Ari Juels, Venkatanathan Varadarajan, Thomas Ristenpart, Kevin D. Bowers, and Michael M. Swift. More for your money: Exploiting performance heterogeneity in public clouds. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC '12), pages 20:1–20:14, 2012.
RQ2 – Approach
RQ2 – Application Performance Estimation across Instance Types: Can a set of micro benchmarks estimate application performance for cloud instances of different configurations?
Run all micro benchmarks (micro_1, micro_2, ..., micro_N) and the application benchmarks (app_1, app_2) on every instance type, from Instance Type 1 (m1.small) through Instance Type 12 (c1.xlarge). Then fit a linear regression model that estimates an application metric (e.g., app_1) from a micro benchmark metric (e.g., micro_1), using one group of instance types for training and the remaining ones for testing.
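A minimal sketch of this estimation step, assuming a simple one-predictor linear model (Python/NumPy; all benchmark values below are made up for illustration and are not measurements from the thesis):

```python
import numpy as np

# Hypothetical training data: one point per training instance type.
# x = micro benchmark result (e.g., Sysbench CPU multi-thread duration [s])
# y = application result (e.g., WPBench read response time [ms])
micro_train = np.array([95.0, 60.0, 40.0, 25.0, 18.0, 12.0])
app_train   = np.array([2100.0, 1400.0, 950.0, 600.0, 430.0, 300.0])

# Fit a simple linear model: app ≈ slope * micro + intercept
slope, intercept = np.polyfit(micro_train, app_train, deg=1)

# Estimate application performance for held-out (test) instance types
micro_test = np.array([70.0, 30.0])
app_estimate = slope * micro_test + intercept
print(app_estimate)
```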
RQ2.1 – Results
RQ2.1 – Estimation Accuracy: How accurately can a set of micro benchmarks estimate application performance?
[Figure: Scatter plot of WPBench Read – Response Time [ms] versus Sysbench – CPU Multi Thread Duration [s] with the fitted regression line; each point is one configuration (m1.small, m1.medium, m1.large, m3.medium (pv), m3.medium (hvm), m3.large, m4.large, c3.large, c4.large, c3.xlarge, c4.xlarge, c1.xlarge), grouped into train and test.]
Relative Error (RE) = 12.5%, R² = 99.2%
RQ2.2 – Results
RQ2.2 – Micro Benchmark Selection: Which subset of micro benchmarks estimates application performance most accurately?
Estimation Results for WPBench Read – Response Time
Benchmark                        Relative Error [%]   R² [%]
Sysbench – CPU Multi Thread      12.5                 99.2
Sysbench – CPU Single Thread     454.0                85.1
Baseline: vCPUs                  616.0                68.0
Baseline: ECU                    359.0                64.6
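To compare candidate predictors as in the table above, each one can be fitted and scored on held-out instance types. The sketch below extends the earlier regression example; the relative error is computed as the mean absolute relative error on the test set, which is an assumption since the slide does not define RE, and all values are again made up:

```python
import numpy as np

def fit_and_score(micro_train, app_train, micro_test, app_test):
    """Fit app ≈ slope * micro + intercept, report RE and R² on the test set.

    RE is computed here as the mean absolute relative error in percent;
    the exact definition used in the thesis may differ.
    """
    slope, intercept = np.polyfit(micro_train, app_train, deg=1)
    pred = slope * micro_test + intercept

    relative_error = 100 * np.mean(np.abs(pred - app_test) / app_test)

    ss_res = np.sum((app_test - pred) ** 2)
    ss_tot = np.sum((app_test - np.mean(app_test)) ** 2)
    r2 = 100 * (1 - ss_res / ss_tot)
    return relative_error, r2

# Hypothetical results for two candidate predictors of the same
# application metric (all numbers made up for illustration).
app_train = np.array([2100.0, 1400.0, 950.0, 600.0, 430.0, 300.0])
app_test  = np.array([1600.0, 500.0])
candidates = {
    "CPU Multi Thread":  (np.array([95.0, 60.0, 40.0, 25.0, 18.0, 12.0]),
                          np.array([70.0, 20.0])),
    "CPU Single Thread": (np.array([80.0, 78.0, 45.0, 44.0, 43.0, 42.0]),
                          np.array([79.0, 44.0])),
}

for name, (micro_train, micro_test) in candidates.items():
    re, r2 = fit_and_score(micro_train, app_train, micro_test, app_test)
    print(f"{name}: RE = {re:.1f}%, R² = {r2:.1f}%")
```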
RQ2 – Implications
- The selected micro benchmarks are suitable for estimating application performance.
- Benchmarks cannot be used interchangeably → configuration is important.
- The baseline metrics vCPU and ECU are insufficient.
- Repeat the benchmark execution already during benchmark design → check for variations between iterations.
Related Work
Application Performance Profiling
- System-level resource monitoring [ECA+16, CBMG16]
- Compiler-level program similarity [HPE+06]
Application Performance Prediction
- Trace and replay with Cloud-Prophet [LZZ+11, LZK+11]
- Bayesian cloud configuration refinement for big data analytics [ALC+17]

[ECA+16] Athanasia Evangelinou, Michele Ciavotta, Danilo Ardagna, Aliki Kopaneli, George Kousiouris, and Theodora Varvarigou. Enterprise applications cloud rightsizing through a joint benchmarking and optimization approach. Future Generation Computer Systems, 2016.
[CBMG16] Mauro Canuto, Raimon Bosch, Mario Macias, and Jordi Guitart. A methodology for full-system power modeling in heterogeneous data centers. In Proceedings of the 9th International Conference on Utility and Cloud Computing (UCC '16), pages 20–29, 2016.
[HPE+06] Kenneth Hoste, Aashish Phansalkar, Lieven Eeckhout, Andy Georges, Lizy K. John, and Koen De Bosschere. Performance prediction based on inherent program similarity. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT '06), pages 114–122, 2006.
[LZZ+11] Ang Li, Xuanran Zong, Ming Zhang, Srikanth Kandula, and Xiaowei Yang. Cloud-Prophet: Predicting web application performance in the cloud. ACM SIGCOMM Poster, 2011.
[LZK+11] Ang Li, Xuanran Zong, Srikanth Kandula, Xiaowei Yang, and Ming Zhang. Cloud-Prophet: Towards application performance prediction in cloud. In Proceedings of the ACM SIGCOMM 2011 Conference (SIGCOMM '11), pages 426–427, 2011.
[ALC+17] Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. CherryPick: Adaptively unearthing the best cloud configurations for big data analytics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17), 2017.