Outin, Edouard, et al. "Enhancing cloud energy models for optimizing datacenters efficiency." Cloud and Autonomic Computing (ICCAC), 2015 International Conference on. IEEE, 2015. Reviewed by Cristopher Flagg, December 6, 2017
Objective ● Minimize Energy Consumption ● Maintain SLA requirements ● Nontrivial Multi-Objective Optimization Problem ○ Genetic algorithm to optimize Cloud energy consumption ○ Machine learning to improve fitness function
Fitness Function - Research Questions ● Depends on the underlying model ● RQ1. Do differences exist between the energy simulation based on hardware specifications and the real data that can be observed? ● RQ2. Could we use machine learning techniques at runtime to improve the simulation accuracy?
Problem Statement ● Simulation used to model datacenter energy consumption ● Simulation accuracy drives modeling accuracy ● Models used in the "Analysis" step of MAPE-K ● Based on Standard Performance Evaluation Corporation (SPEC) benchmarks of power consumption
Problem Statement - CloudSim ● “to provide a generalized and extensible simulation framework that enables modeling, simulation, and experimentation of emerging Cloud computing infrastructures and application services” ● Energy model is based on the host CPU utilization
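To our understanding, CloudSim's SPEC-based power models (e.g. `PowerModelSpecPower`) take the power readings published on spec.org at 0%, 10%, ..., 100% CPU load and linearly interpolate between them. The sketch below reproduces that idea in Python; the function name and the table values in the usage example are hypothetical placeholders, not the R620's real spec.org figures.

```python
from typing import Sequence


def spec_power(utilization: float, power_at_decile: Sequence[float]) -> float:
    """Interpolate host power (W) from 11 SPEC-style readings.

    power_at_decile[i] is the power measured at i*10 % CPU load (i = 0..10).
    Only CPU utilization is used, mirroring a CPU-only energy model.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    if len(power_at_decile) != 11:
        raise ValueError("expected 11 readings (0%, 10%, ..., 100%)")
    if utilization == 1.0:
        return float(power_at_decile[10])
    idx = int(utilization * 10)        # lower 10% bracket
    frac = utilization * 10 - idx      # position inside that bracket
    lo, hi = power_at_decile[idx], power_at_decile[idx + 1]
    return lo + (hi - lo) * frac
```

For example, with a made-up table rising from 50 W to 150 W in 10 W steps, `spec_power(0.25, table)` returns 75 W. Whatever the real machine does with its RAM, disk or network, this model cannot see it.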
Problem Statement - GreenCloud ● Packet-level simulator with a strong emphasis on networking and energy awareness ● Independent energy models for each type of resource (e.g. CPU, RAM, disk, network) ● Determining the coefficients for these models is complex and they cannot simply be approximated
Problem Statement - SimGrid ● Studies the behavior of large-scale distributed systems such as Grids, Clouds, HPC or P2P systems ● SURF Energy Plugin accounts for computation time and dissipated energy ● Assumes energy consumption is linear in CPU utilization
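The linear assumption can be written as $P(u) = P_{idle} + (P_{max} - P_{idle})\,u$, with energy obtained by summing power over time. A minimal sketch, assuming a utilization trace sampled at a fixed interval; the function names and the idle/max wattages are hypothetical, and this is not SimGrid's own code.

```python
from typing import Sequence


def linear_power(u: float, p_idle: float, p_max: float) -> float:
    """Power (W) under the linear model P(u) = P_idle + (P_max - P_idle) * u."""
    return p_idle + (p_max - p_idle) * u


def energy_joules(utilization_trace: Sequence[float], dt: float,
                  p_idle: float, p_max: float) -> float:
    """Energy (J) over a trace sampled every dt seconds (rectangle rule)."""
    return sum(linear_power(u, p_idle, p_max) * dt for u in utilization_trace)
```

One second idle plus one second at full load on a hypothetical 100 W / 200 W host gives 300 J. The measurements reported later in the review suggest that real servers deviate from this straight line, especially at idle.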
Problem Statement - iCanCloud ● Predicts the trade-offs between cost and performance of a given set of applications executed on specific hardware ● Supports modeling the energy consumption of hardware components such as CPUs, memory, disks and PSUs ● Based on predefined collections of applications
Problem Statement - Summary ● Simulators are used in the classical analysis step of a MAPE-K loop ● The analysis step usually relies on hard-coded "static" rules, implemented by Event-Condition-Action (ECA) engines ● This paper uses and manipulates simulators instead of an ECA engine
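To make the contrast concrete, here is a minimal sketch of the kind of static ECA analysis the paper moves away from: each rule is hard-coded, and the engine simply returns the action of the first rule whose event and condition match. The event name, thresholds and action names are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """When `event` fires and `condition(state)` holds, trigger `action`."""
    event: str
    condition: Callable[[dict], bool]
    action: str


def eca_analyze(event: str, state: dict, rules: List[Rule]) -> Optional[str]:
    """Return the action of the first matching rule, or None (static analysis)."""
    for rule in rules:
        if rule.event == event and rule.condition(state):
            return rule.action
    return None


# Hypothetical hard-coded rules: thresholds are fixed at design time.
RULES = [
    Rule("host_metrics", lambda s: s["cpu"] > 0.9, "migrate_vm"),
    Rule("host_metrics", lambda s: s["cpu"] < 0.1, "consolidate_and_power_off"),
]
```

Such rules never evaluate what a change would cost in energy; the paper's approach instead simulates candidate configurations and scores them.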
Problem Statement - Experimental Protocol ● Google Scholar used to identify the most cited simulator (CloudSim) ● Simulations based on the spec.org values for the Dell PowerEdge R620 ● PDU metrics for this server requested through SNMP ● Stress tool (stress-ng) used to mimic variable server utilization ● Two experiments on a fresh Ubuntu Server 14.04.2 LTS install
Problem Statement - Bare Metal ● No hypervisor: the host operating system is stressed directly ● Energy consumption averaged over a 120-second interval
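A minimal sketch of the averaging step, assuming the PDU delivers one power sample (W) per second: average the last 120 samples and compare the result with the spec.org value for the same load. The function names are hypothetical, and the relative-gap measure is our own choice for illustration, not a metric defined in the paper.

```python
from statistics import mean
from typing import Sequence


def window_average(samples_w: Sequence[float], window_s: int = 120) -> float:
    """Mean power (W) over the last `window_s` one-second samples."""
    if len(samples_w) < window_s:
        raise ValueError("not enough samples for a full window")
    return mean(samples_w[-window_s:])


def relative_gap(measured_w: float, spec_w: float) -> float:
    """Signed relative difference between the measured average and the spec.org value."""
    return (measured_w - spec_w) / spec_w
```

Averaging over a fixed window smooths per-second PDU noise, so the remaining gap reflects the model rather than measurement jitter.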
Problem Statement - Hypervisor and VM ● KVM hypervisor with a single large Ubuntu VM ● When idle, there is a non-negligible gap between the spec.org and measured values
Problem Statement - RQ1 Revisited ● CloudSim simulation values (based on the spec.org data) are not very accurate ● CPU utilization alone cannot be relied on to predict the power consumed (in watts)
Approach ● Monitor the managed elements of the Cloud infrastructure ● Analysis determines the changes needed to bring the system into the ideal state ○ more energy-efficient ○ no SLA violations ○ high performance
Approach ● A genetic algorithm manipulates a Cloud configuration instantiated as a model ● Fitness function designed to evaluate energy consumption (the focus of the paper) ● Plan and execute the changes from the best instance
Approach - Cloud Model ● Model inspired by previous experiments ● Model maps ○ virtual machine placement ○ SLA constraints ○ load of the different hosts ● Supports mutations, crossovers and validity checks
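The genetic search over placements can be sketched as follows. This is a simplified illustration, not the paper's KMF-based implementation: a placement is flattened to a list mapping each VM to a host, "no host above 100% CPU" stands in for the SLA validity check, and the fitness uses a placeholder linear power model where the paper plugs in its learned energy model. VM loads, wattages and GA parameters are hypothetical, and truncation selection is our own choice.

```python
import random
from typing import List

VM_LOAD = [0.3, 0.2, 0.4, 0.1, 0.5, 0.3]  # hypothetical per-VM CPU demand
N_HOSTS = 4
P_IDLE, P_MAX = 100.0, 200.0              # placeholder watts, not measurements

Placement = List[int]                     # placement[vm] = host index


def host_loads(p: Placement) -> List[float]:
    loads = [0.0] * N_HOSTS
    for vm, host in enumerate(p):
        loads[host] += VM_LOAD[vm]
    return loads


def is_valid(p: Placement) -> bool:
    """Validity check standing in for SLA constraints: no host overloaded."""
    return all(load <= 1.0 + 1e-9 for load in host_loads(p))


def fitness(p: Placement) -> float:
    """Estimated power (W), lower is better; empty hosts assumed powered off."""
    if not is_valid(p):
        return float("inf")
    return sum(P_IDLE + (P_MAX - P_IDLE) * load
               for load in host_loads(p) if load > 0)


def mutate(p: Placement, rate: float, rng: random.Random) -> Placement:
    """Move each VM to a random host with probability `rate`."""
    return [rng.randrange(N_HOSTS) if rng.random() < rate else h for h in p]


def crossover(a: Placement, b: Placement, rng: random.Random) -> Placement:
    """One-point crossover of two parent placements."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations: int = 200, pop_size: int = 30, seed: int = 0) -> Placement:
    rng = random.Random(seed)
    pop = [[rng.randrange(N_HOSTS) for _ in VM_LOAD] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]    # keep the best half (elitist)
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), 0.1, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)
```

Because the best half always survives, the best fitness never worsens between generations; the quality of the result, however, can only be as good as the energy model inside `fitness`, which is why the paper focuses on improving it.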
Approach - Cloud Model ● Uses the Kevoree Modeling Framework (KMF, modeling.kevoree.org) ● Utilizes model generators ● Stores time series of models
Approach - Energy Consumption Model ● OpenStack Ceilometer compute agent on each node ● Forwards all metrics to a central agent for aggregation ● Uses machine learning to design a new energy model for the Cloud datacenter ● Model is trained beforehand
Approach - Energy Consumption Model Detailed sequence of actions performed by every compute node agent: ● On the server, monitor CPU utilization, RAM usage, volume of disk reads and writes, and volume of network data received and sent ● The PDU provides the corresponding energy consumed by the server ● Every second, the metrics are retrieved from the server and the PDU ● The metrics collector stores the tuple (%cpu, %ram, read, writes, recv, sent, Watts)
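The collection loop described above can be sketched as follows. `read_host` and `read_pdu` are hypothetical callables standing in for the Ceilometer compute agent and the SNMP query to the PDU; they are not the APIs of either tool. Each iteration pairs one host reading with one PDU reading, producing one training row per second.

```python
import time
from dataclasses import dataclass, astuple
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One training row: host metrics plus the PDU power reading for the same second."""
    cpu_pct: float
    ram_pct: float
    disk_read: float
    disk_write: float
    net_recv: float
    net_sent: float
    watts: float


def collect(read_host: Callable[[], Tuple[float, ...]],
            read_pdu: Callable[[], float],
            n: int, period_s: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> List[Sample]:
    """Poll the host and PDU every `period_s` seconds and return n samples."""
    rows: List[Sample] = []
    for _ in range(n):
        cpu, ram, rd, wr, rx, tx = read_host()
        rows.append(Sample(cpu, ram, rd, wr, rx, tx, read_pdu()))
        sleep(period_s)
    return rows
```

Injecting `sleep` keeps the loop testable without waiting; in a real deployment the default `time.sleep` would apply, and clock drift between host and PDU readings would need attention.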
Approach - Energy Consumption Model ● Multivariate Adaptive Regression Splines (MARS) ● Predicts the value of a continuous dependent variable from a set of independent variables ● Does not assume any particular type or class of relationship (e.g., linear, logistic) between the predictor variables and the dependent variable ● $E_{total} = \sum_{host \in H} \mathrm{predict}(host) + E_{network}$, where $H$ is the set of hosts ● Network energy consumption does not scale proportionally with traffic load; it depends mainly on the topology, so the model treats $E_{network}$ as a static value
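A fitted MARS model is a sum of hinge functions $\max(0, x - c)$ and $\max(0, c - x)$ with knots $c$. The sketch below only evaluates such a model and plugs it into the $E_{total}$ formula; it does not fit one (fitting adds hinge pairs in a forward pass and prunes them in a backward pass). It is restricted to additive, degree-1 terms, whereas MARS can also multiply hinges to capture interactions. All class names, knots and coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Hinge:
    """max(0, x - knot) if direction == +1, max(0, knot - x) if direction == -1."""
    var: str
    knot: float
    direction: int

    def __call__(self, x: Dict[str, float]) -> float:
        return max(0.0, self.direction * (x[self.var] - self.knot))


@dataclass(frozen=True)
class MarsModel:
    intercept: float
    terms: Sequence[Tuple[float, Hinge]]  # (coefficient, hinge) pairs

    def predict(self, x: Dict[str, float]) -> float:
        return self.intercept + sum(c * h(x) for c, h in self.terms)


def total_energy(model: MarsModel, hosts: List[Dict[str, float]], e_network: float) -> float:
    """E_total = sum of predict(host) over all hosts + E_network (treated as constant)."""
    return sum(model.predict(h) for h in hosts) + e_network
```

With a hypothetical model `100 + 2*max(0, cpu - 20) + 0.5*max(0, 50 - ram)`, a host at cpu = 30, ram = 40 predicts 125; the piecewise-linear hinges are what let MARS bend where a single straight line (as in the CPU-only models) cannot.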
Experimental Protocol - Validation ● Gather sparse data for predictions, covering different utilization levels of the server's hardware (i.e. CPU, RAM, disk, network) ● Cloud infrastructure mimics random/variable workloads ● stress-ng used to consume server resources
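One way to generate such random workloads is to draw stress-ng parameters at random. The sketch below only builds command lines (it does not run them) using the stress-ng options `--cpu`, `--cpu-load`, `--vm`, `--vm-bytes`, `--hdd` and `--timeout`; the value ranges and the 120 s duration are assumptions for illustration, since the exact parameters used in the paper are not given here.

```python
import random
import shlex
from typing import List


def random_stress_commands(n: int, max_cpus: int, seed: int = 0) -> List[str]:
    """Build n stress-ng command lines with randomly drawn CPU, RAM and disk load."""
    rng = random.Random(seed)
    cmds = []
    for _ in range(n):
        cpus = rng.randint(1, max_cpus)           # number of CPU workers
        load = rng.randint(10, 100)               # target CPU load per worker (%)
        vm_mb = rng.choice([256, 512, 1024])      # memory stressed by the vm worker
        argv = ["stress-ng", "--cpu", str(cpus), "--cpu-load", str(load),
                "--vm", "1", "--vm-bytes", f"{vm_mb}M",
                "--hdd", "1", "--timeout", "120s"]
        cmds.append(shlex.join(argv))
    return cmds
```

Seeding the generator makes a workload schedule reproducible, which helps when comparing the learned model against CloudSim on the same sequence.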
Experimental Protocol - Sample Data ● Training data gathered for a given host node
Experimental Protocol - Energy model results ● $E_{host}$ is the total energy consumption of a given host ● cpu refers to the current host CPU utilization ● ram refers to the current host RAM usage ● sent denotes the volume of network data sent (in Kb)
Conclusion - Analysis of Results?
Conclusion - Analysis of Results The results look promising: the average error between the measured values and the predicted ones is 3.8%, which improves accuracy compared to CloudSim. This result answers RQ2 positively.
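The review does not state how the "average error" is computed. A common choice is the mean absolute percentage error (MAPE) between measured and predicted watts, sketched below under that assumption; the function name is hypothetical.

```python
from typing import Sequence


def mean_abs_pct_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (%) of predictions against measurements."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(m - p) / m for m, p in zip(measured, predicted)) / len(measured)
```

For instance, predictions of 104 W and 196 W against measured 100 W and 200 W give a 3% error. Because MAPE is a mean, it can hide large errors at particular load levels, so a per-load breakdown would strengthen the comparison with CloudSim.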
Conclusion - Threats to Validity ● Disk I/O is NOT a dominant feature in the prediction equation computed by the MARS algorithm ○ Volume of disk operations was nearly constant ○ Purely sequential disk access is not realistic ● Live migration energy overhead NOT considered
Conclusion - Questions ● CloudSim is CPU-only. GreenCloud also takes disks and RAM into account, but was not reviewed ● No results shown, and no analysis of why they are missing
Recommend
More recommend