Entropy
Ecole des Mines de Nantes
Journée Thématique Emergente "Aspects énergétiques du calcul" (Emerging Thematic Day on the energy aspects of computing)
Fabien Hermenier, Adrien Lèbre, Jean Marc Menaud (menaud@mines-nantes.fr)
Wednesday, 9 February 2011
Outline
• Motivation
• Entropy project
• Dynamic consolidation principle
• Reconfiguration problem
• Some results
• Extension to HPC
• Cluster Context Switch
• Conclusion
Fabien Hermenier, Adrien Lèbre, J.M. Menaud - January 2011 - Ascola
Motivation
• Data center / cluster environment
• Resources are statically allocated to the jobs
• Resources are underused: static allocation of resources vs. dynamic utilization
• -> Data centers are oversized
[Figure: power consumption breakdown for a PUE = 2. Labels: data center (servers, air conditioning, AC/DC), servers (CPU, memory, disk, fan), CPU (run, idle).]
Motivation
• Dynamic consolidation
  • Each task of a job is embedded into a Virtual Machine (VM)
  • Resources are allocated according to the needs
  • VMs are packed together so that they are hosted on a reduced number of nodes
  • VMs must always stay online
  • Unused servers can be turned off
  • VMs are repacked when necessary, without downtime
• But repacking VMs takes time!
  • Packing the VMs implies several migrations
  • Some migrations have to be delayed to succeed
  • Temporary hosting is necessary
  • ...
-> Performance degradation
Reactivity is a key factor
Dynamic consolidation
Entropy observes the current CPU, memory and network requirements of each VM and computes a globally optimized placement that satisfies all of these requirements while using a minimum number of hosts.
Entropy can be classified as an IaaS system.
Global Design
• A configuration:
  • Each VM is assigned to a node.
  • Each VM requires a fixed amount of memory.
  • Each VM requires a variable amount of CPU. (Simplification: VMs executing a computation are active and require their own CPU.)
• -> A configuration may or may not be viable
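The notion of a configuration described on this slide can be made concrete with a small check. This is a minimal sketch, assuming that "viable" means every node has enough memory for the VMs it hosts and one CPU per active VM. The slide does not spell out that definition, and the `VM`, `Node` and `is_viable` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    memory_mb: int   # fixed memory requirement
    active: bool     # an active VM needs its own CPU (the slide's simplification)


@dataclass(frozen=True)
class Node:
    name: str
    memory_mb: int
    cpus: int


def is_viable(nodes, placement):
    """Return True if no node is overloaded.

    `placement` maps a node name to the list of VMs it hosts.
    Assumed rule: hosted memory fits, and active VMs do not outnumber CPUs.
    """
    for node in nodes:
        hosted = placement.get(node.name, [])
        if sum(vm.memory_mb for vm in hosted) > node.memory_mb:
            return False
        if sum(1 for vm in hosted if vm.active) > node.cpus:
            return False
    return True
```

Inactive VMs only count against memory in this sketch, which is why packing them tightly is cheap in CPU terms.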
Using Live Migrations at Cluster Scale
• The Virtual Machine Packing Problem (VMPP)
  • Compute the minimum number of nodes needed to obtain a viable configuration
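To give a feel for the VMPP, the following sketch packs VMs with a first-fit-decreasing heuristic. This heuristic is swapped in for illustration only: the slide does not say how Entropy solves the problem, and first-fit decreasing yields an upper bound on the number of nodes, not a guaranteed minimum. It also assumes identical nodes; the function name and the tuple layout are hypothetical.

```python
def pack_first_fit_decreasing(vms, node_memory, node_cpus):
    """Pack VMs onto identical nodes, largest VMs first.

    `vms` is a list of (name, memory, cpu_demand) tuples.
    Returns one list of VM names per node used.
    """
    nodes = []  # each entry tracks remaining capacity and hosted VM names
    for name, mem, cpu in sorted(vms, key=lambda v: (v[1], v[2]), reverse=True):
        if mem > node_memory or cpu > node_cpus:
            raise ValueError(f"VM {name} does not fit on any node")
        for node in nodes:
            if node["free_mem"] >= mem and node["free_cpu"] >= cpu:
                break  # first node with enough room
        else:
            node = {"free_mem": node_memory, "free_cpu": node_cpus, "vms": []}
            nodes.append(node)
        node["free_mem"] -= mem
        node["free_cpu"] -= cpu
        node["vms"].append(name)
    return [node["vms"] for node in nodes]
```

An exact solver would explore alternative assignments to prove that no smaller node count exists; the heuristic stops at the first packing it finds.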
In Action
[Figure: % CPU over time for 4 tasks, without Entropy (4 servers) and with Entropy (3 or 4 servers); server n°3 is stopped.]
Consumption is reduced by 25%
Order VM Operations (1/2)
[Figure: current status and correct status of the cluster; some manipulations are non-viable.]
Order VM Operations (2/2)
[Figure: the reconfiguration is performed in 2 steps.]
Migration Interdependencies
• One additional node is required (critical energy consumption)
[Figure: the reconfiguration is performed in 3 steps.]
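The ordering issues shown on these slides can be sketched with a greedy planner. This is an illustrative simplification, not Entropy's planner: it only tracks memory, runs in parallel every migration whose destination has room at the start of the step, and breaks a cycle by moving one VM to a pivot node. All names are hypothetical.

```python
def plan_steps(capacity, vm_mem, current, target):
    """Order migrations into steps; each step is a list of (vm, src, dst).

    capacity: node -> memory, vm_mem: vm -> memory,
    current / target: vm -> node.
    """
    location = dict(current)
    free = dict(capacity)
    for vm, node in location.items():
        free[node] -= vm_mem[vm]
    pivoted = set()  # each VM is sent to a pivot node at most once
    steps = []
    while True:
        pending = [vm for vm in sorted(location) if location[vm] != target[vm]]
        if not pending:
            return steps
        step = []
        reserved = dict(free)  # memory freed during a step is usable only afterwards
        for vm in pending:
            dst = target[vm]
            if reserved[dst] >= vm_mem[vm]:
                reserved[dst] -= vm_mem[vm]
                step.append((vm, location[vm], dst))
        if not step:
            # Interdependent migrations: park one VM on a temporary (pivot) node.
            for vm in pending:
                if vm in pivoted:
                    continue
                pivots = [n for n in sorted(free)
                          if n not in (location[vm], target[vm]) and free[n] >= vm_mem[vm]]
                if pivots:
                    step.append((vm, location[vm], pivots[0]))
                    pivoted.add(vm)
                    break
            else:
                raise RuntimeError("no pivot node available to break the cycle")
        for vm, src, dst in step:
            free[src] += vm_mem[vm]
            free[dst] -= vm_mem[vm]
            location[vm] = dst
        steps.append(step)
```

With two VMs that must swap nodes and one empty spare node, this planner needs three steps, which mirrors the extra node and extra step of the interdependent case.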
Optimizing the reconfiguration process
• Determine an efficient reconfiguration plan (thanks to a cost function)
• Cost model:
  • the steps that must complete before a VM can be migrated
  • the amount of memory to migrate
  • the parallelism inside a single step
[Figure: two candidate plans, of 2 steps and 3 steps, with costs of 4 and 9 respectively.]
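One way to combine the three factors of the cost model is sketched below. The exact formula is an assumption, since the slide only lists the ingredients: here a migration costs its memory amount, a step lasts as long as its largest migration, and each migration also pays for every step that had to finish before it. The sketch does not claim to reproduce the costs of 4 and 9 shown in the figure.

```python
def plan_cost(steps):
    """Cost of a plan given as a list of steps.

    Each step is a list of memory amounts, one per migration run in parallel.
    Assumed formula: sum over migrations of (cost of preceding steps + own memory).
    """
    total = 0
    elapsed = 0  # accumulated cost of the steps already completed
    for step in steps:
        for mem in step:
            total += elapsed + mem
        elapsed += max(step, default=0)  # parallel migrations overlap
    return total
```

Under this assumption, running two unit migrations in parallel costs 2, while running them one after the other costs 3, so plans with fewer and more parallel steps are preferred.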
Architecture overview
Entropy is a virtual machine (VM) manager for clusters. It acts as an infinite control loop that performs a globally optimized dynamic VM placement, without downtime, according to cluster resource usage and scheduler objectives.
• Extract the current configuration: the position of each VM and its state (active or inactive)
• Compute a viable configuration using a minimum number of nodes
• Plan and reduce the migration process if migrations are necessary
• Migration orders are sent to the concerned hypervisors
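The four phases of the loop can be sketched as follows. This is only an outline under stated assumptions: the four callables (`observe`, `compute_target`, `plan`, `execute`), the polling period and the `max_iterations` bound are hypothetical and stand in for the monitoring, placement, planning and hypervisor layers.

```python
import time


def control_loop(observe, compute_target, plan, execute,
                 period_s=30.0, max_iterations=None):
    """Run the observe / compute / plan / execute cycle.

    `max_iterations=None` loops forever; a bound is convenient for testing.
    """
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        current = observe()                  # position and state of each VM
        target = compute_target(current)     # viable configuration, fewest nodes
        if target != current:                # plan only when migrations are needed
            for step in plan(current, target):
                execute(step)                # send the orders to the hypervisors
        iteration += 1
        if period_s > 0:
            time.sleep(period_s)
```

Because the loop re-observes the cluster on every iteration, a plan computed from stale data is simply superseded by the next pass.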
The benefit of dynamic consolidation is limited by the duration of the reconfiguration process.
• Entropy computes equivalent configurations with "cheap" reconfiguration plans until the minimum is reached
• What are the benefits?
  • Better reactivity: from 14 to 6 minutes
  • A stable packing: always better
  • A reduced overhead: reduced by 9%
Dynamic consolidation
• Servers unused by online applications (web, HA, etc.) can be:
  • turned off
  • OR used by preemptible applications (simulation, HPC, etc.)
• The main problem
  • How can I make better use of my cluster by running as many preemptible applications as possible?
Entropy Advanced
• Jobs arrive in the queue and have to be scheduled.
• FCFS + EASY backfilling: jobs 2 and 3 have been backfilled. Some resources are unused (dark areas).
• EASY backfilling with preemption: the 4th job can be started without impacting the first one. A small piece of resources is still unused.
[Figure: three charts of processors over time, showing the running jobs and the jobs in the queue for each policy.]
⇒ consolidation and preemption to finely exploit distributed resources
General idea: manipulate vJobs instead of jobs
• As with usual processes, each vjob is in a particular state:
• A cluster-wide context switch (a set of VM context switches) makes it possible to efficiently rebalance the cluster according to the scheduler objectives, the available resources and the queue of waiting vjobs (elasticity) [VTDC 2010]
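The idea of a cluster-wide context switch as a set of VM context switches can be sketched as below. The state names and transitions are assumptions chosen by analogy with process states, since the slide's state diagram is not available; the function and table names are hypothetical as well.

```python
# Hypothetical vjob life cycle: (current state, target state) -> VM action.
TRANSITIONS = {
    ("waiting", "running"): "run",
    ("running", "sleeping"): "suspend",
    ("sleeping", "running"): "resume",
    ("running", "terminated"): "stop",
}


def context_switch(vjobs, current, target):
    """Translate vjob state changes into per-VM actions.

    vjobs: vjob name -> list of VM names,
    current / target: vjob name -> state.
    Returns a list of (action, vm) pairs.
    """
    actions = []
    for vjob in sorted(vjobs):
        src, dst = current[vjob], target[vjob]
        if src == dst:
            continue  # nothing to do for this vjob
        if (src, dst) not in TRANSITIONS:
            raise ValueError(f"unsupported transition {src} -> {dst} for {vjob}")
        action = TRANSITIONS[(src, dst)]
        # All VMs of a vjob change state together.
        actions.extend((action, vm) for vm in vjobs[vjob])
    return actions
```

Suspending one vjob and starting another in the same switch is what lets a waiting vjob use resources without the first one losing its progress.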
Reconfiguration plan
Experiment on a cluster
• Benefits
  • Improved resource usage
  • Suspend/resume is transparent for the developer
• Resource usage
Conclusion
• Manipulating VMs is tedious and may not be cost-effective
• Entropy manages VMs instead of processes
• It provides an efficient and reactive dynamic consolidation policy
• and a generic cluster-wide context switch based on mechanisms provided by the VMM
http://entropy.gforge.inria.fr (LGPL)
Uses an abstract VMM (Jasmine-VMM): ESX, Hyper-V, Xen, KVM ...
ANR Arpège SelfXL (2008-2011)
ANR Arpège MyCloud (2010-2013)
FUI Cool-IT (2011-2013)
ANR Emergence Entropy (2011-2012)