An Energy-aware Scheduling Algorithm in DVFS-enabled Networked Data Centers
CLOSER 2016 - TEEC Session
Mohammad Shojafar, Claudia Canali, Riccardo Lancellotti, and Saeid Abolfazli
Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, Modena, Italy
April 24, 2016
1 of 34
Agenda
• Introduction
  • Problem in data centers
  • Our contribution
• Model
  • Model Architecture
  • Computing Model
  • Frequency Reconfiguration Model
  • Channel/Communication Model
• Optimization problem and solution
• Performance Evaluation
• Conclusion
Introduction
• Cloud data centers: energy-saving computing is critical
• Our focus is the Virtualized Networked Data Center (VNetDC) supporting the cloud
• Distinguishing points of our approach. We consider:
  • Traffic exchange in VNetDCs
  • Load balancing for incoming requests
  • DVFS (multi-frequency CPUs) hardware technology
  • QoS: processing time + communication time → challenging constraint
Introduction
Our solution addresses:
• Minimizing the overall energy for the computing-plus-communication resources in VNetDCs
• Jointly guaranteeing the time limit of each task and the bandwidth limit of each server by exploiting the reconfiguration capability
Detail:
• Dynamic load balancing
• Job = chunk of data to process
• Online job decomposition and scheduling
• Distribute the workload among multiple VMs
• Solve a nonlinear/nonconvex optimization problem
Model Architecture
[Figure: VNetDC configuration. Clients submit jobs through a network switch to Servers 1, ..., i, ..., M over a VLAN. Each server hosts one VM on a VMM, with a VNIC and a DVFS-enabled CPU. The labeled control quantities are the data transmission rate info and the CPU frequency.]
Model
Assumptions:
1) Physical servers with DVFS
2) Each server hosts one heterogeneous VM (private cloud scenario)
3) The VNetDC comprises $M$ independent congestion-free half-duplex channels
4) The VM on server $i$ can process $F(i)$ bits per second
5) No queue is considered for incoming/outgoing workload into/from the system
6) Data centers use off-the-shelf rackmount physical servers, interconnected by commodity Fast/Giga Ethernet switches
7) Each job has size $L_{tot}$
8) The maximum processing (computation and communication) time for each job is $T_t$ (QoS constraint)
Optimization Problem
Goal: minimize the overall resulting communication-plus-computing energy, formally defined as:
$$E_{tot} \triangleq \sum_{i=1}^{M} E_{CPU}(i) + \sum_{i=1}^{M} E_{Reconf}(i) + \sum_{i=1}^{M} E_{net}(i)\ \text{[Joule]} \tag{1}$$
• $E_{CPU}(i)$: computation energy for server $i$
• $E_{Reconf}(i)$: reconfiguration energy for server $i$
• $E_{net}(i)$: channel/communication energy for server $i$
Computing Model
VM($i$) attributes:
$$\{Q,\ f(i),\ t(i),\ f_i^{\max},\ T,\ i = 1, \dots, M\} \tag{2}$$
• $Q$: number of CPU frequencies allowed for each VM (plus an idle state)
• $f(i) = \{F_j(i) \mid j = 0, \dots, Q\}$: discrete frequency set of VM($i$), using DVFS
• $f_i^{\max} \triangleq F_Q(i)$: maximum available frequency of VM($i$)
• $t(i) = \{t_j(i) \mid j = 0, \dots, Q\}$: discrete time set of VM($i$), where $t_j(i)$ is the time spent at frequency $f_j(i)$
• $\sum_{j=0}^{Q} t_j(i) \le T$: time allowed for VM($i$) to fully process each submitted task (computation-only constraint)
Computing Model
Fig. 2 illustrates an example for $Q = 5$.
[Figure 2: staircase of discrete frequencies $f_j(i)$, from $f_0 = f_{idle}$ up to $f_5 = f_Q$, plotted against the time intervals $t_0(i), \dots, t_5(i)$.]
$$E_{CPU}(i) \triangleq \sum_{j=0}^{Q} A\, C_{eff}\, f_j(i)^3\, t_j(i)\ \text{[Joule]}, \quad \forall i \in \{1, \dots, M\} \tag{3}$$
$A$: active percentage of gates; $C_{eff}$: effective load capacitance
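As a minimal sketch of Eq. (3), the following Python function sums $A\,C_{eff}\,f_j^3\,t_j$ over the frequency steps of one VM. The function name `cpu_energy` and its argument names are illustrative assumptions, not taken from the slides, and the caller is assumed to supply frequencies, times, and $C_{eff}$ in mutually consistent units.

```python
def cpu_energy(freqs, times, activity=1.0, c_eff=1.0):
    """Computing energy of one VM following Eq. (3).

    freqs[j] is the j-th discrete frequency and times[j] the time spent at it.
    activity is A (fraction of active gates) and c_eff is C_eff.
    Names and default values are hypothetical; units are the caller's choice.
    """
    if len(freqs) != len(times):
        raise ValueError("freqs and times must have the same length")
    return sum(activity * c_eff * f ** 3 * t for f, t in zip(freqs, times))


# Example: one time unit at frequency 1 and one at frequency 2.
energy = cpu_energy([1.0, 2.0], [1.0, 1.0])
```

The cubic dependence on frequency is what makes running longer at a lower frequency cheaper than running briefly at the highest one, which is the effect the scheduler exploits.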
Frequency Reconfiguration Model
Frequency policy: scale the VMs' processing rates up/down at the minimum cost. We define an internal switching cost and an external switching cost.
Internal switching cost: $f_j(i) \to f_{j+k}(i)$ ($k$ steps to reach the next active discrete frequency).
External switching cost: the cost of switching from the final active discrete frequency of VM($i$) at the end of a job to the first active discrete frequency for the next incoming job of size $L_{tot}$.
$$\sum_{i=1}^{M} E_{Reconf}(i) \triangleq k_e \sum_{i=1}^{M} \sum_{k=0}^{K} \left(\Delta f_k(i)\right)^2 + Ext\_Cost \tag{4}$$
• $k_e$ [J/(Hz)$^2$]: cost of a unit-size frequency switch
• $\Delta f_k(i) \triangleq f_{k+1}(i) - f_k(i)$
• $Ext\_Cost \triangleq k_e\, M \left(f_0^{t} - f_Q^{t-1}\right)^2$
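The quadratic switching cost can be sketched for a single VM as follows. This is an illustration under assumptions: the function name, the representation of a job as an ordered list of visited frequencies, and the optional `prev_final_freq` argument are hypothetical. The slide's external cost multiplies one squared jump by $M$, whereas this sketch computes the per-VM share, so summing it over all VMs gives the system total.

```python
def reconfiguration_energy(freq_sequence, k_e, prev_final_freq=None):
    """Per-VM switching energy in the spirit of Eq. (4).

    freq_sequence: frequencies visited, in order, while serving one job.
    k_e: cost of a unit-size frequency switch.
    prev_final_freq: last frequency used for the previous job, if any
    (hypothetical argument used to model the external switching cost).
    """
    # Internal cost: squared size of every jump between consecutive frequencies.
    internal = k_e * sum(
        (nxt - cur) ** 2 for cur, nxt in zip(freq_sequence, freq_sequence[1:])
    )
    # External cost: jump from the previous job's final frequency to this
    # job's first frequency.
    external = 0.0
    if prev_final_freq is not None and freq_sequence:
        external = k_e * (freq_sequence[0] - prev_final_freq) ** 2
    return internal + external


cost = reconfiguration_energy([1.0, 2.0, 4.0], k_e=0.5, prev_final_freq=3.0)
```

Because the cost is quadratic in the jump size, two small steps are cheaper than one large step covering the same range.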
Channel/Communication Model
Shannon-Hartley exponential formula:
$$P_{net}(i) = \zeta_i \left(2^{R(i)/W_i} - 1\right) + P_{idle}(i)\ \text{[Watt]} \tag{5}$$
• $\zeta_i \triangleq N_0^{(i)} W_i / g_i$, $i = 1, \dots, M$
• $N_0^{(i)}$ [W/Hz]: noise spectral power density
• $W_i$ [Hz]: transmission bandwidth
• $R(i)$: transmission rate over link $i$
• $g_i$: gain of the $i$-th link
i) One-way transmission delay: $D(i) = \sum_{j=1}^{Q} F_j(i)\, t_j(i) / R(i)$
ii) $\max_{1 \le i \le M} \{2 D(i)\} + T \le T_t$, where $T$ is the computation time limit and $T_t$ the total (computation-plus-communication) time limit. The slowest VM determines the round-trip delay.
$$E_{net}(i) \triangleq P_{net}(i) \left( \sum_{j=1}^{Q} \frac{F_j(i)\, t_j(i)}{R(i)} \right)\ \text{[Joule]} \tag{6}$$
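Eqs. (5) and (6) can be evaluated directly, as in the sketch below. The function and argument names are assumptions made for illustration. `workload` stands for $\sum_j F_j(i)\,t_j(i)$, the data handled by VM $i$, and all quantities are assumed to be in mutually consistent units.

```python
def network_power(rate, bandwidth, zeta, p_idle):
    """Link power from Eq. (5): zeta * (2^(R/W) - 1) + P_idle."""
    return zeta * (2.0 ** (rate / bandwidth) - 1.0) + p_idle


def network_energy(workload, rate, bandwidth, zeta, p_idle):
    """Link energy from Eq. (6): power times the one-way delay workload / R."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    one_way_delay = workload / rate
    return network_power(rate, bandwidth, zeta, p_idle) * one_way_delay


power = network_power(rate=1.0, bandwidth=1.0, zeta=0.5, p_idle=0.5)
energy = network_energy(workload=4.0, rate=2.0, bandwidth=2.0, zeta=0.5, p_idle=0.5)
```

The power grows exponentially with the rate while the delay shrinks only linearly, so transmitting faster than the deadline requires wastes energy.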
Optimization problem and solution
$$\min\ \sum_{i=1}^{M} E_{CPU}(i) + \sum_{i=1}^{M} E_{Reconf}(i) + \sum_{i=1}^{M} E_{net}(i) \tag{7.1}$$
subject to:
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) = L_{tot} \tag{7.2}$$
$$\sum_{i=1}^{M} R(i) \le R_t \tag{7.3}$$
$$\sum_{j=0}^{Q} t_j(i) \le T, \quad i = 1, \dots, M \tag{7.4}$$
$$\sum_{j=0}^{Q} \frac{2 F_j(i)\, t_j(i)}{R(i)} \le T_t - T, \quad i = 1, \dots, M \tag{7.5}$$
$$0 \le t_j(i) \le T, \quad 0 \le R(i) \le R_t, \quad i = 1, \dots, M, \quad j = 0, \dots, Q \tag{7.6}$$
Here $T$ is the computation time limit and $T_t$ is the total (computation-plus-communication) time limit.
Optimization problem and solution
(6.1) Eq. (7.1) is the objective function, the sum of three terms that account for the computing energy, the reconfiguration energy, and the networking energy.
(6.2) Eq. (7.2) is the (global) constraint which guarantees that the overall job is decomposed into $M$ parallel tasks. $F_j(i)\, t_j(i)$ is the workload processed by VM $i$ at the discrete frequency $f_j$ during the interval $t_j(i)$.
(6.3) Eq. (7.3) ensures that the sum of the per-VM transmission rates does not exceed the maximum available bandwidth of the global network.
(6.4) Eq. (7.4) is the constraint on the computation time.
(6.5) Eq. (7.6) guarantees that the duration of each computing interval is non-negative and at most $T$.
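To make the role of each constraint concrete, the following sketch checks a candidate schedule against (7.2) to (7.6). It is only a verifier, not the solver used in the slides. The function name, the nested-list layout `F[i][j]` and `t[i][j]`, the symbol `T_t` for the total time limit, and the tolerance are all assumptions made for this illustration.

```python
def check_constraints(F, t, R, L_tot, R_t, T, T_t, tol=1e-9):
    """Report which of the constraints (7.2)-(7.6) a candidate schedule meets.

    F[i][j]: processing rate of VM i at frequency step j.
    t[i][j]: time VM i spends at step j.
    R[i]:    transmission rate of link i.
    """
    M = len(F)
    # Workload handled by each VM: sum over j of F_j(i) * t_j(i).
    work = [sum(f * d for f, d in zip(F[i], t[i])) for i in range(M)]
    return {
        "7.2 workload": abs(sum(work) - L_tot) <= tol,
        "7.3 total rate": sum(R) <= R_t + tol,
        "7.4 computing time": all(sum(t[i]) <= T + tol for i in range(M)),
        # An idle VM (zero workload) needs no link; otherwise the round trip
        # 2 * work / R must fit into T_t - T.
        "7.5 round-trip delay": all(
            work[i] == 0 or (R[i] > 0 and 2 * work[i] / R[i] <= T_t - T + tol)
            for i in range(M)
        ),
        "7.6 bounds": (
            all(-tol <= d <= T + tol for row in t for d in row)
            and all(-tol <= r <= R_t + tol for r in R)
        ),
    }


# Two VMs, each idle at step 0 and processing 2 units/s for 2 s at step 1.
F = [[0.0, 2.0], [0.0, 2.0]]
t = [[0.0, 2.0], [0.0, 2.0]]
report = check_constraints(F, t, R=[4.0, 4.0], L_tot=8.0, R_t=100.0, T=5.0, T_t=7.0)
```

In this example every entry of `report` is true; lowering the link rates to 1.0 violates only the round-trip delay constraint.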
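Optimization problem and solution
1) We can simplify the communication part by meeting the delay constraint with equality, i.e. setting $R(i) = \sum_{j=0}^{Q} 2 F_j(i)\, t_j(i) / (T_t - T)$, where $T_t$ is the total time limit and $T$ the computation time limit:
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} 2 P_{net}(i)\, \frac{F_j(i)\, t_j(i)}{R(i)} = (T_t - T) \sum_{i=1}^{M} P_{net}(i)\!\left( \sum_{j=0}^{Q} \frac{2 F_j(i)\, t_j(i)}{T_t - T} \right) \tag{8}$$
On the right-hand side, $P_{net}(i)(\cdot)$ denotes the link power of Eq. (5) evaluated at that rate.
2) Problem feasibility:
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) \le \frac{R_t\, (T_t - T)}{2} \tag{9}$$
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) \le \sum_{i=1}^{M} T\, f_i^{\max} \tag{10}$$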
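Since the left-hand side of (9) and (10) equals the job size $L_{tot}$ by the workload constraint, the two feasibility conditions can be tested before any optimization is attempted. The sketch below does this. The function name is hypothetical, `T_t` denotes the total time limit, and the sample values assume that each VM's maximum frequency can be read as a processing rate in the same units as $L_{tot}$ per second.

```python
def is_feasible(L_tot, R_t, T_t, T, f_max):
    """Check the feasibility conditions (9) and (10) for a job of size L_tot.

    f_max: list of per-VM maximum processing rates (assumed to be expressed
    in the same data units per second as L_tot).
    """
    # (9): the network can move the job there and back within T_t - T.
    communication_ok = L_tot <= R_t * (T_t - T) / 2.0
    # (10): the VMs, all running at full speed for T, can process the job.
    computation_ok = L_tot <= T * sum(f_max)
    return communication_ok and computation_ok


# Sample values: an 8-unit job, 100 units/s of bandwidth, T_t = 7 s, T = 5 s,
# and a single VM with a maximum rate of 2.668 units/s.
feasible = is_feasible(8.0, 100.0, 7.0, 5.0, [2.668])
```

With these sample values the job is feasible on one VM; a 20-unit job is not, but becomes feasible once a second identical VM is added.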
Performance Evaluation - Simulation setup
i) Comparison with:
  • Standard (or Real) available DVFS-enabled technique (Kimura et al., 2006)
  • Lyapunov (Urgaonkar et al., 2010)
  • IDEAL no-DVFS (Mathew et al., 2012) and NetDC (Cordeschi et al., 2010) [theoretical lower bounds]
ii) CVX solver (Grant and Boyd, 2015) + MATLAB
iii) Three different scenarios: two synthetic workloads and a real-world workload trace
iv) $L_{tot}$ range: $[L_{tot} - a,\ L_{tot} + a]$
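Performance Evaluation - Simulation setup
Significant parameters and sensitivity analysis (outer sums run over the time slots $n$, inner sums over the VMs $i$):
• $\overline{E}_{tot} \triangleq \frac{1}{Max_{slot}} \sum_{n=1}^{Max_{slot}} \sum_{i=1}^{M} E_{tot}(i)$
• $\overline{E}_{CPU} \triangleq \frac{1}{Max_{slot}} \sum_{n=1}^{Max_{slot}} \sum_{i=1}^{M} E_{CPU}(i)$
• $\overline{E}_{Reconf} \triangleq \frac{1}{Max_{slot}} \sum_{n=1}^{Max_{slot}} \sum_{i=1}^{M} E_{Reconf}(i)$
• $\overline{E}_{net} \triangleq \frac{1}{Max_{slot}} \sum_{n=1}^{Max_{slot}} \sum_{i=1}^{M} E_{net}(i)$
• $k_e$, $\zeta$
• $T$, $T_t$ (QoS parameters)
• AET = average execution time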
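The per-slot averages above amount to summing the per-VM energies within each slot and dividing by the number of slots. A small sketch, assuming the simulation log is held as a list of slots, each a list of per-VM energies (a hypothetical layout, not one stated in the slides):

```python
def average_energy(per_slot_per_vm):
    """Average over slots of the energy summed over all VMs.

    per_slot_per_vm[n][i] is the energy of VM i in slot n (hypothetical layout).
    """
    if not per_slot_per_vm:
        raise ValueError("at least one slot is required")
    total = sum(sum(slot) for slot in per_slot_per_vm)
    return total / len(per_slot_per_vm)


# Two slots, two VMs each.
avg = average_energy([[1.0, 2.0], [3.0, 4.0]])
```

The same helper applies to each of the four averaged quantities; only the logged energy component changes.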
First Scenario
$L_{tot} \equiv 8$ [Gbit], $a = 2$ [Gbit]
DVFS: Intel Nehalem Quad-core Processor (Kimura et al., 2006), called $F_1 = \{0.15, 1.867, 2.133, 2.533, 2.668\}$
Table: Default values of the main system parameters for the first test scenario.
Parameter | Value
PE = $M$ | $[1, \dots, 10]$
$T_t$ (total time limit) | 7 [s]
$T$ (computation time limit) | 5 [s]
$R_t$ | 100 [Gbit/s]
$k_e$ | 0.05 [Joule/(GHz)$^2$]
$C_{eff}$ | 1 [µF]
$F$ | $F_1$ [GHz]
$Q$ | 5
$A$ | 100%
$P_i^{idle}$ | 0.5 [Watt]
$\zeta_i$ | 0.5 [mWatt]
$f_i^{\max}$ | 2.668 [GHz]
Second Scenario
$L_{tot} \equiv 70$ [Gbit], $a = 10$ [Gbit]
DVFS: Crusoe cluster with TM-5800 CPU in (Almeida et al., 2010), e.g., $F_2 = \{0.300, 0.533, 0.667, 0.800, 0.933\}$
Table: Default values of the main system parameters for the second test scenario.
Parameter | Value
$k_e$ | 0.005 [Joule/(GHz)$^2$]
$Q$ | 5
$F$ | $F_2$ [GHz]
$L_{tot}$ | 70 [Mbit]
$M$ | $\{20, 30, 40\}$
$f_i^{\max}$ | 0.933 [GHz]
$E_{tot}$ vs. $M$
• As $M$ increases, $E_{tot}$ decreases
• The average energy saving of the proposed method is approximately 50% and 60% compared to the Lyapunov-based and Standard schedulers, respectively
[Figure: $E_{tot}$ [Joule] versus $M = 1, \dots, 10$ for IDEAL, Standard, NetDC, Lyapunov, and the Proposed Method.]
$E_{CPU}$ vs. $M$
• As $M$ increases, $E_{CPU}$ decreases
• The average energy saving of the proposed method is approximately 25% and 33% compared to the Lyapunov-based and Standard schedulers, respectively
[Figure: $E_{CPU}$ [Joule] versus $M = 1, \dots, 10$ for IDEAL, Standard, NetDC, Lyapunov, and the Proposed Method.]
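$E_{Reconf}$ vs. $M$
• As $M$ increases, $E_{Reconf}$ increases, but it remains much smaller than $E_{CPU}$ or $E_{net}$
[Figure: $E_{Reconf}$ [Joule], on a logarithmic axis from $10^{-6}$ to $10^{2}$, versus $M = 1, \dots, 10$ for IDEAL, Standard, NetDC, Lyapunov, and the Proposed Method.]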