An Energy-aware Scheduling Algorithm in DVFS-enabled Networked Data - PowerPoint PPT Presentation

An Energy-aware Scheduling Algorithm in DVFS-enabled Networked Data Centers
CLOSER 2016 - TEEC Session
Mohammad Shojafar, Claudia Canali, Riccardo Lancellotti, and Saeid Abolfazli
Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, Modena, Italy
April 24, 2016
1 of 34

Agenda
• Introduction
  • Problem in data centers
  • Our contribution
• Model
  • Model Architecture
  • Computing Model
  • Frequency Reconfiguration Model
  • Channel/Communication Model
• Optimization problem and solution
• Performance Evaluation
• Conclusion

Introduction
• Cloud data centers: energy-saving computing is critical
• Our focus is on the Virtualized Networked Data Center (VNetDC) supporting the cloud
• Qualifying points of our approach; we consider:
  • Traffic exchange in VNetDCs
  • Load balancing for incoming requests
  • DVFS (multi-frequency CPUs) hardware technology
  • QoS: processing time + communication time → challenging constraint

Introduction
Our solution addresses:
• Minimizing the overall energy of the computing-plus-communication resources in VNetDCs
• Jointly guaranteeing the time limit of each task and the bandwidth limit of each server by exploiting the reconfiguration capability
Detail:
• Dynamic load balancing
• Job = chunk of data to process
• Online job decomposition and scheduling
• Distribute the workload among multiple VMs
• Solve a nonlinear/nonconvex optimization problem

Model Architecture
[Figure: VNetDC configuration. Labels: clients and job, network switch, VLAN, servers 1, ..., i, ..., M, each hosting a VM with a VNIC and a DVFS-enabled CPU on top of a VMM; data transmission rate info; CPU frequency.]

Model
Assumptions:
1) Physical servers support DVFS
2) Each server hosts one heterogeneous VM (private cloud scenario)
3) The VNetDC comprises M independent, congestion-free, half-duplex channels
4) The VM on server i can process F(i) bits per second
5) No queue is considered for the workload entering or leaving the system
6) Data centers use off-the-shelf rackmount physical servers, interconnected by commodity Fast/Giga Ethernet switches
7) Each job has size L_tot
8) The maximum total processing time (computation plus communication) for each job is T_t (QoS constraint)

Optimization Problem
Goal: minimize the overall communication-plus-computing energy, formally defined as:
$$E_{tot} \triangleq \sum_{i=1}^{M} E_{CPU}(i) + \sum_{i=1}^{M} E_{Reconf}(i) + \sum_{i=1}^{M} E_{net}(i) \quad [\text{Joule}], \tag{1}$$
• $E_{CPU}(i)$: computation energy for server i
• $E_{Reconf}(i)$: reconfiguration energy for server i
• $E_{net}(i)$: channel/communication energy for server i

Computing Model
VM(i) attributes:
$$\{Q,\ f(i),\ t(i),\ f_i^{max},\ T,\ i = 1, \ldots, M\}, \tag{2}$$
• $Q$: number of CPU frequencies allowed for each VM (plus an idle state)
• $f(i) = \{F_j(i) \mid j = 0, \ldots, Q\}$: discrete frequency set of VM(i), obtained through DVFS
• $f_i^{max} \triangleq F_Q(i)$: maximum available frequency of VM(i)
• $t(i) = \{t_j(i) \mid j = 0, \ldots, Q\}$: discrete time set of VM(i); $t_j(i)$ is the time spent at frequency $f_j(i)$
• $\sum_{j=0}^{Q} t_j(i) \le T$: time allowed for VM(i) to fully process each submitted task (computation-only constraint)

Computing Model
Fig. 2 illustrates an example for Q = 5.
[Figure: frequency $f_j(i)$ versus time for VM(i); levels $f_0 = f_{idle}, f_1, \ldots, f_5 = f_Q$ held during the intervals $t_0(i), t_1(i), \ldots, t_5(i)$.]
$$E_{CPU}(i) \triangleq \sum_{j=0}^{Q} A\, C_{eff}\, f_j(i)^3\, t_j(i) \quad [\text{Joule}], \quad \forall i \in \{1, \ldots, M\}, \tag{3}$$
$A$: active percentage of gates; $C_{eff}$: effective load capacitance
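As a minimal sketch of Eq. (3), the helper below sums A · C_eff · f_j(i)^3 · t_j(i) over the frequency levels of one VM. The function name `cpu_energy`, the default values, and the treatment of all quantities as plain numbers (no unit conversion) are assumptions made for illustration only.

```python
def cpu_energy(freqs, times, a=1.0, c_eff=1.0):
    """Computation energy of one VM following Eq. (3).

    freqs[j] is the discrete frequency f_j(i), times[j] the time t_j(i)
    spent at that frequency. a and c_eff stand for A and C_eff; their
    defaults are placeholders, not values from the slides.
    """
    if len(freqs) != len(times):
        raise ValueError("freqs and times must have the same length")
    return sum(a * c_eff * f ** 3 * t for f, t in zip(freqs, times))


# Same workload f * t = 4 processed fast (f = 2) or slowly (f = 1):
fast = cpu_energy([2.0], [2.0])   # 2^3 * 2
slow = cpu_energy([1.0], [4.0])   # 1^3 * 4
print(fast, slow)
```

Because energy grows with the cube of the frequency while the workload grows only linearly, the slower schedule is cheaper whenever the time limit allows it, which is the trade-off the scheduler exploits.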

Frequency Reconfiguration Model
Frequency policy: scale the VMs' processing rates up or down at minimum cost. We define an internal switching cost and an external switching cost.
Internal switching cost: $f_j(i) \to f_{j+k}(i)$ (k steps to reach the next active discrete frequency)
External switching cost: the cost of switching from the final active discrete frequency of VM(i) at the end of a job to the first active discrete frequency for the next incoming job of size L_tot
$$\sum_{i=1}^{M} E_{Reconf}(i) \triangleq k_e \sum_{i=1}^{M} \sum_{k=0}^{K} \left(\Delta f_k(i)\right)^2 + Ext\_Cost \tag{4}$$
$k_e$ [J/(Hz)^2]: energy cost of a unit-size frequency switch
$\Delta f_k(i) \triangleq f_{k+1}(i) - f_k(i)$
$Ext\_Cost \triangleq k_e\, M \left(f_0^{t} - f_Q^{t-1}\right)^2$
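The two switching costs can be sketched for a single VM as follows. This is an illustrative reading of Eq. (4), not the authors' implementation: the function names are hypothetical, the internal cost is taken as k_e times the sum of squared steps between consecutive active frequencies, and the external cost is taken per VM (without the factor M that aggregates over servers).

```python
def internal_switch_energy(active_freqs, k_e):
    """k_e * sum of squared steps between consecutive active frequencies."""
    steps = zip(active_freqs, active_freqs[1:])
    return k_e * sum((nxt - cur) ** 2 for cur, nxt in steps)


def external_switch_energy(prev_last_freq, next_first_freq, k_e):
    """Cost of moving from the last frequency of one job to the first of the next."""
    return k_e * (next_first_freq - prev_last_freq) ** 2


# Hypothetical VM that ramps 1 -> 2 -> 4 within a job, then restarts at 1.
k_e = 0.5
print(internal_switch_energy([1.0, 2.0, 4.0], k_e))
print(external_switch_energy(4.0, 1.0, k_e))
```

The quadratic penalty makes one large jump more expensive than several small ones covering the same range, so the model favours gradual frequency changes.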

Channel/Communication Model
Shannon-Hartley exponential formula:
$$P_{net}(i) = \zeta_i \left(2^{R(i)/W_i} - 1\right) + P_{idle}(i) \quad [\text{Watt}], \tag{5}$$
• $\zeta_i \triangleq N_0^{(i)} W_i / g_i$, $i = 1, \ldots, M$
• $N_0^{(i)}$ (W/Hz): noise spectral power density
• $W_i$ (Hz): transmission bandwidth
• $R(i)$: transmission rate over link i
• $g_i$: gain of the i-th link
i) One-way transmission delay: $D(i) = \sum_{j=1}^{Q} F_j(i)\, t_j(i) / R(i)$
ii) $\max_{1 \le i \le M} \{2 D(i)\} + T \le T_t$ (the bound is set by the slowest VM)
$$E_{net}(i) \triangleq P_{net}(i) \left( \sum_{j=1}^{Q} \frac{F_j(i)\, t_j(i)}{R(i)} \right) \quad [\text{Joule}]. \tag{6}$$
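A small sketch of Eqs. (5) and (6) for one link, assuming plain numeric inputs in mutually consistent units. The function and parameter names are hypothetical; `proc_rates` and `times` hold the pairs F_j(i), t_j(i) whose products give the workload carried over the link.

```python
def link_power(rate, bandwidth, zeta, p_idle):
    """Eq. (5): P_net = zeta * (2^(R/W) - 1) + P_idle."""
    return zeta * (2.0 ** (rate / bandwidth) - 1.0) + p_idle


def one_way_delay(proc_rates, times, rate):
    """D(i): workload sum_j F_j * t_j divided by the link rate R(i)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sum(f * t for f, t in zip(proc_rates, times)) / rate


def net_energy(proc_rates, times, rate, bandwidth, zeta, p_idle):
    """Eq. (6): link power multiplied by the one-way transmission delay."""
    power = link_power(rate, bandwidth, zeta, p_idle)
    return power * one_way_delay(proc_rates, times, rate)


# Illustrative numbers only: workload 1*3 + 2*0.5 = 4 sent at rate 2.
print(link_power(2.0, 1.0, 0.5, 0.5))
print(net_energy([1.0, 2.0], [3.0, 0.5], 2.0, 1.0, 0.5, 0.5))
```

Raising the rate shortens the delay but increases the power exponentially, so the energy-optimal rate is generally not the maximum one.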

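Optimization problem and solution
$$\min \ \sum_{i=1}^{M} E_{CPU}(i) + \sum_{i=1}^{M} E_{Reconf}(i) + \sum_{i=1}^{M} E_{net}(i) \tag{7.1}$$
$$\text{s.t.:} \quad \sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) = L_{tot}, \tag{7.2}$$
$$\sum_{i=1}^{M} R(i) \le R_t, \tag{7.3}$$
$$\sum_{j=0}^{Q} t_j(i) \le T, \quad i = 1, \ldots, M, \tag{7.4}$$
$$\sum_{j=0}^{Q} \frac{2 F_j(i)\, t_j(i)}{R(i)} \le T_t - T, \quad i = 1, \ldots, M, \tag{7.5}$$
$$0 \le t_j(i) \le T, \quad 0 \le R(i) \le R_t, \quad i = 1, \ldots, M, \ j = 0, \ldots, Q. \tag{7.6, 7.7}$$
Here $T$ is the computation-only time limit and $T_t$ the total (computation plus communication) time limit.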

Optimization problem and solution
• Eq. (7.1) is the objective function, the sum of three terms that account for the computing energy, the reconfiguration energy, and the networking energy
• Eq. (7.2) is the (global) constraint which guarantees that the overall job is decomposed into M parallel tasks; $F_j(i)\, t_j(i)$ is the workload that VM i processes at discrete frequency $f_j$ during the interval $t_j(i)$
• Eq. (7.3) ensures that the sum of the VMs' transmission rates does not exceed the maximum available bandwidth of the global network
• Eq. (7.4) is the constraint on computation time
• Eq. (7.6) guarantees that the duration of each computing interval is non-negative and no greater than T
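The constraints can be checked mechanically for any candidate schedule. The sketch below is not the solver used by the authors; it only tests constraints (7.2) to (7.6) for given values. All names are hypothetical: `F[i][j]` and `t[i][j]` are the processing rates and intervals of VM i, `R[i]` its link rate, `T` the computation-only limit and `T_t` the total limit (a name assumed here to keep the two time limits apart).

```python
def satisfies_constraints(F, t, R, L_tot, R_t, T, T_t, tol=1e-9):
    """Return True if the schedule meets constraints (7.2)-(7.6)."""
    workload = [sum(f * x for f, x in zip(F_i, t_i)) for F_i, t_i in zip(F, t)]
    if abs(sum(workload) - L_tot) > tol:          # (7.2) whole job is processed
        return False
    if sum(R) > R_t + tol:                        # (7.3) total rate budget
        return False
    for t_i, w_i, R_i in zip(t, workload, R):
        if any(x < -tol or x > T + tol for x in t_i):   # (7.6) interval bounds
            return False
        if R_i < -tol or R_i > R_t + tol:               # (7.6) rate bounds
            return False
        if sum(t_i) > T + tol:                          # (7.4) computation time
            return False
        if w_i > tol:                                   # (7.5) two-way transfer time
            if R_i <= 0 or 2.0 * w_i / R_i > T_t - T + tol:
                return False
    return True


# Two VMs, each processing 4 units at rate 2 for 2 time units.
F = [[0.0, 2.0], [0.0, 2.0]]
t = [[0.0, 2.0], [0.0, 2.0]]
print(satisfies_constraints(F, t, [4.0, 4.0], L_tot=8.0, R_t=100.0, T=5.0, T_t=7.0))
print(satisfies_constraints(F, t, [1.0, 1.0], L_tot=8.0, R_t=100.0, T=5.0, T_t=7.0))
```

In the second call the link rates are too low, so the round-trip transfer time 2 · 4 / 1 exceeds the available T_t − T and constraint (7.5) fails.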

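Optimization problem and solution
1) We can simplify the communication part as:
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} 2\, P_{net}(i) \left( \frac{F_j(i)\, t_j(i)}{R(i)} \right) = (T_t - T) \sum_{i=1}^{M} \sum_{j=0}^{Q} P_{net}(i) \left( \frac{2 F_j(i)\, t_j(i)}{T_t - T} \right). \tag{8}$$
On the right-hand side the parenthesized term is presumably the argument of $P_{net}(i)$, that is, the rate obtained when the communication-time constraint (7.5) holds with equality.
2) Problem feasibility:
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) \le R_t\, (T_t - T) / 2 \tag{9}$$
$$\sum_{i=1}^{M} \sum_{j=0}^{Q} F_j(i)\, t_j(i) \le \sum_{i=1}^{M} T f_i^{max}. \tag{10}$$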
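Since constraint (7.2) fixes the total workload to L_tot, the two feasibility conditions reduce to simple bounds on the job size. The sketch below checks them; the function name is hypothetical, `T_t` names the total time limit and `T` the computation-only limit, and the example treats a frequency of x GHz as a processing rate of x Gbit/s, which is an assumption made only to keep units consistent.

```python
def is_problem_feasible(L_tot, R_t, T, T_t, f_max):
    """Check conditions (9) and (10) for a job of size L_tot.

    f_max is the list of maximum processing rates, one entry per VM.
    """
    comm_ok = L_tot <= R_t * (T_t - T) / 2.0        # (9) network can carry the job
    comp_ok = L_tot <= sum(T * f for f in f_max)    # (10) VMs can process the job
    return comm_ok and comp_ok


# Values in the style of the first test scenario (8 Gbit job, 100 Gbit/s, 5 s and 7 s).
print(is_problem_feasible(8.0, 100.0, 5.0, 7.0, [2.668]))
print(is_problem_feasible(20.0, 100.0, 5.0, 7.0, [2.668]))
print(is_problem_feasible(20.0, 100.0, 5.0, 7.0, [2.668, 2.668]))
```

With one VM the computing bound is 5 × 2.668 = 13.34, so a job of size 20 is infeasible until a second VM is added.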

Performance Evaluation - Simulation setup
i) Comparison with:
  • Standard (or Real) available DVFS-enabled technique (Kimura et al., 2006)
  • Lyapunov (Urgaonkar et al., 2010)
  • IDEAL no-DVFS (Mathew et al., 2012) and NetDC (Cordeschi et al., 2010) [theoretical lower bounds]
ii) CVX solver (Grant and Boyd, 2015) + MATLAB
iii) Three different scenarios: two synthetic workloads and a real-world workload trace
iv) Job size varies in the interval $[L_{tot} - a,\ L_{tot} + a]$

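Performance Evaluation - Simulation setup
Significant parameters and sensitivity analysis (averages over slots $s = 1, \ldots, Max_{slot}$):
$$\overline{E}_{tot} \triangleq \frac{1}{Max_{slot}} \sum_{s=1}^{Max_{slot}} \sum_{i=1}^{M} E_{tot}(i)$$
$$\overline{E}_{CPU} \triangleq \frac{1}{Max_{slot}} \sum_{s=1}^{Max_{slot}} \sum_{i=1}^{M} E_{CPU}(i)$$
$$\overline{E}_{Reconf} \triangleq \frac{1}{Max_{slot}} \sum_{s=1}^{Max_{slot}} \sum_{i=1}^{M} E_{Reconf}(i)$$
$$\overline{E}_{net} \triangleq \frac{1}{Max_{slot}} \sum_{s=1}^{Max_{slot}} \sum_{i=1}^{M} E_{net}(i)$$
• $k_e$, $\zeta$
• $T$, $T_t$ (QoS parameters)
• AET = average execution time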
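The averaged metrics above amount to summing the per-VM energies in each slot and dividing by the number of slots. A minimal sketch, assuming the simulation stores one list of per-VM energies per slot (the data layout and the function name are hypothetical):

```python
def average_energy_per_slot(energy_by_slot):
    """Average over slots of the energy summed across all VMs.

    energy_by_slot[s][i] is the energy of VM i in slot s.
    """
    if not energy_by_slot:
        raise ValueError("at least one slot is required")
    return sum(sum(slot) for slot in energy_by_slot) / len(energy_by_slot)


# Two slots, two VMs: totals are 3 and 7 Joule.
print(average_energy_per_slot([[1.0, 2.0], [3.0, 4.0]]))
```

The same helper applies unchanged to the CPU, reconfiguration, and network energy components.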

First Scenario
$L_{tot} \equiv 8$ [Gbit], $a = 2$ [Gbit]
DVFS: Intel Nehalem Quad-core Processor (Kimura et al., 2006), with frequency set $F_1 = \{0.15, 1.867, 2.133, 2.533, 2.668\}$
Table: Default values of the main system parameters for the first test scenario.
Parameter   Value                   Parameter   Value
PE = M      [1, ..., 10]            T_t         7 [s]
T           5 [s]                   R_t         100 [Gbit/s]
k_e         0.05 [Joule/(GHz)^2]    C_eff       1 [µF]
F           F_1 [GHz]               Q           5
P_i^idle    0.5 [Watt]              A           100%
ζ_i         0.5 [mWatt]             f_i^max     2.668 [GHz]

Second Scenario
$L_{tot} \equiv 70$ [Gbit], $a = 10$ [Gbit]
DVFS: Crusoe cluster with TM-5800 CPU (Almeida et al., 2010), e.g., $F_2 = \{0.300, 0.533, 0.667, 0.800, 0.933\}$
Table: Default values of the main system parameters for the second test scenario.
Parameter   Value
k_e         0.005 [Joule/(GHz)^2]
Q           5
F           F_2 [GHz]
L_tot       70 [Mbit]
M           {20, 30, 40}
f_i^max     0.933 [GHz]

E_tot vs. M
• As M increases, E_tot decreases
• The average energy saving of the proposed method is approximately 50% and 60% compared to the Lyapunov-based and Standard schedulers, respectively
[Figure: E_tot [Joule] (0 to 400) versus M (1 to 10) for IDEAL, Standard, NetDC, Lyapunov, and the proposed method.]

E_CPU vs. M
• As M increases, E_CPU decreases
• The average energy saving of the proposed method is approximately 25% and 33% compared to the Lyapunov-based and Standard schedulers, respectively
[Figure: E_CPU [Joule] (0 to 100) versus M (1 to 10) for IDEAL, Standard, NetDC, Lyapunov, and the proposed method.]

E_Reconf vs. M
• As M increases, E_Reconf increases, but it remains much smaller than E_CPU or E_net
[Figure: E_Reconf [Joule] on a logarithmic scale (10^-6 to 10^2) versus M (1 to 10) for IDEAL, Standard, NetDC, Lyapunov, and the proposed method.]
