Energy-aware checkpointing of divisible tasks with soft or hard deadlines


  1. Introduction Framework Single chunk Multiple chunks Simulations Conclusion
     Energy-aware checkpointing of divisible tasks with soft or hard deadlines
     Guillaume Aupy (1), Anne Benoit (1,2), Rami Melhem (3), Paul Renaud-Goud (1) and Yves Robert (1,2,4)
     (1) Ecole Normale Supérieure de Lyon, France; (2) Institut Universitaire de France; (3) University of Pittsburgh, USA; (4) University of Tennessee Knoxville, USA
     Anne.Benoit@ens-lyon.fr, http://graal.ens-lyon.fr/~abenoit/
     International Green Computing Conference 2013, Arlington, USA
     Anne.Benoit@ens-lyon.fr IGCC’2013 Energy-aware checkpointing 1/25

  2. Divisible load scheduling and resilience
     Divisible load scheduling: divide a computational workload into chunks. The number of chunks is arbitrary and the size of each chunk is freely chosen by the user. Goal: minimize the makespan, i.e., the total execution time.
     Current platforms: increasing frequency of failures. The well-established method to deal with failures is checkpointing: take a checkpoint at the end of each chunk and verify the result; re-execute the chunk in case of a transient failure.

  4. Energy: a crucial issue
     IGCC: Green Computing Conference! There is a real need to reduce energy dissipation in current processors.
     A processor running at speed $s$ consumes power $s^3$ watts. Dynamic voltage and frequency scaling (DVFS) techniques make the speed adjustable.
     Our goal: minimize energy consumption, including that of checkpointing and of re-execution (if a failure occurs), while enforcing a bound on execution time.

  6. Outline
     1. Framework
     2. With a single chunk
     3. With several chunks
     4. Simulation results
     5. Conclusion

  7. Framework
     Execution of a divisible task ($W$ operations). Failures may occur (transient faults); resilience is obtained through checkpointing.
     Objective: minimize the expected energy consumption $\mathbb{E}(E)$, given a deadline bound $D$.
     Probabilistic nature of failure hits: the expectation of the energy consumption is the natural objective (average cost over many executions).
     Deadline bound: two relevant scenarios (soft or hard deadline).

  8. Soft vs hard deadline
     Soft deadline: met in expectation, i.e., $\mathbb{E}(T) \le D$ (average response time).
     Hard deadline: met in the worst case, i.e., $T_{\text{wc}} \le D$.

  9. Execution time, one single chunk
     One single chunk of size $W$. Checkpoint overhead: execution time $T_C$. Instantaneous failure rate: $\lambda$.
     First execution at speed $s$: $T_{\text{exec}} = \frac{W}{s} + T_C$
     Failure probability: $P_{\text{fail}} = \lambda T_{\text{exec}} = \lambda\left(\frac{W}{s} + T_C\right)$
     In case of failure, re-execute at speed $\sigma$: $T_{\text{reexec}} = \frac{W}{\sigma} + T_C$, and we assume success after re-execution.
     $\mathbb{E}(T) = T_{\text{exec}} + P_{\text{fail}} T_{\text{reexec}} = \left(\frac{W}{s} + T_C\right) + \lambda\left(\frac{W}{s} + T_C\right)\left(\frac{W}{\sigma} + T_C\right)$
     $T_{\text{wc}} = T_{\text{exec}} + T_{\text{reexec}} = \left(\frac{W}{s} + T_C\right) + \left(\frac{W}{\sigma} + T_C\right)$
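
The execution-time formulas for a single chunk can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name `chunk_times` and the sample values are hypothetical and chosen only for illustration.

```python
def chunk_times(W, s, sigma, T_C, lam):
    """Return (expected time, worst-case time) for one chunk of size W.

    s is the first-execution speed, sigma the re-execution speed,
    T_C the checkpoint time and lam the instantaneous failure rate.
    """
    t_exec = W / s + T_C          # first execution plus checkpoint
    t_reexec = W / sigma + T_C    # re-execution plus checkpoint
    p_fail = lam * t_exec         # first-order failure probability
    expected = t_exec + p_fail * t_reexec
    worst = t_exec + t_reexec     # a failure occurs, re-execution succeeds
    return expected, worst


# Illustrative values only (not taken from the slides).
expected, worst = chunk_times(W=100.0, s=2.0, sigma=4.0, T_C=1.0, lam=0.001)
print(expected, worst)
```

A soft deadline compares `expected` with $D$, whereas a hard deadline compares `worst` with $D$.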

  10. Energy consumption, one single chunk
      One single chunk of size $W$. Checkpoint overhead: energy consumption $E_C$.
      First execution at speed $s$: $\frac{W}{s} \times s^3 + E_C = W s^2 + E_C$
      Re-execution at speed $\sigma$: $W\sigma^2 + E_C$, with probability $P_{\text{fail}} = \lambda T_{\text{exec}} = \lambda\left(\frac{W}{s} + T_C\right)$
      $\mathbb{E}(E) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)(W\sigma^2 + E_C)$
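
The expected energy of a single chunk follows the same pattern as the execution time. This is a small sketch under the slide's model (power $s^3$, so running $W$ operations at speed $s$ costs $W s^2$); the function name `expected_energy` and the numbers are hypothetical.

```python
def expected_energy(W, s, sigma, T_C, E_C, lam):
    """Expected energy for one chunk: first run plus weighted re-execution."""
    p_fail = lam * (W / s + T_C)       # failure probability of the first run
    first_run = W * s**2 + E_C         # (W/s) * s^3 plus checkpoint energy
    rerun = W * sigma**2 + E_C         # paid only if the first run fails
    return first_run + p_fail * rerun


# Illustrative values only (not taken from the slides).
print(expected_energy(W=100.0, s=2.0, sigma=4.0, T_C=1.0, E_C=10.0, lam=0.001))
```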

  11. Multiple chunks
      Execution time: sum of the execution times of each chunk (worst-case or expected). Expected energy consumption: sum of the expected energy of each chunk.
      Coherent failure model: consider two chunks with $W_1 + W_2 = W$. Probability of failure for the first chunk: $P^1_{\text{fail}} = \lambda\left(\frac{W_1}{s} + T_C\right)$; for the second chunk: $P^2_{\text{fail}} = \lambda\left(\frac{W_2}{s} + T_C\right)$. With a single chunk of size $W$: $P_{\text{fail}} = \lambda\left(\frac{W}{s} + T_C\right)$, which differs from $P^1_{\text{fail}} + P^2_{\text{fail}}$ only because of the extra checkpoint.
      Trade-off: many small chunks (more $T_C$ to pay, but a small re-execution cost) vs few larger chunks (fewer $T_C$, but an increased re-execution cost).
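
The trade-off between many small chunks and few large ones can be explored numerically. The sketch below makes two simplifying assumptions that the slide does not state: all $n$ chunks have the same size $W/n$, and every chunk uses a single speed $s$ for both execution and re-execution. The function name `equal_chunks` and the sample values are hypothetical.

```python
def equal_chunks(W, n, s, T_C, E_C, lam):
    """Return (sum of failure probabilities, expected time, expected energy)
    for n equal-size chunks, each run and re-run at speed s (assumption)."""
    w = W / n                               # size of each chunk
    p = lam * (w / s + T_C)                 # failure probability per chunk
    exp_time = n * (w / s + T_C) * (1 + p)  # sum of per-chunk expected times
    exp_energy = n * (w * s**2 + E_C) * (1 + p)
    return n * p, exp_time, exp_energy


# Illustrative values only: scan the number of chunks and compare costs.
for n in range(1, 6):
    total_p, t, e = equal_chunks(W=100.0, n=n, s=2.0, T_C=1.0, E_C=10.0, lam=0.01)
    print(n, round(total_p, 4), round(t, 3), round(e, 3))
```

With two chunks, the summed failure probability exceeds the single-chunk value by exactly `lam * T_C`, which is the extra checkpoint mentioned above.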

  12. Optimization problem
      Decisions that should be taken before execution:
      Chunks: how many ($n$)? Which sizes ($W_i$ for chunk $i$)?
      Speeds of each chunk: first run ($s_i$)? Re-execution ($\sigma_i$)?
      Input: $W$, $T_C$ (checkpointing time), $E_C$ (energy spent for checkpointing), $\lambda$ (instantaneous failure rate), $D$ (deadline).

  15. Models
      Chunks: single chunk of size $W$ vs multiple chunks ($n$ and the $W_i$'s)
      Speed per chunk: single speed ($s$) vs multiple speeds ($s$ and $\sigma$)
      Deadline bound: soft ($\mathbb{E}(T) \le D$) vs hard ($T_{\text{wc}} \le D$)

  17. Single chunk and single speed
      Consider first that $s = \sigma$ (single speed): we need to find the optimal speed. $\mathbb{E}(E)$ is a function of $s$:
      $\mathbb{E}(E)(s) = (W s^2 + E_C)\left(1 + \lambda\left(\frac{W}{s} + T_C\right)\right)$
      Lemma: this function is convex and has a unique minimum $s^\star$ (function of $\lambda$, $W$, $E_C$, $T_C$), the positive root of $2(1+\lambda T_C)s^3 + \lambda W s^2 - \lambda E_C = 0$:
      $s^\star = \frac{\lambda W}{6(1+\lambda T_C)} \left( -\frac{\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}}{2^{1/3}} - \frac{2^{1/3}}{\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}} - 1 \right)$, where $a = \frac{E_C}{W}\left(\frac{2(1+\lambda T_C)}{\lambda W}\right)^2$
      $\mathbb{E}(T)$ and $T_{\text{wc}}$ are decreasing functions of $s$, so there is a minimum speed $s_{\text{exp}}$ (resp. $s_{\text{wc}}$) required to match the deadline $D$ (function of $D$, $W$, $T_C$, and $\lambda$ for $s_{\text{exp}}$).
      Optimal speed: the maximum between $s^\star$ and $s_{\text{exp}}$ (soft deadline) or $s_{\text{wc}}$ (hard deadline).
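
The single-speed rule can be sketched numerically. Instead of transcribing the closed form, the code below finds $s^\star$ by bisection on the first-order condition of $\mathbb{E}(E)(s)$, namely $2(1+\lambda T_C)s^3 + \lambda W s^2 - \lambda E_C = 0$. The minimum speeds are not spelled out on the slide; here they are derived, as an assumption of this sketch, by setting $\sigma = s$ in the time formulas and making the deadline tight. All function names and sample values are hypothetical, and the sketch assumes $\lambda > 0$, $E_C > 0$ and a deadline large enough to be feasible.

```python
import math


def expected_energy(s, W, T_C, E_C, lam):
    """E(E)(s) for one chunk when execution and re-execution both use s."""
    return (W * s**2 + E_C) * (1 + lam * (W / s + T_C))


def unconstrained_speed(W, T_C, E_C, lam):
    """Bisection on g(s) = 2(1 + lam*T_C) s^3 + lam*W s^2 - lam*E_C.

    g is increasing for s > 0 and g(0) < 0, so it has one positive root.
    """
    def g(s):
        return 2 * (1 + lam * T_C) * s**3 + lam * W * s**2 - lam * E_C

    lo, hi = 0.0, 1.0
    while g(hi) < 0:          # grow the bracket until it contains the root
        hi *= 2
    for _ in range(200):      # halve the bracket a fixed number of times
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_speed_soft(W, T_C, lam, D):
    """Smallest s with E(T) <= D when sigma = s: solve lam*x^2 + x = D
    for x = W/s + T_C (derived for this sketch)."""
    x = (-1 + math.sqrt(1 + 4 * lam * D)) / (2 * lam)
    return W / (x - T_C)


def min_speed_hard(W, T_C, D):
    """Smallest s with T_wc = 2 (W/s + T_C) <= D (derived for this sketch)."""
    return 2 * W / (D - 2 * T_C)


def optimal_speed(W, T_C, E_C, lam, D, hard):
    """Maximum of the energy-optimal speed and the deadline-imposed speed."""
    s_min = min_speed_hard(W, T_C, D) if hard else min_speed_soft(W, T_C, lam, D)
    return max(unconstrained_speed(W, T_C, E_C, lam), s_min)


# Illustrative values only (not taken from the slides).
print(optimal_speed(W=10.0, T_C=1.0, E_C=10.0, lam=0.01, D=20.0, hard=True))
print(optimal_speed(W=10.0, T_C=1.0, E_C=10.0, lam=0.01, D=20.0, hard=False))
```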

  18. Single chunk and multiple speeds
      Consider now that $s \neq \sigma$ (multiple speeds): two unknowns. $\mathbb{E}(E)$ is a function of $s$ and $\sigma$:
      $\mathbb{E}(E)(s, \sigma) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)(W\sigma^2 + E_C)$
      Lemma: the energy is minimized when the deadline is tight (both for the worst-case and the expected bound). Hence $\sigma$ can be expressed as a function of $s$:
      $\sigma_{\text{exp}} = \frac{\lambda W}{\frac{D}{\frac{W}{s} + T_C} - (1 + \lambda T_C)}$, $\sigma_{\text{wc}} = \frac{W s}{(D - 2T_C)s - W}$
      This leaves the minimization of a single-variable function, which can be solved numerically (there is no closed-form expression of the optimal $s$).
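
One simple way to carry out this single-variable minimization is a dense scan over $s$. The slides do not say which numerical method the authors use, so the grid search below is only an illustrative choice; the function names, the upper bound `s_max` and the sample values are all hypothetical.

```python
def energy(s, sigma, W, T_C, E_C, lam):
    """E(E)(s, sigma) for one chunk with distinct first-run and re-run speeds."""
    return (W * s**2 + E_C) + lam * (W / s + T_C) * (W * sigma**2 + E_C)


def sigma_hard(s, W, T_C, D):
    """Re-execution speed making T_wc = D, or None if s is too slow."""
    denom = (D - 2 * T_C) * s - W
    return W * s / denom if denom > 0 else None


def sigma_soft(s, W, T_C, lam, D):
    """Re-execution speed making E(T) = D, or None if s is too slow."""
    denom = D / (W / s + T_C) - (1 + lam * T_C)
    return lam * W / denom if denom > 0 else None


def scan(sigma_of_s, W, T_C, E_C, lam, s_max, steps=20000):
    """Grid search over s in (0, s_max]; returns (energy, s, sigma)."""
    best = None
    for i in range(1, steps + 1):
        s = s_max * i / steps
        sigma = sigma_of_s(s)
        if sigma is None:      # deadline cannot be met at this speed
            continue
        e = energy(s, sigma, W, T_C, E_C, lam)
        if best is None or e < best[0]:
            best = (e, s, sigma)
    return best


# Illustrative values only (not taken from the slides).
W, T_C, E_C, lam, D = 10.0, 1.0, 10.0, 0.01, 20.0
hard = scan(lambda s: sigma_hard(s, W, T_C, D), W, T_C, E_C, lam, s_max=5.0)
soft = scan(lambda s: sigma_soft(s, W, T_C, lam, D), W, T_C, E_C, lam, s_max=5.0)
print(hard)
print(soft)
```

A grid scan makes no assumption about the shape of the objective; a bracketing method would be faster if the function is known to be unimodal on the feasible range.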
