Questions? David.Snowdon@nicta.com.au, http://ertos.nicta.com.au



  1. Run-Time Prediction of Performance and Energy When Frequency Scaling
     David Snowdon, Stefan Petters and Gernot Heiser
     David.Snowdon@nicta.com.au
     The imagination driving Australia’s ICT fu

  2. Outline
     ➀ Motivation: problems with DVFS
     ➁ Modelling performance and energy
     ➂ Evaluation
     ➃ Future work

  3. Motivation
     ➜ Embedded systems are often restricted by battery life.
     ➜ Total system energy consumption.
     Our work looks at effective DVFS (dynamic voltage and frequency scaling) in real systems.

  4. Motivation
     Theory: $E \propto V^2$
     [Plot: normalised total energy vs. CPU frequency (50 to 500 MHz)]

  5. Motivation
     Practice (PXA255-based system):
     [Plot: normalised total energy vs. CPU frequency (50 to 500 MHz)]

  8. Motivation
     Why? Simple models:
     ➜ $P \propto f V^2$
     ➜ $T \propto 1/f$
     ➜ $V = F(f)$, with $F$ monotonically increasing
     Modern systems aren’t simple!
     ➜ Varying number of switches (workload specific!)
     ➜ Multiple frequency domains
     ➜ Frequency-independent (static) power
     We want to be able to deal with these nuances.
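Combining the simple models above gives the theoretical curve of slide 4. The proportionality constants $k_1$ and $k_2$ are introduced here only for illustration; they are not from the slides. For a fixed amount of work, energy is power times run time:

$$E = P\,T = \left(k_1 f V^2\right)\left(\frac{k_2}{f}\right) = k_1 k_2 V^2 \propto V^2$$

Because $V = F(f)$ is monotonically increasing, this predicts that the lowest frequency never uses more energy than a higher one. Adding a frequency-independent static power $P_{static}$ breaks that conclusion, because its energy grows with the run time:

$$E = k_1 k_2 V^2 + P_{static}\,\frac{k_2}{f}$$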

  10. Execution Time Model
     ➜ Simple execution time model: $T \propto 1/f_{cpu}$
     ➜ i.e. constant cycles
     ➜ Problem: ignores execution time that is independent of the CPU clock
     [Plots: normalised cycles vs. CPU frequency (50 to 500 MHz) for bitcnt and gzip]
     Implication:
     ➜ Memory-bound performance is less dependent on CPU frequency
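A minimal sketch of why a constant-cycles model fails for memory-bound code. It assumes, ahead of the model on the following slides, that run time splits into a CPU part and a memory part, $T = C_{cpu}/f_{cpu} + C_{mem}/f_{mem}$, with the memory clock held at a hypothetical fixed 100 MHz; the two workloads and their cycle counts are invented for illustration. The cycle counter ticks at $f_{cpu}$, so it records $T f_{cpu} = C_{cpu} + C_{mem} f_{cpu}/f_{mem}$ cycles, which grows with CPU frequency whenever $C_{mem} > 0$.

```python
def cpu_cycles(c_cpu, c_mem, f_cpu_mhz, f_mem_mhz):
    """CPU cycles observed when memory time does not scale with f_cpu.

    Assumed model: T = c_cpu / f_cpu + c_mem / f_mem. The cycle counter
    ticks at f_cpu, so it records T * f_cpu cycles.
    """
    t_us = c_cpu / f_cpu_mhz + c_mem / f_mem_mhz  # cycles / MHz = microseconds
    return t_us * f_cpu_mhz


def normalised_cycles(c_cpu, c_mem, freqs_mhz, f_mem_mhz=100.0):
    """Cycle counts at each CPU frequency, normalised to the lowest frequency."""
    base = cpu_cycles(c_cpu, c_mem, min(freqs_mhz), f_mem_mhz)
    return [cpu_cycles(c_cpu, c_mem, f, f_mem_mhz) / base for f in freqs_mhz]


freqs = [100, 200, 400, 500]
# Hypothetical workloads: one never waits on memory; the other spends half
# of its run time in memory at the lowest CPU frequency.
cpu_bound = normalised_cycles(1e6, 0.0, freqs)
memory_bound = normalised_cycles(1e6, 1e6, freqs)
print(cpu_bound)
print(memory_bound)
```

The CPU-bound workload stays at 1.0 at every frequency, while the memory-bound one rises to 3.0 at 500 MHz: its memory stalls cost more CPU cycles the faster the CPU clock runs.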

  14. Execution Time Model
     ➜ Task: predict the execution time of a workload in an arbitrary system configuration
     ➜ Low overhead, cross-architectural, dynamic applications
     $$T = \frac{C_{cpu}}{f_{cpu}} + \frac{C_{bus}}{f_{bus}} + \frac{C_{mem}}{f_{mem}} + \frac{C_{io}}{f_{io}} + \dots$$
     $C_x$: characterise a priori, or online using performance counters:
     $$C_{bus} = \alpha_1 \mathrm{PMC}_1 + \alpha_2 \mathrm{PMC}_2 + \dots$$
     $$C_{mem} = \beta_1 \mathrm{PMC}_1 + \beta_2 \mathrm{PMC}_2 + \dots$$
     ($C_{cpu}$ is inferred from the other results.)
     ➜ 2-parameter model: avg 1.7%, max 7% prediction error
     ➜ CPU frequency only: avg 10%, max 36% prediction error
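A sketch of how the model can predict run time at a new setpoint from one measured run, assuming the I/O term is negligible and omitted, and using hypothetical counter readings and $\alpha$/$\beta$ coefficients. $C_{bus}$ and $C_{mem}$ come from the counter equations, and $C_{cpu}$ is inferred as the cycles that account for the rest of the measured time, as the slide describes. `Setpoint`, `cycle_estimates` and `predict_time` are illustrative names, not from the original work.

```python
from dataclasses import dataclass


@dataclass
class Setpoint:
    """A hypothetical clock configuration; frequencies in MHz."""
    f_cpu: float
    f_bus: float
    f_mem: float


def cycle_estimates(pmcs, alpha, beta):
    """C_bus and C_mem as linear combinations of performance-counter values."""
    c_bus = sum(a * p for a, p in zip(alpha, pmcs))
    c_mem = sum(b * p for b, p in zip(beta, pmcs))
    return c_bus, c_mem


def predict_time(measured_us, measured_at, target, pmcs, alpha, beta):
    """Predict execution time (us) at `target` from one run at `measured_at`."""
    c_bus, c_mem = cycle_estimates(pmcs, alpha, beta)
    # Infer C_cpu: the time not explained by the bus and memory terms,
    # converted to CPU cycles at the measured CPU frequency.
    unexplained_us = measured_us - c_bus / measured_at.f_bus - c_mem / measured_at.f_mem
    c_cpu = unexplained_us * measured_at.f_cpu
    return c_cpu / target.f_cpu + c_bus / target.f_bus + c_mem / target.f_mem


measured = Setpoint(f_cpu=400.0, f_bus=100.0, f_mem=100.0)
target = Setpoint(f_cpu=200.0, f_bus=100.0, f_mem=100.0)
# Hypothetical readings of two counters and their coefficients.
t_pred = predict_time(30000.0, measured, target,
                      pmcs=[1000, 2000], alpha=[10.0, 0.0], beta=[0.0, 5.0])
# Baseline that scales the whole run time with the CPU frequency only.
t_naive = 30000.0 * measured.f_cpu / target.f_cpu
print(t_pred, t_naive)
```

Here the bus and memory time (200 us) does not stretch when the CPU clock is halved, so the prediction is 59800 us rather than the 60000 us a CPU-frequency-only model gives.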

  16. Power Model
     ➜ Simple CMOS model: $P \propto f V^2$
     Problems:
     ➜ System power
     ➜ Static power and leakage
     ➜ Multiple frequency/voltage domains
     ➜ Temperature dependence
     ➜ Conversion inefficiencies
     A (slightly more) realistic model:
     $$P = \sum_{n=0}^{N} C_n f_n V_n^2 + P_{static}$$
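The multi-domain formula translates directly into code. This sketch treats each domain as a tuple $(C_n, f_n, V_n)$ in farads, hertz and volts; the two domains and all values in the usage example are hypothetical.

```python
def system_power(domains, p_static):
    """P = sum_n C_n * f_n * V_n**2 + P_static, in watts.

    domains: iterable of (C_n, f_n, V_n), where C_n is the effective
    switched capacitance (F), f_n the clock frequency (Hz) and V_n the
    supply voltage (V) of domain n.
    """
    return sum(c * f * v * v for c, f, v in domains) + p_static


# Hypothetical CPU and memory domains plus a fixed static term.
domains = [
    (1e-9, 400e6, 1.3),   # CPU core
    (2e-10, 100e6, 3.3),  # memory
]
p = system_power(domains, p_static=0.1)
print(p)
```

Each domain contributes separately, so lowering the CPU voltage leaves the memory domain's dynamic power and the static term unchanged.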

  17. Power Model
     The interaction of run-time and static power:
     ➜ Dynamic energy increases as frequency increases
     ➜ Static energy decreases as frequency increases
     $$E_{total} = P_{dyn}\,\Delta t + P_{static}\,\Delta t$$
     [Plot: $E_{total}(f)$, $E_{dyn}(f)$ and $E_{static}(f)$ vs. CPU frequency]
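A sketch of the trade-off between the two energy terms, assuming a purely CPU-bound workload (constant cycles, so $\Delta t = C/f$), $P_{dyn} = k f V^2$, and a hypothetical table of voltages per frequency. None of the numbers come from the measured system; they are chosen so that the minimum falls between the extremes.

```python
def energy(f_mhz, volts, cycles, k_farads, p_static_w):
    """Total energy (J) for a fixed number of CPU cycles at one setpoint."""
    dt = cycles / (f_mhz * 1e6)                  # constant-cycles run time (s)
    p_dyn = k_farads * f_mhz * 1e6 * volts ** 2  # P_dyn = k f V^2 (W)
    return p_dyn * dt + p_static_w * dt          # E_dyn + E_static


# Hypothetical setpoints: CPU frequency (MHz) -> core voltage (V).
setpoints = {100: 1.0, 200: 1.0, 300: 1.1, 400: 1.3}
energies = {f: energy(f, v, cycles=1e9, k_farads=1e-9, p_static_w=0.2)
            for f, v in setpoints.items()}
best = min(energies, key=energies.get)
print(energies, best)
```

At low frequencies the static term dominates because the run is long; at 400 MHz the higher voltage makes dynamic energy dominate, so with these numbers 300 MHz uses the least total energy.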

  20. Power Model
     Power/energy model principles:
     ➜ Events each use an amount of energy
     ➜ An event may use energy in more than one voltage domain
     ➜ Clock cycles count as events
     ➜ Static power models power not related to events or voltages
     ➜ Constant I/O power for the benchmarks tested
     For our system:
     $$E_{events} = V_{cpu}^2\left(\alpha_0 \mathrm{PMC}_0 + \dots + \alpha_m \mathrm{PMC}_m\right) + \beta_0 \mathrm{PMC}_0 + \dots + \beta_m \mathrm{PMC}_m$$
     $$E_{freqs} = V_{cpu}^2\left(\gamma_1 f_{cpu} + \gamma_2 f_{bus} + \gamma_3 f_{mem}\right)\Delta t + \left(\gamma_4 f_{cpu} + \gamma_5 f_{bus} + \gamma_6 f_{mem}\right)\Delta t$$
     $$E_{static} = P_{static}\,\Delta t$$
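A sketch that evaluates the three energy terms for one measurement interval. The coefficient values in the usage example are hypothetical, chosen only so the arithmetic is easy to check; per the slides, the real coefficients come from least-squares regression. Frequencies are in Hz and $\Delta t$ in seconds, so each $\gamma$ is an energy per clock cycle (per $V^2$ for $\gamma_1$ to $\gamma_3$).

```python
def model_energy(v_cpu, pmcs, dt, f_cpu, f_bus, f_mem,
                 alpha, beta, gamma, p_static):
    """Energy (J) over an interval dt (s): E_events + E_freqs + E_static."""
    v2 = v_cpu ** 2
    # Events cost energy in the CPU voltage domain (alpha, scaled by V_cpu^2)
    # and in fixed-voltage domains (beta).
    e_events = (v2 * sum(a * p for a, p in zip(alpha, pmcs))
                + sum(b * p for b, p in zip(beta, pmcs)))
    # Clock cycles as events: frequency times duration in each domain.
    g1, g2, g3, g4, g5, g6 = gamma
    e_freqs = (v2 * (g1 * f_cpu + g2 * f_bus + g3 * f_mem)
               + (g4 * f_cpu + g5 * f_bus + g6 * f_mem)) * dt
    # Power unrelated to events or voltages.
    e_static = p_static * dt
    return e_events + e_freqs + e_static


# Hypothetical interval: two counter readings, 0.5 s at a 400 MHz CPU clock.
e = model_energy(v_cpu=2.0, pmcs=[1e6, 2e6], dt=0.5,
                 f_cpu=400e6, f_bus=100e6, f_mem=100e6,
                 alpha=[1e-9, 0.0], beta=[0.0, 1e-9],
                 gamma=(1e-9, 0.0, 0.0, 0.0, 0.0, 0.0), p_static=0.1)
print(e)
```

Only the $\alpha$ and $\gamma_1$ to $\gamma_3$ terms are scaled by $V_{cpu}^2$, which is how one event or clock can draw energy from both a scaled and an unscaled voltage domain.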

  21. Power Model
     Parameter selection:
     ➜ Systematically picking the best model for N counters
     ➜ Least-squares regression finds the coefficients
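The slide does not spell out the search procedure, so this is only a sketch under assumptions: it tries every subset of $n$ candidate terms exhaustively, fits each subset by ordinary least squares through the normal equations, and keeps the subset with the highest $R^2$. Pure Python keeps it dependency-free. The candidate columns and data are synthetic, and a subset with collinear columns would make the normal equations singular; that case is not handled.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Coefficients minimising squared error, via the normal equations."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)


def r_squared(rows, y, coef):
    """Coefficient of determination of a fitted linear model."""
    mean = sum(y) / len(y)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def best_subset(columns, y, n):
    """Fit every choice of n candidate terms; keep the highest R^2."""
    best = None
    for names in combinations(sorted(columns), n):
        rows = [[columns[name][i] for name in names] for i in range(len(y))]
        coef = least_squares(rows, y)
        score = r_squared(rows, y, coef)
        if best is None or score > best[0]:
            best = (score, names, coef)
    return best


# Synthetic candidates; 'const' is a column of ones acting as the intercept.
columns = {
    "const": [1.0] * 6,
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0],
}
y = [0.5 + 2.0 * a + 3.0 * b for a, b in zip(columns["x1"], columns["x2"])]
score, names, coef = best_subset(columns, y, 3)
print(names, coef, score)
```

Exhaustive search is only practical for a small pool of candidates; with 13 events and $N$ counters the number of subsets is still modest, but larger pools would call for a greedy or stepwise search instead.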

  24. Evaluation
     ➜ Typical embedded platform (PLEB 2, XScale-based)
     ➜ Cycle counter, 2 performance counters, 13 events
     ➜ 37 benchmarks run to completion at every setpoint
     ➜ 22 frequency setpoints with different $f_{cpu}$, $f_{bus}$ and $f_{mem}$
     ➜ Voltage varied to three settings for each frequency
     ➜ Measurements: cycles, frequencies, performance counters, energy
     ➜ Benchmarks were partitioned for calibration and validation
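A sketch of the calibration/validation workflow and of the average and maximum error figures reported for the execution time model. The slides do not say how the benchmarks were partitioned; the alternating split in `split_benchmarks` is a hypothetical rule used only for illustration, as are the benchmark names and values in the usage example.

```python
def split_benchmarks(names, every=2):
    """Hypothetical split: every `every`-th benchmark (sorted) is held out."""
    ordered = sorted(names)
    calibration = [n for i, n in enumerate(ordered) if i % every != 0]
    validation = [n for i, n in enumerate(ordered) if i % every == 0]
    return calibration, validation


def error_summary(predicted, measured):
    """Average and maximum absolute relative error, in percent."""
    errors = [abs(p - m) / m * 100.0 for p, m in zip(predicted, measured)]
    return sum(errors) / len(errors), max(errors)


# Fit the model on `calibration`, then score predictions on `validation`.
calibration, validation = split_benchmarks(["d", "a", "c", "b"])
avg_err, max_err = error_summary([102.0, 99.0, 100.0], [100.0, 100.0, 100.0])
print(calibration, validation, avg_err, max_err)
```

Keeping the validation benchmarks out of the regression is what makes the reported errors a test of prediction rather than of curve fitting.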

  25. Evaluation
     [Figure: model selection. Candidate terms: (Intercept), v*v*fcpu, v*fcpu, fcpu, v*v*fbus, fbus, v*v*fmem, fmem, v*v*PMC0/t to v*v*PMC13/t, and PMC0/t to PMC13/t. R² values shown: 0.92, 0.96, 0.98, 0.98, then 0.99 (eight times) and 1 (eight times).]
