
Green-CM: Energy efficient contention management for Transactional Memory



  1. Green-CM: Energy efficient contention management for Transactional Memory
     Shady Alaa, Paolo Romano – INESC-ID/IST
     Mats Brorsson – KTH

  2. Agenda
     • Introduction
     • Related work
     • Architecture
     • Green-CM
     • Evaluation
     • Conclusion
     ICPP 2015 - Green-CM

  3. Introduction
     • Multicores are everywhere
       – Complex programming
         • Locks
         • Deadlocks
       – Transactional memory
         • Atomic blocks
         • Transparent to the programmer
     Example: atomic { if (bal > amount) withdraw(amount); }
     [Figure: four cores (Core 1 to Core 4) sharing main memory]

  4. Introduction
     • Energy efficiency
       – First-order design choice
       – Battery-based devices
       – Data centers
     • Goal
       – Transactional memory that is efficient in terms of both energy and performance

  5. Introduction
     • Contention Manager
       – Minimize contention
       – Which transaction to abort
       – When to restart an aborted transaction
     • Energy efficiency:
       – Wait implementation
       – DVFS
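The "when to restart an aborted transaction" decision can be pictured as a retry loop around the transaction body. The following is a minimal sketch, not the Green-CM implementation: `TxAbort`, `run_with_contention_manager`, `backoff` and `max_retries` are hypothetical names introduced here for illustration only.

```python
import time


class TxAbort(Exception):
    """Hypothetical signal raised by a transaction body when it must abort."""


def run_with_contention_manager(tx_body, backoff, max_retries=10, wait=time.sleep):
    """Run tx_body until it commits.

    After each abort, the contention manager waits backoff(retries)
    before restarting the transaction (assumed policy, for illustration).
    """
    retries = 0
    while True:
        try:
            return tx_body()  # transaction committed
        except TxAbort:
            retries += 1
            if retries > max_retries:
                raise  # give up instead of retrying forever
            wait(backoff(retries))  # back off, then restart the transaction
```

The `wait` argument is the point where the choice between busy waiting and sleeping would plug in; the `backoff` argument decides how long each pause lasts.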

  6. Related work
     • Little work in the literature
       – Mainly HTM
         • Clock gating processors upon abort
       – Lowering frequency upon abort
         • Using a simulator
     • Studies
       – HTM consumes less energy
         • Does not fit all workloads
       – Need for adaptability
     • Using DVFS in TM
       – Fastlane
         • Designed for a low number of threads

  7. Architecture
     [Diagram: a Controller takes throughput and energy as inputs and tunes the parameters β and α, Τ of two components, the Hybrid Wait Implementation and the Asymmetric Contention Manager. Other labels: Tx abort (no. of retries, core on which the tx is executing), back-off duration, end back-off, restart Tx.]

  8. Architecture
     [Diagram repeated from slide 7]

  9. Implementing waits
     • Building block for contention managers
     • Drastic effect on energy consumption
     • Can be implemented in two ways:
       – Busy waiting
       – Sleeping

  10. Implementing waits
      • Busy waiting
        – Fine granularity
        – Similar to real work
      • Sleeping
        – Coarse granularity
        – Low energy consumption
        – Expensive

  11. Implementing waits
      • Hybrid approach
        – Either busy wait or sleep
          • Adaptive fashion
        – How to determine the threshold
          • Cost of sleep
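A hybrid wait can be sketched as a single threshold test: waits shorter than the threshold spin, longer ones sleep. This is only an illustration under stated assumptions; the function name `hybrid_wait`, the use of seconds as the unit, and the injectable `sleep` and `clock` parameters are not taken from the slides.

```python
import time


def hybrid_wait(duration_s, threshold_s, sleep=time.sleep, clock=time.perf_counter):
    """Wait for duration_s seconds and return the mode that was used.

    Short waits (below threshold_s) busy wait; longer waits sleep.
    The threshold is the tunable knob; its unit here is an assumption.
    """
    if duration_s < threshold_s:
        deadline = clock() + duration_s
        while clock() < deadline:
            pass  # spin: fine granularity, but the core stays active
        return "busy"
    sleep(duration_s)  # coarse granularity, but lets the core idle
    return "sleep"
```

For example, `hybrid_wait(0.0001, 0.001)` spins, while `hybrid_wait(0.01, 0.001)` sleeps. Picking a good threshold depends on how costly a sleep is on the target system, which is why a fixed value is unlikely to suit every workload.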

  12. Implementing waits
      No one size fits all
      [Figure: "Static Thresholds": EDP / best EDP versus threshold (100 to 1x10^7, log scale) for Intruder and Kmeans]
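The "EDP / best EDP" metric can be reproduced with a few lines, assuming the usual definition of the energy-delay product (energy multiplied by execution time), which the slides do not spell out. The helper names and the numbers in the usage note are made up for illustration and are not measurements from the talk.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * time_seconds


def normalize_to_best(edp_by_config):
    """Divide every EDP by the smallest one, so the best configuration scores 1.0."""
    best = min(edp_by_config.values())
    return {config: value / best for config, value in edp_by_config.items()}
```

With hypothetical readings for three static thresholds, `normalize_to_best({100: edp(50, 2), 1000: edp(40, 2), 10000: edp(60, 3)})` returns `{100: 1.25, 1000: 1.0, 10000: 2.25}`: the threshold of 1000 is the best one, and the others are 25% and 125% worse.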

  13. Architecture
      [Diagram repeated from slide 7]

  14. Asymmetric CM
      • DVFS
        – Variable operating frequency
      • Exploiting DVFS
        – Boosting active threads
        – Reducing freq. of backing off threads
      • Enabling DVFS
        – Manual control is expensive
        – How to favor automatic boosting
      [Figure: P-states and frequencies: P0 3.0 GHz, P1 2.4 GHz, P2 2.2 GHz, P3 2.0 GHz, P4 1.8 GHz, P5 1.6 GHz, P6 1.4 GHz]

  15. Asymmetric CM
      • Linear backoff cores:
        – Shorter backoff periods
        – Mainly busy waiting
      • Exp. backoff cores:
        – Longer backoff periods
        – Mainly sleep waiting
      • Favor boosting
        – When enough cores are in sleep states
      [Figure: 8 core processor; two boosted cores use linear backoff with busy waiting, the other six use exp. backoff and sleep]
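The asymmetry between the two groups of cores can be sketched as a back-off function that grows linearly for some cores and exponentially for the rest. This is a sketch under assumptions: selecting the linear cores by `core_id < num_boosted`, the `base` and `cap` values, and the abstract time unit are all hypothetical choices made here, not details from the slides.

```python
def backoff_duration(core_id, retries, num_boosted, base=1.0, cap=1024.0):
    """Return the back-off duration (arbitrary time units) after `retries` aborts.

    Cores with core_id < num_boosted (assumed selection rule) back off
    linearly, so they wait briefly and mostly busy wait. All other cores
    back off exponentially, so they wait longer and mostly sleep.
    """
    if core_id < num_boosted:
        return base * retries  # linear: shorter back-off periods
    return min(cap, base * 2 ** retries)  # exponential, capped to stay bounded
```

After 5 aborts with 2 linear-backoff cores, core 0 waits 5 units while core 3 waits 32 units. The long waits of the exponential cores are what push them into sleep states, which is the condition the slide names for favoring boosting.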

  16. Asymmetric CM
      • Increased contention?
        – Cores not backing off exponentially
      • Control number of cores to be boosted

  17. Asymmetric CM
      [Figure: "Static No. of Boosted Threads": EDP / best EDP versus no. of boosted threads (2, 4, 8, 16) for Intruder, Genome, STM7, Kmeans and Memcached]

  18. Architecture
      [Diagram repeated from slide 7]

  19. Controller
      • Online, lightweight
      • Hill climbing
      • Challenges:
        – Collection of energy
        – Multi-dimensional
      • Different exploration strategies
        – Stabilization
        – Random jumps
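A one-dimensional hill climber over a list of candidate parameter values gives a feel for such a controller. The sketch below is an interpretation, not the controller from the talk: it assumes "stabilization" means staying at the best value once both neighbors failed to improve, and "random jumps" means occasionally probing a random candidate after stabilizing. `hill_climb`, `measure_edp`, `steps` and `jump_prob` are hypothetical names.

```python
import random


def hill_climb(candidates, measure_edp, steps=50, jump_prob=0.0, rng=None):
    """Minimize measure_edp over an ordered list of candidate values."""
    rng = rng or random.Random(0)
    best_idx = len(candidates) // 2  # start in the middle of the range
    best_cost = measure_edp(candidates[best_idx])
    direction = 1
    failures = 0  # consecutive probes that did not improve on the best
    for _ in range(steps):
        if failures >= 2:
            # Stabilized (assumed meaning): both directions failed.
            if rng.random() >= jump_prob:
                continue  # keep running with the best value found
            probe = rng.randrange(len(candidates))  # random jump
        else:
            probe = min(max(best_idx + direction, 0), len(candidates) - 1)
        cost = measure_edp(candidates[probe])
        if cost < best_cost:
            best_idx, best_cost, failures = probe, cost, 0
        else:
            direction, failures = -direction, failures + 1
    return candidates[best_idx]
```

In a real system `measure_edp` would run the workload for a sampling interval with the candidate value and read throughput and energy counters; here it is just a callback. Without jumps the climber can settle in a local minimum, which is the motivation for the random-jump variant.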

  20. Controller
      • Tuning α (threshold for hybrid)
      [Figure: EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average) for the strategies no stab, stab, stab jmp 1 and stab jmp 10]

  21. Controller
      • Tuning β (no. of boosted threads)
      [Figure: EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average) for the strategies no stab, stab, stab jmp 1 and stab jmp 10]

  22. Controller
      • Merging the learners
      [Figure: "Coupling the Tuners": EDP / best EDP per benchmark (Intruder, Kmeans, Memcached, STM7, Average). The legend is only partly recoverable: it lists independent and bidim variants and pairings of the stab, stab jmp 1 and stab jmp 10 strategies.]

  23. Evaluation
      [Figure: Intruder: EDP-GreenCM / EDP versus number of threads (4, 8, 16, 32, 48, 64)]

  24. Evaluation
      [Figure: STM7: EDP-GreenCM / EDP versus number of threads (4, 8, 16, 32, 48, 64)]

  25. Evaluation
      [Figure: Memcached: EDP-GreenCM / EDP versus number of threads (4, 8, 16, 32, 48, 64)]

  26. Evaluation
      [Figure: Intruder, 64 threads: % of total cores in each P-state (p0 to p6) for the spin, no-asym and asym configurations]

  27. Conclusion
      • Implementation of waits has a significant impact on energy efficiency
      • Experimental results (obtained on a real system) contradict previously published ones based on simulation
      • Exploiting DVFS enhances energy efficiency
      • Self-tuning is needed to adapt to different workloads

  28. THANK YOU

  29. Evaluation
      [Figure: Intruder: Energy-GreenCM / Energy versus number of threads (4, 8, 16, 32, 48, 64)]

  30. Evaluation
      [Figure: Intruder: Time-GreenCM / Time versus number of threads (4, 8, 16, 32, 48, 64)]
