Green-CM: Energy-Efficient Contention Management for Transactional Memory
Shady Alaa, Paolo Romano – INESC-ID/IST; Mats Brorsson – KTH
Agenda
• Introduction
• Related work
• Architecture
• Green-CM
• Evaluation
• Conclusion
ICPP 2015 - Green-CM
Introduction
• Multicores are everywhere
[Figure: Cores 1 to 4 sharing one main memory]
  – Programming them is complex
    • Locks
    • Deadlocks
  – Transactional memory
    • Atomic blocks
    • Synchronization is transparent to the programmer
    atomic { if (bal > amount) withdraw(amount); }
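To make the `atomic { ... }` example concrete, here is a minimal Python sketch of an optimistic transaction: it snapshots a version number, computes the new balance privately, and commits only if no other transaction committed in between; otherwise it aborts and retries. `Account`, `atomic_withdraw`, and the single per-object version counter are hypothetical illustrations, not Green-CM's code or any TM library's API; real TMs track read and write sets at a finer granularity.

```python
import threading

class Account:
    """Hypothetical shared object guarded by a version counter."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.version = 0                      # incremented by every commit
        self.commit_lock = threading.Lock()   # held only for validate + write-back

def atomic_withdraw(acct: Account, amount: int, max_retries: int = 100) -> tuple[bool, int]:
    """Emulates atomic { if (bal > amount) withdraw(amount); }.

    Returns (withdrew, aborts): whether money was withdrawn and how many
    attempts aborted before the commit succeeded.
    """
    for aborts in range(max_retries):
        start = acct.version                  # begin: snapshot the version
        bal = acct.balance                    # speculative read
        ok = bal > amount
        new_bal = bal - amount if ok else bal # computed privately, not yet visible
        with acct.commit_lock:
            if acct.version == start:         # validate: no commit happened meanwhile
                acct.balance = new_bal        # write back
                acct.version += 1
                return ok, aborts
        # validation failed: this attempt aborts and the loop restarts it
    raise RuntimeError("transaction aborted max_retries times")
```

Under contention the abort count grows, and how long an aborted transaction should wait before its retry is exactly the question a contention manager answers. For simplicity this sketch bumps the version even on commits that change nothing.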
Introduction
• Energy efficiency
  – A first-order design concern
  – Battery-powered devices
  – Data centers
• Goal
  – Transactional memory that is efficient in terms of both energy and performance
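One standard way to score energy and performance together is the energy-delay product (EDP = energy × execution time); the plots in this talk report it as EDP / best EDP. A small sketch of that normalization, with made-up configuration names and measurements used purely for illustration:

```python
def edp(energy_joules: float, time_seconds: float) -> float:
    """Energy-delay product: lower is better; penalizes both energy and slowness."""
    return energy_joules * time_seconds

def normalized_edp(runs: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Maps each configuration to EDP / best EDP, so the best one scores 1.0."""
    edps = {name: edp(e, t) for name, (e, t) in runs.items()}
    best = min(edps.values())
    return {name: value / best for name, value in edps.items()}

# Hypothetical measurements: (energy in J, time in s)
runs = {"spin": (200.0, 10.0), "sleep": (120.0, 20.0), "hybrid": (130.0, 12.0)}
print(normalized_edp(runs))
```

With these numbers `hybrid` has the best EDP (1560, against 2000 for `spin` and 2400 for `sleep`), even though `sleep` uses the least energy: EDP rewards configurations that save energy without giving up too much speed.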
Introduction
• Contention manager
  – Minimizes contention by deciding:
    • which transaction to abort
    • when to restart an aborted transaction
• Levers for energy efficiency:
  – How waits are implemented
  – DVFS (dynamic voltage and frequency scaling)
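The two decisions of a contention manager can be sketched as two small policies. This example assumes a timestamp rule (the older transaction wins a conflict) and randomized exponential backoff before a restart; both are common textbook choices used here for illustration, not the policies Green-CM uses, and `Tx`, `resolve_conflict`, and `backoff_delay_ns` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tx:
    tx_id: int
    start_ts: int        # logical start time: smaller means older
    retries: int = 0     # aborts suffered so far

def resolve_conflict(a: Tx, b: Tx) -> Tx:
    """Which transaction to abort: the younger one (on a tie, `b`)."""
    return b if a.start_ts <= b.start_ts else a

def backoff_delay_ns(tx: Tx, base_ns: int = 1_000, cap_ns: int = 1_000_000,
                     rng: Optional[random.Random] = None) -> int:
    """When to restart: a random wait in [0, min(cap, base * 2**retries)] ns."""
    rng = rng or random.Random()
    limit = min(cap_ns, base_ns * 2 ** tx.retries)
    return rng.randint(0, limit)

older, younger = Tx(1, start_ts=5), Tx(2, start_ts=9)
loser = resolve_conflict(older, younger)
loser.retries += 1
print(loser.tx_id, backoff_delay_ns(loser, rng=random.Random(42)))
```

The backoff window doubles with each abort, so repeatedly conflicting transactions spread out in time; whether that wait is spent spinning or sleeping is a separate choice with a large energy impact.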
Related work
• Little prior work, mainly on HTM
  – Clock-gating processors upon abort
  – Lowering the frequency upon abort
  – Evaluated with simulators
• Studies: HTM consumes less energy
  – But it does not fit all workloads
  – Need for adaptability
• Using DVFS in TM
  – FastLane
  – Designed for a low number of threads
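Architecture
[Figure: Green-CM architecture. When a transaction aborts, the contention manager's asymmetric backoff computes a backoff duration from the number of retries and the core on which the transaction is executing. The hybrid wait implementation performs the wait, and at the end of the backoff the transaction restarts. A controller monitors throughput and energy and tunes β (asymmetric backoff) and α, T (hybrid wait).]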
Implementing waits
• Building block for contention managers
• Drastic effect on energy consumption
• Can be implemented in two ways:
  – Busy waiting
  – Sleeping
Implementing waits
• Busy waiting
  – Fine granularity
  – Consumes about as much energy as actual work
• Sleeping
  – Coarse granularity
  – Low energy consumption
  – Expensive (sleeping itself has a cost)
Implementing waits
• Hybrid approach
  – Either busy wait or sleep, depending on the wait duration
• Adaptive
  – How to determine the threshold?
  – It depends on the cost of sleeping
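A minimal sketch of the hybrid idea on a general-purpose OS: spin for waits shorter than a threshold α and sleep for longer ones. The nanosecond unit, the function names, and passing α as a fixed argument are assumptions for illustration; in Green-CM the threshold is tuned online by the controller. In Python, `time.sleep` has coarse, OS-dependent granularity, which mirrors the fine-versus-coarse trade-off between the two wait styles.

```python
import time

def busy_wait(duration_ns: int) -> None:
    """Spin on the clock: precise, but the core stays fully active."""
    deadline = time.perf_counter_ns() + duration_ns
    while time.perf_counter_ns() < deadline:
        pass

def sleep_wait(duration_ns: int) -> None:
    """Hand the core back to the OS: low energy, but coarse and costly to enter."""
    time.sleep(duration_ns / 1e9)

def hybrid_wait(duration_ns: int, alpha_ns: int) -> str:
    """Busy-wait below the threshold alpha, sleep at or above it.

    Returns the mode used, so a tuner could count spins versus sleeps.
    """
    if duration_ns < alpha_ns:
        busy_wait(duration_ns)
        return "spin"
    sleep_wait(duration_ns)
    return "sleep"
```

A fixed α cannot suit every workload, because the best split between spinning and sleeping depends on how long the backoffs typically are and on what a sleep costs on the machine.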
Implementing waits
• No one size fits all
[Figure: Static thresholds. EDP / best EDP (1 to 6) versus the hybrid-wait threshold (100 to 10^7, log scale) for Intruder and Kmeans.]
Asymmetric CM
• DVFS
  – Variable operating frequency
• Exploiting DVFS
  – Boosting active threads
  – Reducing the frequency of backing-off threads
• Enabling DVFS
  – Manual control is expensive
  – How can automatic boosting be favored?

  P-state   Frequency
  P0        3.0 GHz
  P1        2.4 GHz
  P2        2.2 GHz
  P3        2.0 GHz
  P4        1.8 GHz
  P5        1.6 GHz
  P6        1.4 GHz
Asymmetric CM
[Figure: an 8-core processor in which 2 cores run linear backoff (busy waiting, boosted) and the other 6 run exponential backoff (sleeping).]
• Linear-backoff cores:
  – Shorter backoff periods
  – Mainly busy waiting
• Exponential-backoff cores:
  – Longer backoff periods
  – Mainly sleeping
• Favors boosting
  – Possible when enough cores are in sleep states
Asymmetric CM
• Risk of increased contention
  – Boosted cores do not back off exponentially
• Hence, the number of cores to be boosted must be controlled
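The asymmetric scheme can be sketched as a per-core backoff rule: the first β cores use linear backoff (short waits, so a hybrid wait mostly spins and those cores stay active and eligible for boosting), while all other cores use exponential backoff (long waits, so they mostly sleep). Bounding β bounds how many cores skip exponential backoff. The base period, the cap, the "cores 0..β-1 are boosted" numbering, and the function name are assumptions, not Green-CM's exact formulas.

```python
def asymmetric_backoff_ns(retries: int, core_id: int, beta: int,
                          base_ns: int = 1_000, cap_ns: int = 10_000_000) -> int:
    """Backoff duration after `retries` aborts of a transaction on `core_id`.

    Cores 0..beta-1 (the hypothetical "boosted" cores) back off linearly;
    all other cores back off exponentially. Both are capped at cap_ns.
    """
    if core_id < beta:
        duration = base_ns * (retries + 1)          # linear growth
    else:
        duration = base_ns * 2 ** (retries + 1)     # exponential growth
    return min(duration, cap_ns)

# 8-core example with beta = 2, like the 2 boosted + 6 sleeping cores in the figure
for core in (0, 1, 2, 7):
    print(core, [asymmetric_backoff_ns(r, core, beta=2) for r in range(4)])
```

After a few retries the exponential cores wait long enough that a hybrid wait would put them to sleep, which is what leaves headroom for the hardware to boost the β linear-backoff cores.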
Asymmetric CM
[Figure: Static number of boosted threads. EDP / best EDP (0 to 1.8) versus number of boosted threads (2, 4, 8, 16) for Intruder, Genome, STM7, Kmeans and Memcached.]
Controller
• Online and lightweight
• Based on hill climbing
• Challenges:
  – Collecting energy measurements
  – Multi-dimensional search space
• Exploration strategies:
  – Stabilization
  – Random jumps
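The slides name hill climbing with two exploration strategies, stabilization and random jumps, without giving the algorithm. The sketch below is one hypothetical way to combine them for a single knob such as β (number of boosted threads) or α (hybrid threshold), minimizing a measured score such as EDP per period. The class and parameter names, the "two failed probes mean a local optimum" rule, and the defaults are all assumptions.

```python
import random
from typing import Optional, Sequence

class HillClimber:
    """Hypothetical online hill-climbing tuner for one discrete knob (minimizes score)."""

    def __init__(self, values: Sequence[int], stab_periods: int = 3,
                 jump_prob: float = 0.0, seed: int = 0) -> None:
        self.values = list(values)
        self.base_idx = 0                       # best-known position
        self.base_score: Optional[float] = None
        self.idx = 0                            # position currently being measured
        self.direction = 1                      # +1 probes upward, -1 downward
        self.fails = 0                          # consecutive probes that did not improve
        self.stab_periods = stab_periods
        self.stab_left = 0                      # periods left holding the best value
        self.jump_prob = jump_prob
        self.rng = random.Random(seed)

    @property
    def current(self) -> int:
        return self.values[self.idx]

    @property
    def base_value(self) -> int:
        return self.values[self.base_idx]

    def _probe(self) -> int:
        """Pick the next position to measure: a random jump or a neighbour."""
        if self.rng.random() < self.jump_prob:  # random jump: escape local optima
            self.idx = self.rng.randrange(len(self.values))
            return self.current
        cand = self.base_idx + self.direction
        if not 0 <= cand < len(self.values):    # bounce off the edge of the range
            self.direction = -self.direction
            cand = self.base_idx + self.direction
        self.idx = cand if 0 <= cand < len(self.values) else self.base_idx
        return self.current

    def step(self, score: float) -> int:
        """Report the score measured with `current`; returns the next value to use."""
        if self.stab_left > 0:                  # stabilization: hold and re-measure
            self.base_score = score             # refresh in case the workload changed
            self.stab_left -= 1
            return self.current if self.stab_left > 0 else self._probe()
        if self.base_score is None:             # very first measurement
            self.base_score = score
            return self._probe()
        if score < self.base_score:             # probe improved: move there
            self.base_idx, self.base_score, self.fails = self.idx, score, 0
            return self._probe()
        self.fails += 1                         # probe did not help: try the other side
        self.direction = -self.direction
        if self.fails >= 2:                     # both sides worse: stabilize on the best
            self.fails = 0
            self.idx = self.base_idx
            self.stab_left = self.stab_periods
            return self.current
        return self._probe()

if __name__ == "__main__":
    edp_of = lambda beta: (beta - 6) ** 2 + 1   # made-up EDP curve, best at beta = 6
    tuner = HillClimber(range(17))
    beta = tuner.current
    for _ in range(50):                         # one step per measurement period
        beta = tuner.step(edp_of(beta))
    print("best beta so far:", tuner.base_value)
```

Refreshing the stored score while stabilizing lets the tuner notice when the workload shifts, and random jumps trade a few poorly configured periods for a chance to leave a local optimum.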
Controller
• Tuning α (threshold of the hybrid wait)
[Figure: EDP / best EDP (0 to 2.5) for Intruder, Kmeans, Memcached, STM7 and the average, comparing exploration strategies: no stab, stab, stab jmp 1, stab jmp 10.]
Controller
• Tuning β (number of boosted threads)
[Figure: EDP / best EDP (0 to 1.6) for Intruder, Kmeans, Memcached, STM7 and the average, comparing exploration strategies: no stab, stab, stab jmp 1, stab jmp 10.]
Controller
• Merging the learners
[Figure: Coupling the tuners. EDP / best EDP (0 to 2.5) for Intruder, Kmeans, Memcached, STM7 and the average, comparing independent and bidimensional (bidim) tuning of α and β with different combinations of exploration strategies (stab, stab jmp 1, stab jmp 10).]
Evaluation
[Figure: Intruder. EDP-GreenCM / EDP (0 to 1.2) for 4, 8, 16, 32, 48 and 64 threads.]

Evaluation
[Figure: STM7. EDP-GreenCM / EDP (0 to 1.2) for 4, 8, 16, 32, 48 and 64 threads.]

Evaluation
[Figure: Memcached. EDP-GreenCM / EDP (0 to 1.2) for 4, 8, 16, 32, 48 and 64 threads.]
Evaluation
[Figure: Intruder, 64 threads. Percentage of total cores in each P-state (p0 to p6) for the spin, no-asym and asym configurations.]
Conclusion
• The implementation of waits has a significant impact on energy efficiency
• Experimental results obtained on a real system contradict previously published results based on simulation
• Exploiting DVFS enhances energy efficiency
• Self-tuning is needed to adapt to different workloads
THANK YOU
Evaluation
[Figure: Intruder. Energy-GreenCM / Energy (0 to 1.2) for 4, 8, 16, 32, 48 and 64 threads.]

Evaluation
[Figure: Intruder. Time-GreenCM / Time (0 to 1.2) for 4, 8, 16, 32, 48 and 64 threads.]