Thermal-Effective Clustered Microarchitectures
P. Chaparro, J. González and A. González
Intel Labs - UPC
Motivation
• Removing heat is expensive
• Design point is set for worst case temperatures
  - Expensive thermal solution guarantees peak performance
• Usually temperatures are lower
  - A localized hotspot may…
    • trigger global emergency mechanisms: But it could be avoided by focusing only on that hotspot
    • not be detected: Sensors covering wider areas
• Clustered architectures give new opportunities for temperature reduction
  - Peak temperature 33%
  - Average temperature 12%
Overview
• Introduction
• Processor Architecture
• Simulation Infrastructure
• Thermal Analysis of Clustered Architectures
• Cluster Hopping
• Conclusions
Introduction
• Clustering opens new opportunities for temperature reduction
  - Distribution of resources
    • Activity distribution
  - Hopping schemes
  - Layout flexibility
    • Trade off unit location vs. wire delay
  - Resource grouping into clusters
    • Voltage and clock domains
    • Leakage control
    • Vdd gating
Processor Architecture
• Large frontend
  - 32Kuop trace cache
  - dispatch 8 uops/cycle
• 2MB L2 cache
• Highly OOO
  - 80-entry issue queue
  - 384-entry MOB
  - 4 int + 3 fp + 4 ld/st
  - 544+544 physical regs
  - 64KB, 2-way L1
[Figure: processor block diagram. Label: Memory Bus]
Processor Architecture
[Figure: processor floorplan. Blocks: FPFU, IFU, ROB, DL1, ITLB, FPRF, IRF, RAT, DECO, TC, MS/MOB, BP, FPS, IS, DTLB, UL2]
Processor Architecture
[Figure. Labels: Memory Bus, Disambiguation Bus, Point to Point Link]
Processor Architecture
Bicluster: Each cluster has half the resources of the original monolithic backend
[Figure: bicluster floorplan with Cluster 0 and Cluster 1. Blocks: DTLB, ROB, IFU, DL1, MS/MOB, IRF, FPFU, ITLB, FPRF, FPS, CS, IS, RAT, DECO, TC, BP, UL2]
Processor Architecture
Quadcluster: Each cluster has a quarter of the resources of the original monolithic backend
[Figure: quadcluster floorplan with Cluster 0, Cluster 1, Cluster 2 and Cluster 3. Blocks: ROB, FPS, CS, IS, ITLB, FPRF, IRF, RAT, DECO, TC, BP, MS/MOB, FPFU, IFU, DL1, DTLB, UL2]
Simulation Infrastructure
• Computes dynamically the temperature of selected functional blocks (emulates thermal sensors)
• Integrated in a microarchitectural simulator
[Figure: coupled models. Labels: Performance model, Dynamic power model, Leakage model, Temperature model]
Simulation Infrastructure
Electrical                 Thermal
Voltage (V)                Temperature (K)
Current (A)                Power (W)
Resistance (V/A = Ω)       Resistance (K/W)
Capacitance (C/V = F)      Capacity (J/K)
Time constant τ = R·C (s) in both domains
[Figure: thermal stack of Die, Heat spreader, Heat sink and Ambient, modeled with R-C pairs. Labels: T1, T2, R1-2, C1-2, P2]
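
The analogy above can be illustrated with a single R-C pair. The following is a minimal sketch, not the simulator used in this work: it integrates C · dT/dt = P − (T − T_ambient) / R with forward Euler. The function name `simulate_rc` and every numeric value (R, C, P, ambient temperature, time step) are assumptions chosen only for illustration.

```python
import math

def simulate_rc(power_w, r_k_per_w, c_j_per_k, t_ambient_k, dt_s, steps):
    """Forward-Euler integration of C * dT/dt = P - (T - T_ambient) / R.

    Returns the temperature trace in kelvin, starting at ambient.
    """
    temps = [t_ambient_k]
    t = t_ambient_k
    for _ in range(steps):
        # Heat leaving the block through the thermal resistance (W).
        heat_out_w = (t - t_ambient_k) / r_k_per_w
        # Net power charges the thermal capacity.
        t += dt_s * (power_w - heat_out_w) / c_j_per_k
        temps.append(t)
    return temps

# Hypothetical values, for illustration only.
R = 0.5        # thermal resistance, K/W
C = 2.0        # thermal capacity, J/K
P = 40.0       # dissipated power, W
T_AMB = 318.0  # ambient temperature, K

tau = R * C                  # time constant in seconds
steady_state = T_AMB + P * R # temperature reached as t goes to infinity
trace = simulate_rc(P, R, C, T_AMB, dt_s=0.001, steps=5000)
```

With these numbers the time constant is 1 s, so after one τ the block has covered about 63% of the distance to its steady-state temperature, and after five τ it is within about 1% of it.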
Thermal Analysis of Clustered Architectures
• Temperature metrics
  - AbsMax
    • Maximum sensed temperature
  - Average
    • Average temperature of the chip area over time
  - AverageMax
    • Average temperature over time of the maximum sensed temperature
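
One possible reading of the three metrics, as a sketch: given a trace of per-block sensor readings, AbsMax is the largest reading seen, Average is the area-weighted chip temperature averaged over time, and AverageMax is the time average of the hottest reading at each sample. The function name `thermal_metrics`, the area weighting, and the sample data are assumptions made for illustration.

```python
def thermal_metrics(samples, areas):
    """Compute AbsMax, Average and AverageMax from a temperature trace.

    samples: list of time steps, each a list of block temperatures.
    areas:   block areas in the same order, used to weight the chip average.
    """
    total_area = sum(areas)
    # Hottest block at each time step.
    step_max = [max(step) for step in samples]
    # Area-weighted chip temperature at each time step.
    step_avg = [
        sum(t * a for t, a in zip(step, areas)) / total_area
        for step in samples
    ]
    return {
        "AbsMax": max(step_max),
        "Average": sum(step_avg) / len(samples),
        "AverageMax": sum(step_max) / len(samples),
    }

# Hypothetical trace: two blocks, three time steps (kelvin).
example_samples = [[350.0, 340.0], [360.0, 330.0], [355.0, 345.0]]
example_areas = [1.0, 3.0]
metrics = thermal_metrics(example_samples, example_areas)
```

AbsMax captures the single worst hotspot, while AverageMax indicates how hot the hottest spot is on average, which is why the two can move differently when activity is spread across clusters.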
Thermal Analysis of Clustered Architectures
[Figure: average temperature reduction for 16 SPEC benchmarks, 2 Clusters vs. 4 Clusters. Groups: Backends, UL2, Frontend, Processor. Metrics: AbsMax, Average, AverageMax, IPC degradation. Vertical axis: Reduction, from -40% to 40%]
Cluster Hopping
• Based on activity migration [Heo, ISLPED 03]
  - Vdd gate a subset of clusters
  - Rotate clusters to spread activity along time
  - Gated clusters cannot provide any register value
    • Before gating cluster must be emptied
  - Cache/DTLB contents are lost
  - Proactive and/or reactive behavior
    • Proactive: Per interval basis
    • Reactive: On thermal events
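
The rotation and the two triggers can be sketched as follows. This is an illustrative sketch under stated assumptions, not the policy evaluated in this work: the helper names `should_hop` and `rotate`, the round-robin order, the threshold and interval values, and the choice of three powered clusters out of four are all hypothetical. Draining a cluster before gating it and the loss of cache/DTLB contents are not modeled.

```python
def should_hop(active, temps_k, threshold_k, cycles_since_hop, interval_cycles):
    """Decide whether to hop now.

    Proactive: the hopping interval has elapsed.
    Reactive:  an active cluster reached the thermal threshold.
    """
    proactive = cycles_since_hop >= interval_cycles
    reactive = any(temps_k[c] >= threshold_k for c in active)
    return proactive or reactive

def rotate(active, num_clusters):
    """Shift every active cluster index by one (round-robin).

    The gated slot therefore moves around the chip over time.
    """
    return sorted((c + 1) % num_clusters for c in active)

# Hypothetical run: 4 clusters, 3 powered at a time.
NUM_CLUSTERS = 4
active = [0, 1, 2]
history = [active]
for _ in range(4):
    active = rotate(active, NUM_CLUSTERS)
    history.append(active)

# Clusters left gated at each step of the rotation.
gated = [sorted(set(range(NUM_CLUSTERS)) - set(a)) for a in history]
```

In this sketch every cluster is gated once per four hops, so each gets an equal share of cool-down time. A real policy would also have to account for the cost of emptying the cluster and refilling its cache and DTLB after each hop.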
Cluster Hopping
[Figure: HOP-3 and HOP-2 hopping schemes]
Cluster Hopping
[Figure: temperature reduction for Hop-3 and Hop-2. Groups: Backends, UL2, Frontend, Processor. Metrics: AbsMax, Average, AverageMax, IPC degradation (slowdown). Vertical axis: Reduction, from 0% to 50%]
Conclusions
• The analyzed bi-cluster architecture increases temperature: Clustering must be applied smartly
• The analyzed quad-cluster architecture is effective at reducing temperature:
  - Reduces processor peak temperature by 33%
  - Reduces average temperature by 12%
  - IPC penalty of 14%
  - Other benefits of clustering were ignored for this study
• Improving the quad-cluster architecture with a hopping scheme (HOP-3):
  - Peak temperature is reduced by 37%
  - Average temperature of the processor is reduced by 14%
  - Extra penalty of 3%