System-Level Design Optimization for Integration with Silicon Photonics
Ayse K. Coskun, Boston University, ECE Department
In collaboration with: Ajay Joshi (1), Andrew B. Kahng (2,3), Jonathan Klamkin (4), Tiansheng Zhang (1), John Recchio (2), Vaishnav Srinivas (2), Anjun Gu (3), Yenai Ma (1)
Affiliations: (1) Boston University ECE Dept.; (2) UCSD ECE Dept.; (3) UCSD CSE Dept.; (4) UCSB ECE Dept.
This research has been partially funded by the NSF grants CNS-1149703 and CCF-1149549. Work at UCSD has been supported by NSF, Samsung and the IMPACT+ Center.
Towards Many-core Computing Systems
• Due to technology scaling & high computation needs, more resources are integrated on-chip
• Intel SCC (48 cores, 2010), Tilera Tile Gx (72 cores, 2012), Intel Xeon Phi (72 cores, 2015)
[Figure: microprocessor trend data. Source: Rupp, 40 years of microprocessor trend data, 2015]
3D Stacking Technology & Its Benefits
• 3D stacking: integration of a larger amount of resources with better yield, lower latency, more heterogeneity
[Figure: 2D vs. 3D curves of area/chip, yield, and on-chip comm. latency against # of cores]
• Heterogeneous technologies integrated on a single chip
• On-chip stacked DRAM
• Silicon-photonic Network-on-Chip (PNoC)
[Figure: 3D stack with photonic layer, memory layer, and processor layer. Source: http://researcher.watson.ibm.com/researcher/view_group.php?id=2757]
Challenges of 3D Stacking Technology
• On-chip Resource Management
o Underutilized resources: performance and energy efficiency benefits left on the table
o Increased power density: potential thermal violations
• On-chip Thermal Management
o Thermal and process sensitivity of devices in other technologies: resilience problems or high power consumption
[Figures: stack of Layer0, Layer1, Layer2, ..., LayerN with cores and caches; 3D stack with photonic layer, memory layer, and processor layer. Source: http://researcher.watson.ibm.com/researcher/view_group.php?id=2757]
Silicon-Photonics Network-on-Chip
• Silicon-Photonic Link
[Figure: laser, coupler, waveguide, driver, ring modulator, ring filter, photodetector, amplifier; wavelength λ1. Integration methods: monolithic, 2.5D, 3D]
• Silicon-Photonic Links vs. Electrical Links
o Higher bandwidth density
o Lower long-distance communication latency
o Lower data-dependent energy consumption
o More sensitive to thermal variations
o More sensitive to process variations
Silicon-Photonics Network-on-Chip
• Silicon-Photonic Link
[Figure: laser, coupler, waveguide, driver, ring modulator with micro-heater, ring filter, photodetector, amplifier; wavelength λ1. Integration methods: monolithic, 2.5D, 3D]
• Silicon-Photonic Links vs. Electrical Links
o Higher bandwidth density
o Lower long-distance communication latency
o Lower data-dependent energy consumption
o More sensitive to thermal variations: high thermal tuning power
o More sensitive to process variations
o High optical loss and low laser source efficiency (due to high temp.): high laser source power
• On-chip energy efficiency is a limiting factor for PNoC integration!
System-Level Simulation Framework
Design Space Exploration
[Figure: dependency diagram linking the parameters below]
• Ring design: dimensions, refraction (n), thermal sensitivity, Free Spectral Range (FSR)
• Apps: # of cores & NoC topology, BW requirement
• Optical NoC area limit (5%~10%)
• Related parameters: ring mod. BW per wavelength, spacing between wavelengths, # of ring filters, # of wavelengths, # of waveguides, # of wavelengths per waveguide, tolerable ring temperature gradient
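One way to read the links between FSR, wavelength spacing, and waveguide count on this slide is as simple sizing arithmetic. The sketch below is an assumption about how those quantities combine, not the authors' exploration tool; the function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def wavelengths_per_waveguide(fsr_nm, spacing_nm):
    """Upper bound on wavelengths in one waveguide: channels that fit in one FSR.

    Assumption: channels are evenly spaced and must all fall within a single
    free spectral range of the ring.
    """
    if spacing_nm <= 0:
        raise ValueError("spacing_nm must be positive")
    return math.floor(fsr_nm / spacing_nm)

def waveguides_needed(total_bw_gbps, bw_per_wavelength_gbps, n_lambda_per_wg):
    """Waveguides required to carry the application's bandwidth requirement."""
    per_waveguide_bw = bw_per_wavelength_gbps * n_lambda_per_wg
    return math.ceil(total_bw_gbps / per_waveguide_bw)

# Hypothetical inputs (not taken from the slides).
n_lambda = wavelengths_per_waveguide(fsr_nm=16.0, spacing_nm=1.0)
n_wg = waveguides_needed(total_bw_gbps=2560, bw_per_wavelength_gbps=10,
                         n_lambda_per_wg=n_lambda)
print(n_lambda, n_wg)
```

A tighter wavelength spacing raises the per-waveguide count but, per the slide, also lowers the tolerable ring temperature gradient, which this sketch does not model.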
Target Many-core System w/ PNoC [DATE’14, TCAD’16]
• 256-core system with Clos network
• Core architecture: IA-32 core in Intel SCC [Howard, ISSCC’11], 16KB I/D L1 cache & 256KB L2 cache
[Figure: Clos network with input, middle, and output stages; routers; 8 core tiles; 2 memory controllers (MCs); 16 waveguides (wgs) with 16 rings/wg. Processor tile with 4 cores, each with core + L1 (C+L1) and L2]
[DATE’16] Floorplan Optimization Flow
• Input: design options & constraints (# of cores, aspect ratios, etc.)
• MILP-based optimizer with compact thermal model
• Output: floorplan with minimized PNoC power & area cost
• Optimization goal:
– PNoC power:
  o P&R’s impact on waveguide length, crossing and bending
  o Laser source efficiency
  o PNoC placement’s impact on thermal tuning power
– PNoC area:
  o Area cost of router groups and waveguides
[DATE’16] Floorplan Optimization Flow
• Compact thermal model
[DATE’16] Floorplan Optimization Flow
• Compact thermal model
[Figure: power profile (size 1 × N) → compact thermal model with accumulated thermal weight profiles (size N × M) → resonant frequency difference among router groups (size 1 × M) → thermal tuning power]
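The matrix sizes on this slide (1 × N, N × M, 1 × M) suggest a vector-matrix product. The following is a minimal sketch under that assumption: a power profile over N blocks is multiplied by an N × M weight matrix to give one value per router group, and the spread across groups stands in for the resonant frequency difference. The function names, units, and numbers are hypothetical, and the actual model in the DATE’16 work may differ.

```python
def router_group_shifts(power_profile, thermal_weights):
    """Multiply a 1 x N power profile by an N x M weight matrix -> 1 x M shifts."""
    n = len(power_profile)
    if len(thermal_weights) != n:
        raise ValueError("thermal_weights must have one row per power entry")
    m = len(thermal_weights[0])
    return [
        sum(power_profile[i] * thermal_weights[i][j] for i in range(n))
        for j in range(m)
    ]

def max_shift_difference(shifts):
    """Largest pairwise difference among router groups (spread to be tuned away)."""
    return max(shifts) - min(shifts)

# Hypothetical example: N = 2 power blocks, M = 2 router groups.
power = [2.0, 4.0]              # per-block power (arbitrary units)
weights = [[0.5, 0.25],         # row i: effect of block i on each router group
           [0.25, 0.5]]
shifts = router_group_shifts(power, weights)
print(shifts, max_shift_difference(shifts))   # [2.0, 2.5] 0.5
```

Because the model is linear, an optimizer can evaluate a candidate placement with one product instead of a full thermal simulation, which is presumably why a compact model sits inside the MILP flow.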
[DATE’16] Cross-layer PNoC P&R Optimization
[Figure: power profiles; thermal conditions of potential ring group locations; PNoC layouts w/ minimum PNoC power]
[DATE’14] RingAware Workload Allocation Policy
• Goals:
– Minimize the difference among ring temperatures
– Reduce the overall chip temperature
• Active cores’ impact on ring temperature
– Classify the cores based on their distances to a ring group (RD0, RD1, and RD2 cores)
[Figure: rings, threads, and RD0/RD1/RD2 cores under different thread placements; ring temp. gradient labels: 7.5°C, <1°C, <1°C]
[DATE’14] RingAware Workload Allocation Policy
• Ring temperature gradient minimization
– RingAware: take ring locations into consideration
– Categorize cores based on their relative positions to the rings (center core, RD0 cores, RD0 region)
– Is the # of threads <= the # of non-RD0 and non-center cores?
  o Yes: avoid RD0 and center cores
  o No: keep the same # of threads in each RD0 region
• Multi-program support
– Sort the threads based on their power dissipation & allocate high-power applications first
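The decision on this slide can be sketched as a small allocation routine. This is an illustration under stated assumptions, not the DATE’14 implementation: it reads the flowchart as "avoid RD0 and center cores when the threads fit elsewhere, otherwise spread the overflow evenly over RD0 regions". The core records, the round-robin spreading, and the use of center cores as a last resort are all hypothetical choices.

```python
def ring_aware_allocate(threads, cores):
    """Map threads to cores, keeping load away from rings where possible.

    threads: list of (name, power) tuples.
    cores:   list of dicts with keys "id", "cls" ("RD0", "center", or other),
             and "region" (RD0 region index; ignored for non-RD0 cores).
    Returns {thread_name: core_id}.
    """
    if len(threads) > len(cores):
        raise ValueError("more threads than cores")
    # Multi-program support: place high-power threads first.
    ordered = sorted(threads, key=lambda t: t[1], reverse=True)
    safe = [c["id"] for c in cores if c["cls"] not in ("RD0", "center")]
    assignment = {name: core for (name, _), core in zip(ordered, safe)}
    overflow = ordered[len(safe):]
    if not overflow:
        return assignment  # all threads avoided RD0 and center cores
    # Overflow: round-robin over RD0 regions so per-region counts stay equal
    # to within one thread (assumed interpretation of "same # of threads").
    regions = {}
    for c in cores:
        if c["cls"] == "RD0":
            regions.setdefault(c["region"], []).append(c["id"])
    queues = [regions[r] for r in sorted(regions)]
    center = [c["id"] for c in cores if c["cls"] == "center"]
    turn = 0
    for name, _power in overflow:
        for _ in range(len(queues)):
            queue = queues[turn % len(queues)]
            turn += 1
            if queue:
                assignment[name] = queue.pop(0)
                break
        else:
            # Assumption: center cores are used only when RD0 cores run out.
            assignment[name] = center.pop(0)
    return assignment
```

Sorting by power first means the hottest threads land on the cores farthest from the rings, which is consistent with the slide's goal of minimizing the ring temperature gradient.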
FreqAlign Workload Allocation Policy [TCAD’16]
• Process variation introduces resonant frequency shift after the system is manufactured
• Only balancing the temperature of ring groups is not enough to compensate for the frequency mismatch
• On-chip laser sources’ optical frequencies also need to match the corresponding rings’ resonant frequencies
• FreqAlign + Adaptive Frequency Tuning
[Figure: orderings of wavelengths ① to ④ for the laser source and for Ring Groups 1, 2, and 3]
FreqAlign Workload Allocation Policy [TCAD’16]
• Target many-core system: [Figure]
• FreqAlign:
o Keep track of the optical frequency shifts of ring groups (in RG weight array)
o Record every core’s thermal impact on every ring group
o Choose the core to minimize the frequency difference among all ring groups
• Workflow: initial RG weight array → find a core that minimizes the frequency diff. and assign the thread → more threads? (Yes: update the RG weight array and repeat; No: end)
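The workflow on this slide reads as a greedy loop, sketched below under several assumptions: the RG weight array is a list of per-ring-group frequency shifts, each core's thermal impact is a row of per-group increments, and the impact scales linearly with thread power. None of these details are confirmed by the slides, and the function and parameter names are hypothetical.

```python
def freq_align(thread_powers, core_impact, initial_rg_weights):
    """Greedy placement that keeps ring-group frequency shifts aligned.

    thread_powers:      power of each thread, in allocation order.
    core_impact:        core_impact[c][g] = shift added to ring group g per
                        unit of power running on core c (assumed linear).
    initial_rg_weights: starting shift of each ring group, e.g. from process
                        variation.
    Returns (placement, final_rg_weights); placement[i] is thread i's core.
    """
    rg = list(initial_rg_weights)
    free = set(range(len(core_impact)))
    placement = []
    for power in thread_powers:
        if not free:
            raise ValueError("more threads than cores")
        best = None  # (spread, core, candidate_rg)
        for core in sorted(free):
            cand = [w + power * core_impact[core][g] for g, w in enumerate(rg)]
            spread = max(cand) - min(cand)
            if best is None or spread < best[0]:
                best = (spread, core, cand)
        _, chosen, rg = best          # update the RG weight array
        free.remove(chosen)
        placement.append(chosen)
    return placement, rg

# Hypothetical example: ring group 1 starts 1.0 ahead of ring group 0.
impact = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(freq_align([1.0, 1.0], impact, [0.0, 1.0]))   # ([0, 2], [1.5, 1.5])
```

In the example the first thread goes to core 0 because heating ring group 0 cancels the initial mismatch, and the second goes to core 2 because it heats both groups equally.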
Experimental Methodology
• Simulation Framework: [Figure]
• Workload Sets: selected benchmarks from SPLASH2, PARSEC and UHPC

Workload Set    Job 1      Job 2
HP + HP         md         shock
HP + MP         md         blackscholes
HP + LP         shock      lu_cont
MP + MP         barnes     blackscholes
MP + LP         barnes     water_nsq
LP + LP         lu_cont    canneal

• How about emerging applications?
Experimental Results for Many-core System w/o Process Variations
[Figures: Resonance Frequency Difference; PNoC Thermal Tuning Power]
• Compared to RingAware, FreqAlign reduces the resonant frequency difference by 60.6% on average
• Compared to RingAware + TFT, FreqAlign + AFT reduces the tuning power by 14.93 W on average
Summary & Questions
o Cross-layer, thermally-aware optimizer for floorplanning of PNoCs
o Runtime workload allocation for thermal tuning power reduction
o Cross-layer simulation flow: an enabler to optimization of systems with heterogeneous technologies