
Analysis of Applications on a High Performance–Low Energy Computer



  1. 7th Workshop on UnConventional High Performance Computing, August 26, 2014, Porto. Analysis of Applications on a High Performance–Low Energy Computer. Florina M. Ciorba, Thomas Ilsche, Elke Franz, Stefan Pfennig, Christian Scheunert, Ulf Markwardt, Joseph Schuchart, Daniel Hackenberg, Robert Schöne, Andreas Knüpfer, Wolfgang E. Nagel, Eduard A. Jorswieck, and Matthias S. Müller. Collaborative Research Center 912: HAEC − Highly Adaptive Energy-Efficient Computing

  2. Talk Outline
     - Motivation
     - Modeling Applications
     - Modeling a High Performance–Low Energy Computer
     - Mapping Applications to Systems
     - Modeling Communication
     - Simulation Results
     - Summary and Future Work

  3. The Challenge: Given a parallel application and a high performance–low energy computer, how can the computer execute the application as fast as possible while consuming the least amount of energy?

  4. Our Approach: Simulation and analysis workflow
     [Figure: workflow diagram. An instrumented parallel application (source code) is executed on test systems (existing production systems) to record an application trace, configured by tracing granularity, performance counters, etc. The recorded trace is the input to the haec_sim simulation, together with software abstraction models (mapping, runtime environment, energy-aware software), architecture abstraction models (topology, performance/energy of computation and communication), HAEC Box parameters (latency, bandwidth, errors), energy measurement settings (desired accuracy, sampling rate, measurement scope, etc.), an energy/utility function, and the simulation goals. The simulation output is a simulated application trace (HAEC Box), which goes through trace visualization and analysis; the evaluation of the simulation feeds back into the simulation input. Legend: application, process, trace, configuration, models, display.]

  5. Our Simulation Framework & State of the Art
     Our framework:
     - Trace-driven simulation (TDS)
     - TDS + execution-based simulation (EBS, replay)
       - Offers increased accuracy
       - Increases modeling complexity for the hybrid interconnection networks
     - Parallel TDS
     - Hybrid (& dynamic) communication network
     - Trace format contains energy measurements (performance metrics)
     - Application AND system performance AND energy consumption modeling
     State of the art:
     - TDS or use traces in some fashion
     - No execution-based simulation (replay) (xSim, BigSim, MPI-NetSim, OMNEST, PSINS, SILAS, MPI-SIM)
       - Offer scalability
       - Avoid the need to model complex interconnection networks
     - Sequential TDS (DIMEMAS, HeSSE, LogGOPSim, TaskSim, Tsim)
     - Parallel TDS (xSim, BigSim, OMNEST, PSINS, SILAS, SIMCaN)
     - Non-hybrid communication network (xSim, BigSim, DIMEMAS, LogGOPSim, SILAS, TaskSim, Tsim)
     - Hybrid communication network (HeSSE, OMNEST)
     - No focus on energy measurements
     - Focus on I/O architectures: SIMCaN
     - Application OR system performance modeling OR network modeling

  6. Modeling Applications
     - Performance, scalability, and energy
     - NPB lu.C.81 on 6 Taurus nodes and node-level energy counters (1 Sa/s), runtime ~30 s

  7. Modeling Applications
     - lu.C.81 on Taurus
       - Accumulated exclusive time: 69.9% communication, 30.1% computation
       - Very high number of point-to-point (unicast) messages (11,639,408)
     [Figures: process graph; communication matrix]

  8. Modeling a High Performance–Low Energy Computer: the HAEC Box
     Optical Interconnections
     - Adaptive analog/digital circuits for E/O transceiver
     - Embedded polymer waveguides
     - Packaging technologies (e.g., 3D stacking of Si/III-V hybrids)
     - Optical switch (MOEMS) for reconfigurable networks
     - 250 Gbit/s via 10 optical channels (X/Y direction)
     - 1 µs latency
     - 2D mesh topology
     Wireless Interconnections
     - On-chip/on-package antenna fields
     - 8x8 or 16x16 Butler matrices
     - Analog/digital beam steering and interference suppression
     - 200 GHz channel / bandwidth / operating range
     - 100 Gbit/s at 200 GHz (Z direction)
     - 10 µs latency
     - 1D mesh topology (at the moment)
     [Figure: HAEC Box. Circles: compute nodes; blue lines: optical links; green lines: wireless links]
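To get a feel for the link parameters above, here is a rough Python sketch that estimates the time to push one message over a single optical or wireless link. It assumes a plain "latency + size / bandwidth" model, which is a simplification and not the delay model used in the talk; it also treats the quoted 250 Gbit/s as available to a single message. The names `LINKS` and `link_time` are made up for this illustration.

```python
# Back-of-the-envelope link model (an assumption, not the HAEC delay model):
# time = latency + message size / bandwidth.
LINKS = {
    # bandwidth and latency figures as quoted on the slide
    "optical": {"bandwidth_bit_s": 250e9, "latency_s": 1e-6},
    "wireless": {"bandwidth_bit_s": 100e9, "latency_s": 10e-6},
}


def link_time(kind: str, size_bytes: int) -> float:
    """Estimated seconds to send size_bytes over one link of the given kind."""
    link = LINKS[kind]
    return link["latency_s"] + 8 * size_bytes / link["bandwidth_bit_s"]


if __name__ == "__main__":
    for kind in LINKS:
        print(f"{kind}: {link_time(kind, 1_000_000) * 1e6:.1f} microseconds for 1 MB")
```

Under these assumptions, small messages are dominated by the latency term, so the tenfold latency gap between the two link types matters more than the bandwidth gap.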

  9. Mapping Applications onto HAEC Box
     Static mapping of lu.C.81 onto the 3 × 3 × 3 HAEC Box
     [Figure panels: xyz, block xyz, random]

     | Mapping   | IePLC      | IaNLC     | IeNLC      | AVG IeNLC | MIN IeNLC | MAX IeNLC |
     |-----------|------------|-----------|------------|-----------|-----------|-----------|
     | xyz       | 11,639,408 | 0         | 11,639,408 | 228,223   | 161,658   | 242,490   |
     | block xyz | 11,639,408 | 4,364,778 | 7,274,630  | 173,205   | 80,829    | 242,488   |
     | random    | 11,639,408 | 646,633   | 10,992,775 | 99,934    | 80,829    | 242,488   |

     IePLC: inter-process logical communication
     IaNLC: intra-node logical (local) communication
     IeNLC: inter-node logical communication
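The slide does not define the mappings, so the following Python sketch is only one possible reading. It assumes that the 81 processes form a 9 × 9 grid exchanging the same number of messages with each nearest neighbour, that "xyz" places ranks round-robin across the 27 nodes, and that "block xyz" places three consecutive ranks on each node. All function names are hypothetical. Under these assumptions the intra-node share comes out as 0 for the round-robin placement and 0.375 for the block placement, which agrees with the IaNLC/IePLC ratios in the table (0 and 4,364,778 / 11,639,408 = 0.375). The random mapping is not modelled.

```python
from itertools import product

GRID = 9                 # ASSUMPTION: 9 x 9 logical process grid (81 ranks)
NODES = 27               # 3 x 3 x 3 HAEC Box
PER_NODE = 81 // NODES   # 3 ranks per node


def neighbour_pairs():
    """Yield each unordered nearest-neighbour rank pair of the assumed grid."""
    for i, j in product(range(GRID), repeat=2):
        rank = GRID * i + j
        if j + 1 < GRID:
            yield rank, rank + 1       # neighbour in the same row
        if i + 1 < GRID:
            yield rank, rank + GRID    # neighbour in the same column


def cyclic(rank: int) -> int:
    """ASSUMED reading of 'xyz': ranks dealt round-robin over the nodes."""
    return rank % NODES


def block(rank: int) -> int:
    """ASSUMED reading of 'block xyz': consecutive ranks share a node."""
    return rank // PER_NODE


def intra_node_fraction(mapping) -> float:
    """Share of neighbour pairs whose two ranks land on the same node."""
    pairs = list(neighbour_pairs())
    local = sum(1 for a, b in pairs if mapping(a) == mapping(b))
    return local / len(pairs)


if __name__ == "__main__":
    print("cyclic:", intra_node_fraction(cyclic))
    print("block: ", intra_node_fraction(block))
```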

  10. Modeling Communication for Parallel Applications running on the HAEC Box
      - Message passing
        - Point-to-point communication
      - Links
        - Homogeneous
      - Topology
        - 3D mesh
      - Path selection
        - Single path
        - XYZ
      - Routing
        - Dimension order routing
      - Network coding
        - Practical network coding
      - Assumptions
        - Error-free transmission
        - With acknowledgements
      [Figure: layered model. The application communication model (e.g., MPI) covers blocking and non-blocking communication in the form of point-to-point, collective, and remote memory access operations. Below it, the HAEC communication model covers performance and energy through network coding, path selection, links, and topology, on top of optical communication.]
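As an illustration of single-path XYZ (dimension order) routing on a 3D mesh, the following minimal Python sketch walks from a source node to a destination node by correcting the x coordinate first, then y, then z. The coordinate tuples and the function name `xyz_route` are assumptions made for this example; the slides do not show the authors' implementation.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]  # (x, y, z) coordinates in the mesh


def xyz_route(src: Node, dst: Node) -> List[Node]:
    """Return the nodes visited from src to dst, inclusive of both ends.

    Dimension order routing on a mesh without wrap-around links:
    fix the x offset first, then y, then z, one hop at a time.
    """
    path = [src]
    cur = list(src)
    for dim in range(3):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path


if __name__ == "__main__":
    route = xyz_route((2, 0, 0), (0, 1, 0))
    print(route, "hops:", len(route) - 1)
```

In a 3 × 3 × 3 mesh the longest such route, from (0, 0, 0) to (2, 2, 2), takes 6 hops, and a route from a node to itself takes 0 hops.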

  11. Multicast: Routing vs Network Coding
      Multicast: S wants to transmit both messages m1 and m2 to E and F. Topology: butterfly.
      [Figure: three butterfly networks with nodes S, A, B, C, D, E, F. S sends m1 via A and m2 via B; A and B forward to C and directly to E and F, respectively. With routing, the link C-D carries m1 and m2 one after the other; with network coding, it carries the single combined message m12.]
      - Routing (RT): two timeslots for transmitting m1 and m2 over C-D to both E and F
      - Network coding (NC): one timeslot for transmitting m1 and m2 over C-D to E and F
      → Reduces delay and energy costs, increases throughput
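The butterfly example can be sketched in a few lines of Python. The sketch assumes that the combined message m12 is the bytewise XOR of m1 and m2, as in the textbook version of this example; the slide itself does not state the coding operation. Function names are invented for the illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long messages."""
    return bytes(x ^ y for x, y in zip(a, b))


def butterfly_network_coding(m1: bytes, m2: bytes):
    """Deliver m1 and m2 to both E and F with one transmission over C-D."""
    if len(m1) != len(m2):
        raise ValueError("messages must have equal length")
    m12 = xor_bytes(m1, m2)                  # ASSUMPTION: C combines by XOR
    # E receives m1 directly via A and m12 via D, then recovers m2.
    at_e = (m1, xor_bytes(m1, m12))
    # F receives m2 directly via B and m12 via D, then recovers m1.
    at_f = (xor_bytes(m2, m12), m2)
    return at_e, at_f, 1                     # 1 timeslot used on C-D


def butterfly_routing(m1: bytes, m2: bytes):
    """Without coding, C-D forwards m1 and m2 in separate timeslots."""
    return (m1, m2), (m1, m2), 2             # 2 timeslots used on C-D


if __name__ == "__main__":
    print(butterfly_network_coding(b"ab", b"cd"))
    print(butterfly_routing(b"ab", b"cd"))
```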

  12. Unicast: Routing vs Network Coding
      Unicast: S wants to transmit a message (as data packets) to C. Topology: linear array (S, A, B, C). Unreliable links: failures or attacks.
      [Figure: two linear arrays S-A-B-C. With routing, the plain packets p1, p2, p3, ... are forwarded hop by hop. With network coding, linear combinations of the packets are sent instead, for example p1 + p2, p1 + 4 p2, p3 + 2 p4.]
      - Routing (RT): a data packet lost over A-B has to be resent
      - Network coding (NC): further linearly independent combinations are sufficient
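The idea that any sufficiently large set of linearly independent combinations is enough can be illustrated with two packets. The Python sketch below is not the authors' coding scheme: it assumes arithmetic in a small prime field (P = 257, chosen only for the example) and a generation of exactly two packets; `encode` and `decode` are hypothetical helpers. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # ASSUMPTION: small prime field, for illustration only


def encode(p1, p2, c1, c2):
    """Return the coded packet c1*p1 + c2*p2 (symbol-wise, modulo P)."""
    return [(c1 * a + c2 * b) % P for a, b in zip(p1, p2)]


def decode(coded_a, coeff_a, coded_b, coeff_b):
    """Recover (p1, p2) from two coded packets and their coefficient pairs."""
    (a1, a2), (b1, b2) = coeff_a, coeff_b
    det = (a1 * b2 - a2 * b1) % P
    if det == 0:
        raise ValueError("combinations are linearly dependent")
    inv = pow(det, -1, P)  # modular inverse of the determinant
    # Cramer's rule, applied symbol by symbol.
    p1 = [((b2 * u - a2 * v) * inv) % P for u, v in zip(coded_a, coded_b)]
    p2 = [((a1 * v - b1 * u) * inv) % P for u, v in zip(coded_a, coded_b)]
    return p1, p2


if __name__ == "__main__":
    p1, p2 = [10, 20, 30], [7, 8, 9]
    coeffs = [(1, 1), (1, 4), (2, 3)]
    coded = [encode(p1, p2, *c) for c in coeffs]
    # Pretend the first coded packet was lost; the other two still suffice.
    print(decode(coded[1], coeffs[1], coded[2], coeffs[2]))
```

The receiver does not need to know which coded packet was lost; it only needs two combinations whose coefficient vectors are linearly independent.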

  13. Modeling Communication Delays
      [Figure: a data packet travels from node j to node j+1. On each node it passes through the application processor, application memory, network memory, and network processor, with network encoding before and network decoding after the channel transmission. The stages are annotated with the delays listed below; the per-hop delays $d_{h,p}$ and $d_{h,a}$ are related to $d_{out}$, $d_{in}$, the latency $l$, the sizes $s_p$ and $s_a$, and the bandwidth $b$.]
      - $d_s$: process a data packet of size $s_p$ by the sender
      - $d_r$: process a data packet of size $s_p$ by the receiver
      - $d_i$: process a data packet by an intermediate node
      - $d_a$: process an acknowledgment of size $s_a$
      - $d_{h,p}$: send a data packet over one hop
      - $d_{h,a}$: send an acknowledgment over one hop
      - $d_{out}$: write out to channel
      - $d_{in}$: read in from channel
      - $d_{mpi}$: write out to/read in from network buffer
      - $l$: latency for channel coding

  14. Modeling Transfer Times
      - Transfer time $tt(x)$ for sending $x > 0$ packets over $h \in [0,6]$ hops without errors or acknowledgments.
        Assumption: $d_{h,p} \ge d_s$ and $d_{h,p} \ge d_r$

        $$tt(x) = 2 \cdot d_{mpi} + d_s + (h + x - 1) \cdot d_{h,p} + (h - 1) \cdot d_i + d_r \quad \forall\, h > 0 \qquad (1)$$

        $$tt(x) = 2 \cdot d_{mpi} \quad \text{if } h = 0 \text{ (intra-node communication)}$$

      - Complete transfer time $T(n_p)$ for sending $n_p$ packets over $h \in [0,6]$ hops without errors, with acknowledgments (only the final ACK/generation needs to be considered)

        $$T(n_p) = tt(s_w) \cdot n_w + tt(n_r) + h \cdot \left(n_w + \lceil n_r / s_w \rceil\right) \cdot (d_{h,a} + d_a) \qquad (2)$$
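Equations (1) and (2) translate almost directly into code. The Python sketch below makes two assumptions that the slide does not state: $n_w$ is taken to be the number of full windows of $s_w$ packets and $n_r$ the number of remaining packets ($n_p = n_w \cdot s_w + n_r$), and the $tt(n_r)$ term is skipped when $n_r = 0$ because (1) is only defined for $x > 0$. The delay values in the usage example are invented placeholders, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class Delays:
    """Per-step delays from the communication delay model (arbitrary units)."""
    d_mpi: float  # write out to / read in from network buffer
    d_s: float    # sender processes a data packet
    d_r: float    # receiver processes a data packet
    d_i: float    # intermediate node processes a data packet
    d_hp: float   # send a data packet over one hop
    d_ha: float   # send an acknowledgment over one hop
    d_a: float    # process an acknowledgment


def tt(x: int, h: int, d: Delays) -> float:
    """Equation (1): time to send x > 0 packets over h hops, no ACKs."""
    if x <= 0:
        raise ValueError("tt is defined for x > 0 only")
    if h == 0:  # intra-node communication
        return 2 * d.d_mpi
    return 2 * d.d_mpi + d.d_s + (h + x - 1) * d.d_hp + (h - 1) * d.d_i + d.d_r


def complete_transfer_time(n_p: int, s_w: int, h: int, d: Delays) -> float:
    """Equation (2): time to send n_p packets over h hops with ACKs."""
    n_w, n_r = divmod(n_p, s_w)  # ASSUMPTION: full windows and remainder
    total = tt(s_w, h, d) * n_w
    if n_r:                      # ASSUMPTION: no remainder term if n_r == 0
        total += tt(n_r, h, d)
    total += h * (n_w + math.ceil(n_r / s_w)) * (d.d_ha + d.d_a)
    return total


if __name__ == "__main__":
    # Placeholder values chosen so that d_hp >= d_s and d_hp >= d_r hold.
    d = Delays(d_mpi=1.0, d_s=0.5, d_r=0.5, d_i=0.25, d_hp=2.0, d_ha=0.5, d_a=0.25)
    print(tt(4, 3, d), complete_transfer_time(10, 4, 3, d))
```

The factor $(h + x - 1)$ in (1) reflects pipelining: the first packet needs $h$ hop times to arrive, and each further packet adds one more hop time.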

  15. Modeling Communication for Parallel Applications running on the HAEC Box
      XYZ path selection for lu.C.81 communication over the physical links of the 3 × 3 × 3 HAEC Box
      [Figure panels: xyz mapping, block xyz mapping, random mapping]

      | Mapping   | IePLC      | IePPC      | IaNPC     | IeNPC      | AVG IeNPC | MIN IeNPC | MAX IeNPC |
      |-----------|------------|------------|-----------|------------|-----------|-----------|-----------|
      | xyz       | 11,639,408 | 16,004,186 | 0         | 16,004,186 | 333,420   | 121,242   | 484,976   |
      | block xyz | 11,639,408 | 14,549,260 | 4,364,778 | 10,184,482 | 212,176   | 80,829    | 484,976   |
      | random    | 11,639,408 | 31,280,908 | 646,633   | 30,364,275 | 567,301   | 161,657   | 1,050,780 |

      IePLC: inter-process logical communication
      IePPC: inter-process physical communication
      IaNPC: intra-node physical (local) communication
      IeNPC: inter-node physical communication
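The ratio of physical to logical communication can be explored with a small hop-counting model. The Python sketch below rests on assumptions that are not stated in the slides: a 9 × 9 process grid with equal nearest-neighbour traffic, "xyz" read as round-robin placement and "block xyz" as three consecutive ranks per node, node n located at (n mod 3, (n div 3) mod 3, n div 9), a mesh without wrap-around links, and intra-node messages counted as one physical transfer. All names are hypothetical. Under these assumptions the model yields 1.375 and 1.25 physical transfers per logical message, which agrees with IePPC/IePLC in the table for the xyz and block xyz rows. The random mapping is not modelled.

```python
from itertools import product

GRID = 9            # ASSUMPTION: 9 x 9 logical process grid (81 ranks)
SIDE = 3            # 3 x 3 x 3 HAEC Box
NODES = SIDE ** 3


def coords(node: int):
    """ASSUMED node numbering: x varies fastest, then y, then z."""
    return node % SIDE, (node // SIDE) % SIDE, node // (SIDE * SIDE)


def hops(a: int, b: int) -> int:
    """Hop count of an XYZ route in a mesh: the Manhattan distance."""
    return sum(abs(p - q) for p, q in zip(coords(a), coords(b)))


def neighbour_pairs():
    """Yield each unordered nearest-neighbour rank pair of the assumed grid."""
    for i, j in product(range(GRID), repeat=2):
        rank = GRID * i + j
        if j + 1 < GRID:
            yield rank, rank + 1
        if i + 1 < GRID:
            yield rank, rank + GRID


def cyclic(rank: int) -> int:
    """ASSUMED reading of 'xyz': ranks dealt round-robin over the nodes."""
    return rank % NODES


def block(rank: int) -> int:
    """ASSUMED reading of 'block xyz': three consecutive ranks per node."""
    return rank // (GRID * GRID // NODES)


def physical_per_logical(mapping) -> float:
    """Average physical transfers per logical message under the mapping."""
    pairs = list(neighbour_pairs())
    # An intra-node message (0 hops) still counts as one local transfer.
    total = sum(max(1, hops(mapping(a), mapping(b))) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print("cyclic:", physical_per_logical(cyclic))
    print("block: ", physical_per_logical(block))
```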
