NsimPower: Interconnect Simulator for Power and Performance Prediction
Koji Inoue, Kyushu University, Japan

Power/Performance Issues on Interconnection Network (IN)
• Is interconnect power a problem?
  – Roughly 10 to 30% of total power
  – Increase in the number of computing nodes
  – High-bandwidth/low-latency requirements for strong scaling
• Toward power/energy-efficient supercomputing
  – Need to consider computing nodes, memory, and the interconnection network at the same time!
  – Bandwidth, latency, and energy-efficiency optimization from the viewpoint of interconnects

Why Do We Need Interconnection Network Simulators?
• For system designers
  – Design space exploration for high-performance, power-efficient large-scale supercomputers
  – Detailed analysis of hardware (e.g. buffer size) and software (e.g. all-to-all algorithm) design parameters
• For application users
  – Understand the execution behavior of their own programs
  – Can be exploited for program optimizations

WHAT IS NSIM?

NSIM: Execution Driven Interconnection Network Simulator
[Figure: NSIM block diagram. Recoverable labels: MGEN program, rank map, interconnect configuration, request packets; simulation setup, monitoring setup, router spec., topology (torus, mesh, fat-tree), MPI overhead setup; MPI-like source code to describe communication patterns.]

NSIM Execution Image
[Figure: target processes and nodes of the target system, connected by the interconnect, are mapped onto NSIM processes 0 to 7, which run on cores 0 to 7 of the host system under an MPI environment. Each host process simulates several target processes, nodes, and routers.]

Comparison with Other Simulators
[Table: four simulators compared, including those of [1], [2], and [3]; simulator names are not recoverable, values are listed in column order.]
• Developer: Stanford; UIUC; IBM; Kyushu Univ., ISIT, Fujitsu
• Simulation method: execution-driven; trace-driven; trace-driven; execution-driven (three of the four are also marked discrete-event-driven)
• Parallel simulation: ---; optimistic; optimistic; conservative
• Granularity: flit level; packet level
• Execution platform: sequential execution; parallel machine (distributed memory); parallel machine (shared memory); parallel machine (distributed memory)
• Simulation target size: small; large

[1] W. J. Dally and B. Towles, "Principles and Practices of Interconnection Networks," Morgan Kaufmann Publishers Inc., 2003. http://cva.stanford.edu/books/ppin/
[2] N. Choudhury, T. Mehta, T. L. Wilmarth, E. J. Bohm, and L. V. Kale, "Scaling an optimistic parallel simulation of large-scale interconnection networks," Proc. of the Winter Simulation Conference, pp. 4-7, Dec 2005.
[3] N. R. Adiga, M. A. Blumrich, D. Chen, P. Coteus, A. Gara, M. E. Giampapa, P. Heidelberger, S. Singh, B. D. Steinmacher-Burow, T. Takken, M. Tsao, and P. Vranas, "Blue Gene/L torus interconnection network," IBM Journal of Research and Development, Vol. 49, No. 2/3, pp. 265-276, 2005.

Accuracy
[Figure: accuracy evaluation on InfiniBand fat-tree, random ring, 2 MB message.]
• Other evaluation
  – BlueGene/L (IBM)
  – Kei-Supercomputer (RIKEN/Fujitsu)
  – FX10 (Fujitsu)

Simulation Performance ~The Case for Bruck's All-to-All~
[Figure: simulator execution time (log scale, from 1/3600 s to 60 hours) versus node size of the 3D torus (XxYxZ, from 2x2x2 to 64x32x32). Four series: 4 B (NSIM), 1024 B (NSIM), 4 B (BigNetSim), 1024 B (BigNetSim).]

EXTENSION FOR POWER-PERFORMANCE ANALYSIS

Overview of NsimPower
• Extended NSIM (supports low-power idle mode)
• Boxfish (LLNL) for visualization
[Figure: NsimPower block diagram. Recoverable labels: MGEN program, rank map, request packets, power profile, interconnect configuration (power parameters, sleep-timer threshold).]
[Figure: PHY power over time. The PHY leaves active mode through a sleep transition, stays in low-power idle mode, and returns to active mode through a wake-up transition.]

Chunk-Based Power Modeling
[Figure: power [W] over time t, divided into chunks numbered by chunk-id (1, 2, ...).]

$$P_{ij} = \sum_{k=1}^{N_{link}} \{ P_{ACT} \} + P_{BASE}$$

• $P_{ij}$: power of router-j in chunk-i
• $N_{link}$: number of links connected to router-j
• $P_{ACT}$: average active link power of router-j
• $P_{BASE}$: average static power of router-j
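
Supporting Low-Power Idle (LPI) Technology
[Figure: link power consumption over time t. While there is traffic, the link stays in ACTIVE mode at active power P_ACT. When there is no traffic and the LPI threshold (timeout) expires, a mode transition puts the link into LPI mode at power P_LPI. When traffic resumes, the mode transition back to ACTIVE mode adds a latency penalty. Static power P_BASE is consumed in both modes.]

Power Model Supporting Low-Power Idle Operations

$$P_{ij} = \sum_{k=1}^{N_{link}} \{ P_{ACT} \times (1 - R_{LPI\text{-}k}) + P_{LPI} \times R_{LPI\text{-}k} \} + P_{BASE}$$

• $P_{ij}$: power of router-j in chunk-i
• $N_{link}$: number of links connected to router-j
• $R_{LPI\text{-}k}$: LPI rate of link-k in chunk-i
• $P_{ACT}$: average link active power of router-j
• $P_{LPI}$: average link idle power of router-j
• $P_{BASE}$: average static power of router-j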
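
The LPI power model can be sketched in a few lines of Python. This is a minimal illustration, not NsimPower's implementation: the function names `lpi_rate` and `router_power` are hypothetical, and the way the LPI rate is derived from link activity (ideal zero-time transitions, idle timer restarted at each chunk boundary) is an assumption made only for this example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a period in which the link carries traffic


def lpi_rate(busy: List[Interval], chunk_start: float, chunk_len: float,
             lpi_threshold: float) -> float:
    """Fraction of one chunk that a link spends in LPI mode.

    Assumptions (not taken from the slides): `busy` is sorted and
    non-overlapping, sleep and wake-up transitions take zero time, and the
    idle timer restarts at the beginning of the chunk.
    """
    chunk_end = chunk_start + chunk_len
    lpi_time = 0.0
    idle_from = chunk_start
    for start, end in busy:
        # Clip the busy interval to the chunk and skip it if nothing remains.
        start, end = max(start, chunk_start), min(end, chunk_end)
        if end <= start:
            continue
        # The link enters LPI only after being idle for lpi_threshold.
        lpi_time += max(0.0, start - idle_from - lpi_threshold)
        idle_from = max(idle_from, end)
    lpi_time += max(0.0, chunk_end - idle_from - lpi_threshold)
    return lpi_time / chunk_len


def router_power(lpi_rates: List[float], p_act: float, p_lpi: float,
                 p_base: float) -> float:
    """P_ij = sum_k { P_ACT * (1 - R_k) + P_LPI * R_k } + P_BASE for one chunk."""
    return sum(p_act * (1.0 - r) + p_lpi * r for r in lpi_rates) + p_base


if __name__ == "__main__":
    # One link, chunk of 100 time units, threshold of 10 units.
    r = lpi_rate([(20.0, 30.0), (60.0, 70.0)], 0.0, 100.0, 10.0)
    print(r, router_power([r, 0.0], p_act=1.0, p_lpi=0.2, p_base=10.0))
```

With every LPI rate set to 0, `router_power` reduces to the chunk-based model without LPI, i.e. the number of links times the active link power plus the static power.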

CASE STUDY

Case Study
• Ave. base power per router: 17.80 W (1.0x, 0.25x)
• Ave. power on ACTIVE mode per link: 1.02 W
• Ave. power on LPI mode per link: 0.10 W
• Wake-up transition time: 0 ns -> ideal case
• Sleep transition time: 0 ns -> ideal case
• LPI threshold: 0 μs -> ideal case
• Chunk length: 50,000 ns
• Topology: 3D torus (8x8x8)
• Link bandwidth: 5 GB/s
• Packet size: 2,048 B
• Communication: all-to-all (simple spread)
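
As a rough worked example, the case-study parameters can be plugged into the LPI power model $P_{ij} = \sum_{k=1}^{N_{link}} \{ P_{ACT}(1 - R_{LPI\text{-}k}) + P_{LPI} R_{LPI\text{-}k} \} + P_{BASE}$ to bound the per-router power in one chunk. The value $N_{link} = 6$ is an assumption (one link per neighbor in a 3D torus); the slides do not state how links are counted per router. Only the 1.0x base power is used.

```latex
% All links active during the whole chunk (R_{LPI-k} = 0 for every k):
P_{ij}^{\max} = 6 \times 1.02\,\mathrm{W} + 17.80\,\mathrm{W} = 23.92\,\mathrm{W}

% All links in LPI mode during the whole chunk (R_{LPI-k} = 1 for every k):
P_{ij}^{\min} = 6 \times 0.10\,\mathrm{W} + 17.80\,\mathrm{W} = 18.40\,\mathrm{W}

% Largest possible saving from LPI under these assumptions:
P_{ij}^{\max} - P_{ij}^{\min} = 5.52\,\mathrm{W} \approx 23\% \text{ of } P_{ij}^{\max}
```

Under these assumptions the static router power dominates, so even ideal LPI operation can remove at most about a quarter of the per-router power.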