

Fast Barrier Synchronization for InfiniBand™. Torsten Hoefler, Chair of Computer Architecture, Technical University of Chemnitz. Sections: Architectural Specialities of InfiniBand™; A new Barrier Algorithm for InfiniBand™; Results and Conclusions.


  1. Fast Barrier Synchronization for InfiniBand™. Torsten Hoefler, Chair of Computer Architecture, Technical University of Chemnitz. IPDPS'06 - CAC'06 Workshop, Rhodes Island, Greece, 25th April 2006. Short title: n-way Dissemination.

  2. Outline. (1) Architectural Specialities of InfiniBand™: 1:n n:1 Microbenchmark; LogP Prediction; 1:n n:1 Benchmark Results. (2) A new Barrier Algorithm for InfiniBand™: The Dissemination Algorithm; The n-way Dissemination Algorithm. (3) Results and Conclusions: Comparison with other MPI_Barrier Implementations; Conclusions and Future Work.

  4. 1:n n:1 Microbenchmark. Developed to analyze the InfiniBand™ network, especially for collective communication. Measures single-message performance (RDTSC). MPI based. Supports (nearly) all transport types.

  5. 1:n n:1 Microbenchmark - principle. [Figure: node 0 exchanges ping and pong messages with nodes 1, 2, ..., P.] Steps: (1) (0): Take time. (2) (1..n-1): Send a single message to n-1 hosts. (3) (1..n-1): Hosts respond immediately. (4) (0): Wait for message reception from all hosts. (5) (0): Take time.
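The five steps can be mimicked in a few lines. This is only an in-process sketch: threads and queues stand in for hosts and the network, and `time.perf_counter_ns` stands in for RDTSC. The original benchmark is MPI based and runs over InfiniBand; the function name `one_to_n_round_trip` is hypothetical.

```python
import queue
import threading
import time


def one_to_n_round_trip(n):
    """Emulate one 1:n n:1 measurement; threads stand in for hosts 1..n-1."""
    pings = [queue.Queue() for _ in range(n - 1)]  # one inbox per responder
    pongs = queue.Queue()                          # shared inbox of "host 0"

    def responder(rank, inbox):
        inbox.get()      # wait for the ping from host 0
        pongs.put(rank)  # step 3: respond immediately

    workers = [
        threading.Thread(target=responder, args=(rank, inbox))
        for rank, inbox in enumerate(pings, start=1)
    ]
    for worker in workers:
        worker.start()

    start = time.perf_counter_ns()                 # step 1: take time
    for inbox in pings:                            # step 2: one message per host
        inbox.put(b"x")
    replies = sorted(pongs.get() for _ in range(n - 1))  # step 4: wait for all
    elapsed_ns = time.perf_counter_ns() - start    # step 5: take time

    for worker in workers:
        worker.join()
    return elapsed_ns, replies


if __name__ == "__main__":
    elapsed_ns, replies = one_to_n_round_trip(8)
    print(f"{len(replies)} replies in {elapsed_ns} ns")
```

The absolute numbers from such an emulation say nothing about InfiniBand; the sketch only shows where the two timestamps sit relative to the send and receive phases.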

  7. The LogP Model. LogP model by Culler et al., 1993. LogP parameters: L - latency; g - bandwidth-limiting gap between consecutive messages ($g \approx 1/BW$); o - send/receive overhead; P - number of involved processors.

  8. LogP Prediction. [Figure: RTT(P)/P over P, with $\max\{g, o\}$ marked.] $\frac{RTT(P)}{P} = \frac{4o + 2L + (P - 1) \cdot \max\{g, o\}}{P}$
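To get a feel for the prediction, the formula can be evaluated directly. The sketch below is a plain transcription of the slide's expression; the parameter values are hypothetical and chosen only for illustration, not measured on the test system.

```python
def logp_rtt_per_peer(P, L, o, g):
    """Predicted RTT(P)/P according to the LogP formula on the slide."""
    if P < 1:
        raise ValueError("P must be at least 1")
    return (4 * o + 2 * L + (P - 1) * max(g, o)) / P


# Hypothetical parameter values in microseconds, for illustration only.
L_US, O_US, G_US = 5.0, 1.0, 2.0

if __name__ == "__main__":
    for P in (1, 2, 8, 64):
        print(P, round(logp_rtt_per_peer(P, L_US, O_US, G_US), 3))
```

Rewriting the expression as $\max\{g, o\} + (4o + 2L - \max\{g, o\})/P$ shows that the predicted per-peer time falls toward $\max\{g, o\}$ as P grows.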

  10. 1:n n:1 Benchmark Results. Test environment: 8 nodes; Dual Xeon 2.066 GHz; Red Hat Linux release 9 (Shrike); Kernel: 2.4.27 SMP; HCA: Mellanox "Cougar" (MTPB 23108).

  11. 1:n n:1 Benchmark Results. [Figure: minimal RTT in microseconds over # Processors (P); series: RDMA Write, RDMA Write inline.] RDMA/W is the fastest transport type in our tests. The graph shows minimal values. We benefit from sending to multiple hosts simultaneously. Atomic was not available on our HCAs.

  12. 1:n n:1 Benchmark Results. [Figure: average RTT in microseconds over # Processors (P); series: RDMA Write, RDMA Write inline.] The average graph has many outliers but still shows the same "shape".

  13. A possible Explanation - The LoP Model. [Figure: RTT(P)/P over P, with the levels $t_{saturated}$ and $t_{processing}$ marked and three regions labeled pipeline, processing, saturation.] Pipeline startup function - hardware pipe, caches. Minimal processing time - hardware. Network saturation - network hardware / transceiver.

  15. The Dissemination Algorithm. [Figure: communication pattern in Round 0, Round 1, Round 2.] Logarithmic running time ($O(\log_2 P)$). Works with non-power-of-two P.

  16. Dissemination - Peer selection. [Figure: Round 0, Round 1, Round 2.] $speer = (p + 2^r) \bmod P$, $rpeer = (p - 2^r) \bmod P$
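The peer formulas translate directly into code. The sketch below computes the send and receive peer of rank p in round r and then checks, by a small set-based simulation, that every rank has heard (directly or transitively) from every other rank after the last round. The round count is taken as the smallest R with $2^R \geq P$; the function names are hypothetical, and the simulation is an illustration rather than the implementation used in the talk.

```python
def dissemination_peers(p, r, P):
    """Return (speer, rpeer) of rank p in round r for P processes."""
    step = 2 ** r
    return (p + step) % P, (p - step) % P


def dissemination_rounds(P):
    """Smallest number of rounds R such that 2**R >= P."""
    return (P - 1).bit_length()


def simulate_barrier(P):
    """Check that every rank learns of the arrival of all P ranks."""
    known = [{p} for p in range(P)]
    for r in range(dissemination_rounds(P)):
        # All messages of a round carry the state from the round's start.
        snapshot = [set(k) for k in known]
        for p in range(P):
            _, rpeer = dissemination_peers(p, r, P)
            known[p] |= snapshot[rpeer]
    return all(len(k) == P for k in known)


if __name__ == "__main__":
    for r in range(dissemination_rounds(5)):
        print(r, [dissemination_peers(p, r, 5) for p in range(5)])
    print(simulate_barrier(5))
```

Because the peers wrap around with mod P, the same code covers process counts that are not powers of two.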

  18. The n-way Dissemination Algorithm. [Figure: communication pattern in Round 0, Round 1.] Logarithmic running time ($O(\log_2 P) - O(\log_n P)$?). Works with non-power-of-n P.

  19. n-way Dissemination - Peer selection. [Figure: Round 0, Round 1.] $speer_i = (p + i \cdot (n + 1)^r) \bmod P$, $rpeer_i = (p - i \cdot (n + 1)^r) \bmod P$
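A sketch of the n-way peer selection follows. It assumes the index i runs from 1 to n, so each rank sends n messages and receives n messages per round, and it assumes the number of rounds is the smallest R with $(n + 1)^R \geq P$. Neither assumption is stated on the slide; both follow from the step width $(n + 1)^r$, and the set-based simulation checks them. All function names are hypothetical.

```python
def nway_peers(p, r, P, n):
    """Return (speers, rpeers) of rank p in round r; assumes i = 1..n."""
    step = (n + 1) ** r
    speers = [(p + i * step) % P for i in range(1, n + 1)]
    rpeers = [(p - i * step) % P for i in range(1, n + 1)]
    return speers, rpeers


def nway_rounds(P, n):
    """Smallest R with (n + 1)**R >= P (assumed round count)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rounds, reach = 0, 1
    while reach < P:
        reach *= n + 1
        rounds += 1
    return rounds


def simulate_nway_barrier(P, n):
    """Check that every rank learns of the arrival of all P ranks."""
    known = [{p} for p in range(P)]
    for r in range(nway_rounds(P, n)):
        # All messages of a round carry the state from the round's start.
        snapshot = [set(k) for k in known]
        for p in range(P):
            _, rpeers = nway_peers(p, r, P, n)
            for q in rpeers:
                known[p] |= snapshot[q]
    return all(len(k) == P for k in known)


if __name__ == "__main__":
    print(nway_peers(0, 1, 9, 2))
    print(nway_rounds(9, 2), simulate_nway_barrier(9, 2))
```

With n = 1 the formulas reduce to the 2-way dissemination peers $(p \pm 2^r) \bmod P$.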

  21. The n-way Dissemination Algorithm. Implementation details: implemented as collv1 component in Open MPI; communication peers are precomputed. Benchmark details: LAM/MPI 7.1.1; TUC SHIBA 1.0; MVAPICH 0.9.4.
