Fast Barrier Synchronization for InfiniBand™
Torsten Hoefler
Chair of Computer Architecture, Technical University of Chemnitz
IPDPS'06 - CAC'06 Workshop, Rhodes Island, Greece
25th April 2006
1. Architectural Specialities of InfiniBand™
   - 1:n n:1 Microbenchmark
   - LogP Prediction
   - 1:n n:1 Benchmark Results
2. A new Barrier Algorithm for InfiniBand™
   - The Dissemination Algorithm
   - The n-way Dissemination Algorithm
3. Results and Conclusions
   - Comparison with other MPI_Barrier Implementations
   - Conclusions and Future Work
1:n n:1 Microbenchmark
- Developed to analyze the InfiniBand™ network, especially for collective communication
- Measures single-message performance (RDTSC)
- MPI-based
- Supports (nearly) all transport types
1:n n:1 Microbenchmark - principle
[Figure: node 0 sends a ping to nodes 1, 2, ..., P and receives a pong from each]
1. (0): Take time
2. (1..n-1): Send a single message to n-1 hosts
3. (1..n-1): Hosts respond immediately
4. (0): Wait for message reception from all hosts
5. (0): Take time
The LogP Model
LogP model by Culler et al., 1993
LogP parameters:
- L: latency
- g: bandwidth-limiting gap between consecutive messages (g ≈ 1/BW)
- o: send/receive overhead
- P: number of involved processors
LogP Prediction
[Figure: RTT(P)/P plotted over P, with the level max{g, o} marked]
$$\frac{RTT(P)}{P} = \frac{4o + 2L + (P - 1) \cdot \max\{g, o\}}{P}$$
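To make the prediction concrete, the formula can be evaluated for a few values of P. The following is a minimal sketch only: the function name `rtt_per_peer` and the parameter values are hypothetical and chosen for illustration, not measurements from the talk.

```python
def rtt_per_peer(P, L, o, g):
    """LogP prediction RTT(P)/P = (4o + 2L + (P - 1) * max(g, o)) / P."""
    if P < 1:
        raise ValueError("P must be at least 1")
    return (4 * o + 2 * L + (P - 1) * max(g, o)) / P


if __name__ == "__main__":
    # Hypothetical LogP parameters in microseconds (illustration only).
    L, o, g = 2.0, 0.5, 1.0
    for P in (1, 2, 4, 8, 16, 64):
        print(P, round(rtt_per_peer(P, L, o, g), 3))
```

Under this model the constant term 4o + 2L is amortized over P, so RTT(P)/P falls monotonically towards max{g, o} as P grows.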
1:n n:1 Benchmark Results
Test environment:
- 8 nodes, dual Xeon 2.066 GHz
- Red Hat Linux release 9 (Shrike)
- Kernel: 2.4.27 SMP
- HCA: Mellanox "Cougar" (MTPB 23108)
1:n n:1 Benchmark Results
[Figure: minimal RTT in microseconds versus number of processors (P), for RDMA Write and RDMA Write inline]
- RDMA/W was the fastest transport type in our tests
- The graph shows minimal values
- We benefit from sending to multiple hosts simultaneously
- Atomic was not available on our HCAs
1:n n:1 Benchmark Results
[Figure: average RTT in microseconds versus number of processors (P), for RDMA Write and RDMA Write inline]
- The graph of average values has many outliers
- Still the same "shape"
A possible Explanation - The LoP Model
[Figure: RTT(P)/P versus P, with the regions pipeline, processing and saturation and the levels t_processing and t_saturated]
- Pipeline startup function: hardware pipe, caches
- Minimal processing time: hardware
- Network saturation: network hardware / transceiver
The Dissemination Algorithm
[Figure: communication pattern in Round 0, Round 1 and Round 2]
- Logarithmic running time ($O(\log_2 P)$)
- Works when P is not a power of two
Dissemination - Peer selection
[Figure: peers in Round 0, Round 1 and Round 2]
$$speer = (p + 2^r) \bmod P$$
$$rpeer = (p - 2^r) \bmod P$$
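The peer selection rule, speer = (p + 2^r) mod P and rpeer = (p - 2^r) mod P, can be checked with a small simulation. This is a sketch under assumptions: the helper names `dissemination_peers` and `simulate_dissemination` are hypothetical, and the simulation only tracks which ranks each rank has heard from (directly or transitively) instead of sending real messages.

```python
def dissemination_peers(p, P):
    """Return one (speer, rpeer) pair per round for rank p among P ranks."""
    peers = []
    dist = 1  # dist equals 2**r in round r
    while dist < P:
        peers.append(((p + dist) % P, (p - dist) % P))
        dist *= 2
    return peers


def simulate_dissemination(P):
    """Return (rounds, complete): complete is True if every rank has
    heard, directly or transitively, from all P ranks."""
    known = [{p} for p in range(P)]
    schedule = [dissemination_peers(p, P) for p in range(P)]
    rounds = len(schedule[0])
    for r in range(rounds):
        # Receivers see what the sender knew before this round started.
        snapshot = [set(k) for k in known]
        for p in range(P):
            _, rpeer = schedule[p][r]
            known[p] |= snapshot[rpeer]
    return rounds, all(len(k) == P for k in known)


if __name__ == "__main__":
    print(dissemination_peers(0, 8))
    print(simulate_dissemination(5))
```

The loop runs while 2^r < P, which gives ceil(log2 P) rounds and also covers values of P that are not a power of two.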
The n-way Dissemination Algorithm
[Figure: communication pattern in Round 0 and Round 1]
- Logarithmic running time ($O(\log_2 P)$ - $O(\log_n P)$?)
- Works when P is not a power of n
n-way Dissemination - Peer selection
[Figure: peers in Round 0 and Round 1]
$$speer_i = (p + i \cdot (n + 1)^r) \bmod P$$
$$rpeer_i = (p - i \cdot (n + 1)^r) \bmod P$$
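The n-way rule can be sketched in the same way. Two points here are assumptions and not stated on the slide: that i runs from 1 to n in every round, and that rounds continue while (n + 1)^r < P. The helper names `nway_peers` and `simulate_nway` are hypothetical.

```python
def nway_peers(p, P, n):
    """Per round, return (send_peers, recv_peers) for rank p.

    Assumption: i ranges over 1..n in each round. For small P a peer
    may repeat or equal p itself; this sketch does not filter those.
    """
    rounds = []
    dist = 1  # dist equals (n + 1)**r in round r
    while dist < P:
        send = [(p + i * dist) % P for i in range(1, n + 1)]
        recv = [(p - i * dist) % P for i in range(1, n + 1)]
        rounds.append((send, recv))
        dist *= n + 1
    return rounds


def simulate_nway(P, n):
    """Return (rounds, complete): complete is True if every rank has
    heard, directly or transitively, from all P ranks."""
    known = [{p} for p in range(P)]
    schedule = [nway_peers(p, P, n) for p in range(P)]
    rounds = len(schedule[0])
    for r in range(rounds):
        # Receivers see what each sender knew before this round started.
        snapshot = [set(k) for k in known]
        for p in range(P):
            for rpeer in schedule[p][r][1]:
                known[p] |= snapshot[rpeer]
    return rounds, all(len(k) == P for k in known)


if __name__ == "__main__":
    print(nway_peers(0, 9, 2))
    print(simulate_nway(64, 3))
```

With n = 1 the rule reduces to the classic dissemination peers. In this sketch each round multiplies the set of ranks heard from by up to n + 1, so the simulation finishes in ceil(log_(n+1) P) rounds at the cost of n messages per rank and round.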
The n-way Dissemination Algorithm
Implementation details:
- Implemented as collv1 component in Open MPI
- Communication peers are precomputed
Benchmark details:
- LAM/MPI 7.1.1
- TUC SHIBA 1.0
- MVAPICH 0.9.4