A Practical Approach to the Rating of Barrier Algorithms using the LogP Model and Open MPI
Torsten Höfler, Wolfgang Rehm
TU Chemnitz, Germany
24.05.2005

Outline
- Motivation
- 1 LogP Predictions
- 2 Implementation
- 3 Conclusions

Motivation
- optimal solution for the barrier problem
- barrier time complexity studies
- exhaustive comparison of different algorithms
- framework for general comparison studies
- Open MPI is easily extensible
- Question: is LogP accurate enough?

Problems
- unlimited number of architectures
- generic optimal solution = holy grail?
- definition of several constraints for a given architecture
- Fast Ethernet, Extreme Black Diamond Switch, 512 nodes
- new architectures have to be added by hand
- several models available -> LogP should be accurate enough

Principles
- one architecture as example
- easy testing of new architectures
- framework to implement and test new algorithms

Architectural Assumptions
- full bisection bandwidth
- full duplex operation
- unlimited switch forwarding rate
- constant latency
- overhead bigger than gap
- overhead is constant ($o_s = o_r$)

Base Equations
- several basic equations and variables:
  $f_r = \max\{o_r, g\}$
  $f_s = \max\{o_s, g\}$
  $t_r = \max\{f_r,\ o_s + L + o_r\} = \max\{\max\{g, o_r\},\ o_s + L + o_r\} = \max\{g,\ o_s + L + o_r\}$
- simplifying assumptions:
  $f_r = f_s = o$
  $t_r = t_s = 2o + L$

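As a quick check of the base equations, the following Python sketch evaluates $f_r$, $f_s$, and $t_r$ for given LogP parameters and confirms that the nested maximum collapses to $\max\{g,\ o_s + L + o_r\}$. The function names and the numeric parameter values are illustrative assumptions, not measurements from the talk.

```python
def logp_step_times(L, o_s, o_r, g):
    """Evaluate the base equations for one message under LogP."""
    f_r = max(o_r, g)                # f_r = max{o_r, g}
    f_s = max(o_s, g)                # f_s = max{o_s, g}
    t_r = max(f_r, o_s + L + o_r)    # t_r = max{f_r, o_s + L + o_r}
    return f_r, f_s, t_r


def t_r_simplified(L, o_s, o_r, g):
    """Collapsed form: o_r never exceeds o_s + L + o_r for non-negative inputs."""
    return max(g, o_s + L + o_r)


# Hypothetical parameters in arbitrary time units, with o_s = o_r = o and o > g.
L, o, g = 5.0, 2.0, 1.0
f_r, f_s, t_r = logp_step_times(L, o, o, g)
print(f_r, f_s, t_r)  # f_r = f_s = o and t_r = 2o + L for these values
```

With $o_s = o_r = o$ and $o > g$, the sketch reproduces the simplifying assumptions $f_r = f_s = o$ and $t_r = 2o + L$.
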
Model Predictions
- algorithms are divided into different complexity classes:
  $O(P)$ ⇒ Central Counter
  $O(n \cdot \log_n P)$ ⇒ Combining Tree, f-way Tournament, MCS
  $O(\log_2 P)$ + Bcast ⇒ Tournament, BST
  $O(\log_2 P)$ ⇒ Butterfly, Pairwise Exchange, Dissemination
- $O(\log_2 P)$ is optimal within the LogP model; the proof is trivial
- Assumption: the Dissemination Barrier should perform best

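To get a feel for how far apart these classes are, the sketch below evaluates only the leading terms $P$, $n \cdot \lceil \log_n P \rceil$, and $\lceil \log_2 P \rceil$. Constant factors and the broadcast cost are deliberately ignored, so the numbers indicate scaling only, not runtimes. The helper names are hypothetical, and $n = 4$ is borrowed from the "Combining Tree (n=4)" label in the benchmark plots.

```python
def ceil_log(P, base):
    """Smallest k with base**k >= P, using integer arithmetic only."""
    k, reach = 0, 1
    while reach < P:
        reach *= base
        k += 1
    return k


def growth_terms(P, n=4):
    """Leading terms of the complexity classes, without constant factors."""
    return {
        "O(P)": P,
        "O(n log_n P)": n * ceil_log(P, n),
        "O(log2 P)": ceil_log(P, 2),
    }


for P in (8, 64, 512):
    print(P, growth_terms(P))
```

For $P = 64$ the terms are 64, 12, and 6, which illustrates why the logarithmic classes are expected to win as the node count grows.
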
Example - Dissemination Barrier
[Figure: communication pattern among six processes (0 to 5) in three steps: Step 1 (stage 0), Step 2 (stage 1), Step 3 (stage 2)]

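Since the figure itself did not survive extraction, the following sketch reconstructs the pattern from the standard formulation of the dissemination barrier, in which process $i$ signals process $(i + 2^k) \bmod P$ in stage $k$. This formulation is an assumption about what the slide showed; the function names are hypothetical. The second function checks that after $\lceil \log_2 P \rceil$ stages every process has, directly or transitively, heard from every other one.

```python
def dissemination_schedule(P):
    """List of stages; each stage is a list of (sender, receiver) pairs."""
    stages = (P - 1).bit_length()  # equals ceil(log2 P) for P >= 1
    return [[(i, (i + 2**k) % P) for i in range(P)] for k in range(stages)]


def everyone_informed(P):
    """True if all processes know of all arrivals once the stages complete."""
    known = [{i} for i in range(P)]
    for stage in dissemination_schedule(P):
        # Messages within a stage are concurrent, so forward a snapshot.
        snapshot = [set(s) for s in known]
        for src, dst in stage:
            known[dst] |= snapshot[src]
    return all(len(s) == P for s in known)


for k, stage in enumerate(dissemination_schedule(6)):
    print(f"stage {k}:", stage)
print(everyone_informed(6))
```

For $P = 6$ this yields three stages with distances 1, 2, and 4, matching the three steps named on the slide.
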
Example - Dissemination Barrier
[Figure: LogP timing diagram for processes P0 to P5; each process shows three send overheads $o_s$ and three receive overheads $o_r$]
- general runtime: $rt = \max\{t_r, t_s\} \cdot \lceil \log_2 P \rceil$, with $t_r = \max\{g,\ o_s + L + o_r\}$
- assume $o > g$: $rt = (2o + L) \cdot \lceil \log_2 P \rceil$

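The runtime formula is easy to evaluate numerically. The sketch below assumes $o_s = o_r = o$, so that $t_r = t_s = \max\{g,\ 2o + L\}$, and multiplies by the number of stages. The function name and the parameter values are placeholders chosen for illustration; they are not the LogP parameters measured for the cluster in this talk.

```python
def predicted_barrier_time(P, L, o, g):
    """rt = max{t_r, t_s} * ceil(log2 P), assuming o_s = o_r = o."""
    per_stage = max(g, 2 * o + L)   # reduces to 2o + L when o > g
    stages = (P - 1).bit_length()   # ceil(log2 P) without floating point
    return per_stage * stages


# Hypothetical parameters in microseconds.
L, o, g = 5.0, 2.0, 1.0
for P in (2, 6, 64):
    print(P, predicted_barrier_time(P, L, o, g))
```

Because the stage count only changes at powers of two, the predicted curve is a step function of $P$.
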
Benchmark Results: Dissemination Barrier
[Figure: runtime in microseconds (rt) versus number of processors (P), for P up to 70; curves: Dissemination and rt(P)]

Benchmark Results: Tournament Barrier
[Figure: runtime in microseconds (rt) versus number of processors (P), for P up to 70; curves: Tournament Barrier and rt(P)]

Benchmark Results: Algorithm Comparison
[Figure: runtime in microseconds (rt) versus number of processors (P), for P up to 70; curves: Central Counter, Combining Tree (n=4), Tournament Barrier, Dissemination, Open MPI]

Benchmark Results
Algorithm          128 nodes     256 nodes
Central Counter    4594.50 µs    4909.67 µs
Combining Tree     4009.79 µs    4343.63 µs
Tournament         3642.54 µs    4378.77 µs
Dissemination      1904.57 µs    1977.12 µs
Open MPI           3559.88 µs    4226.88 µs

Open MPI
- also useable for production environments
- ⇒ Open MPI as MPI framework
[Figure: Open MPI architecture with the labels User Application, MPI API, Run Time Environment (RTE), Modular Component Architecture (MCA), PtP Mgmt. Layer (PML), TOPO, COLL, and three PTL modules]

Component Implementation
- initialization returns user-defined priority
- algorithm selection:
  0: automatic benchmark
  1: Central Counter
  2: Combining Tree
  3: Tournament
  4: Dissemination
  5: Binomial Tree
  6: n-way Dissemination
- Checkpoint/Restart is handled by lower layers

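The slides do not say how the "automatic benchmark" option works internally. One plausible reading, sketched here purely as an assumption, is that the component times each candidate and keeps the fastest. The function `select_algorithm` is hypothetical and is not part of Open MPI; the timings are the 128-node values from the benchmark table, which has no entries for Binomial Tree or n-way Dissemination.

```python
ALGORITHMS = {
    1: "Central Counter",
    2: "Combining Tree",
    3: "Tournament",
    4: "Dissemination",
    5: "Binomial Tree",
    6: "n-way Dissemination",
}


def select_algorithm(choice, measured_us):
    """Return the algorithm name for a selection value.

    choice 0 is treated as 'pick the fastest measured algorithm', which is
    an assumed interpretation of the automatic benchmark option.
    """
    if choice != 0:
        return ALGORITHMS[choice]
    return min(measured_us, key=measured_us.get)


# 128-node runtimes in microseconds, taken from the benchmark table.
measured_128 = {
    "Central Counter": 4594.50,
    "Combining Tree": 4009.79,
    "Tournament": 3642.54,
    "Dissemination": 1904.57,
}
print(select_algorithm(0, measured_128))
print(select_algorithm(3, measured_128))
```
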
Conclusions
- the assumptions made are valid
- the LogP model is accurate
- Dissemination is optimal for the given scenario
- different networks exhibit different behavior
- deriving new algorithms for different hardware (e.g. offloading-based HW) could require more detailed models
- ⇒ a general methodology for developing optimal barrier algorithms has been shown

Future Work
- new model for small messages for offloading-based NICs (LoP)
- new barrier algorithms to support hardware parallelism
- simplification of the LoP model (non-linear, >6 parameters)
