Automating Topology Aware Mapping for Supercomputers
Abhinav Bhatele, Gagan Gupta, Laxmikant V. Kale


  1. Automating Topology Aware Mapping for Supercomputers
     Abhinav Bhatele, Gagan Gupta, Laxmikant V. Kale

  2. Application Topologies
     [Figure: Patch, Compute, and Proxy objects and the communication among them]

  3. Interconnect Topologies
     • Three-dimensional meshes
       • 3D torus: Blue Gene/L, Blue Gene/P, Cray XT4/5
     • Trees
       • Fat-trees (InfiniBand) and CLOS networks (Federation)
     • Dense graphs
       • Kautz graph (SiCortex), hypercubes
     • Future topologies?
       • Blue Waters, Blue Gene/Q

  4. The Mapping Problem
     • Applications have a communication topology and processors have an interconnect topology
     • Definition: Given a set of communicating parallel “entities”, map them onto physical processors to optimize communication
     • Goals:
       • Balance computational load
       • Minimize communication traffic and hence contention

  5. Scope of this work
     • Currently we are focused on 3D mesh/torus machines
     • For certain classes of applications
     [Figure: applications classified along two axes: computation-bound vs. communication-bound, and latency-tolerant vs. latency-sensitive]

  6. Application-specific mapping: OpenAtom
     [Figure: time per step (s) vs. number of cores (512–8192), default vs. topology-aware mapping]
     A. Bhatele, E. Bohm, and L. V. Kale. A Case Study of Communication Optimizations on 3D Mesh Interconnects. In Euro-Par, LNCS 5704, pages 1015–1028, 2009. Distinguished Paper Award.
     A. Bhatele, L. V. Kale, and S. Kumar. Dynamic Topology Aware Load Balancing Algorithms for Molecular Dynamics Applications. In 23rd ACM International Conference on Supercomputing (ICS), 2009.

  7. Application-specific mapping: OpenAtom and NAMD
     [Figure: left, OpenAtom time per step (s) vs. number of cores (512–8192), default vs. topology-aware mapping; right, NAMD time per step (ms) vs. number of cores (512–16384), topology-oblivious vs. topology-aware patches vs. topology-aware load balancers; inset shows Patch 1, Patch 2, and the inner and outer bricks]

  8. Automatic Mapping
     • Obtaining the processor topology and the application communication graph
     • Pattern matching to identify regular patterns
       • 2D/3D near-neighbor communication
     • A suite of heuristics: the right strategy is invoked depending on the communication scenario:
       • Regular communication
       • Irregular communication

  9. Topology Discovery
     • Topology Manager API: for 3D interconnects (Blue Gene, XT)
     • Information required for mapping:
       • Physical dimensions of the allocated job partition
       • Mapping of ranks to physical coordinates and vice versa
     • On Blue Gene machines such information is available and the API is a wrapper around it
     • On Cray XT machines, we jump through several hoops to get this information and make it available through the same API
     http://charm.cs.uiuc.edu/~bhatele/phd/TopoMgrAPI.tar.gz
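The two queries the slide lists (rank to physical coordinates and back) can be sketched as follows. This is an illustrative Python reconstruction, not the actual C++ Topology Manager API; the class and method names, and the row-major rank layout, are assumptions for the example.

```python
class TopoManager:
    """Maps ranks to coordinates on an X x Y x Z torus partition.

    Assumes ranks are laid out in row-major order (x varies fastest),
    which is one common convention; real machines may differ.
    """
    def __init__(self, dim_x, dim_y, dim_z):
        self.dims = (dim_x, dim_y, dim_z)

    def rank_to_coordinates(self, rank):
        x, y, _ = self.dims
        return (rank % x, (rank // x) % y, rank // (x * y))

    def coordinates_to_rank(self, cx, cy, cz):
        x, y, _ = self.dims
        return cx + cy * x + cz * x * y

tmgr = TopoManager(8, 8, 16)            # a hypothetical 8 x 8 x 16 partition
coords = tmgr.rank_to_coordinates(100)  # -> (4, 4, 1)
assert tmgr.coordinates_to_rank(*coords) == 100
```

A mapping strategy only ever needs these two functions plus the partition dimensions, which is why a thin wrapper suffices on Blue Gene.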

  10. Application communication graph
     • Several ways to obtain the graph
     • MPI applications:
       • A graph obtained from one run can only be used in a subsequent run
       • Profiling tools (IBM’s HPCT tools)
     • Charm++ applications:
       • Instrumentation at runtime
       • Enables dynamic mapping for changing communication graphs
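The runtime-instrumentation approach amounts to accumulating, per pair of communicating ranks, the bytes exchanged. A minimal sketch, with hypothetical names (a real runtime would hook this into its send path):

```python
from collections import defaultdict

# Communication graph: edge (src, dst) -> total bytes exchanged.
comm_graph = defaultdict(int)

def record_send(src, dst, nbytes):
    """Called on every message send by the (hypothetical) instrumentation hook."""
    comm_graph[(src, dst)] += nbytes

# Simulated traffic from one run:
record_send(0, 1, 4096)
record_send(0, 1, 4096)
record_send(1, 2, 1024)

# The accumulated graph can drive mapping in a subsequent run (MPI)
# or, if the runtime supports migration, in the same run (Charm++).
assert comm_graph[(0, 1)] == 8192
```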

  11. Pattern Matching
     • We want to identify simple communication patterns, such as 2D/3D near-neighbor graphs
     [Figure: communication matrix for processors 0–31, showing a banded near-neighbor pattern]
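One way the pattern-matching step can be pictured: given the communication graph, test whether every edge connects ranks that are adjacent in some candidate 2D grid factorization of the rank count. This is an illustrative sketch, not the paper's algorithm; the function name and the exhaustive per-width test are assumptions.

```python
def is_2d_near_neighbor(edges, width):
    """True if every edge connects ranks adjacent in a width-wide 2D grid."""
    def coord(r):
        return (r % width, r // width)
    for src, dst in edges:
        (x1, y1), (x2, y2) = coord(src), coord(dst)
        if abs(x1 - x2) + abs(y1 - y2) != 1:   # not a grid neighbor
            return False
    return True

# A 4 x 4 stencil: each rank talks to its right and bottom neighbors.
edges = [(r, r + 1) for r in range(16) if r % 4 != 3] + \
        [(r, r + 4) for r in range(12)]
assert is_2d_near_neighbor(edges, 4)        # matches as a 4-wide grid
assert not is_2d_near_neighbor(edges, 8)    # but not as an 8-wide grid
```

Trying each plausible width (the divisors of the rank count) identifies the stencil's dimensions, which the mapping heuristics can then exploit.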

  12. Communication Graphs
     • Regular communication:
       • POP (Parallel Ocean Program): 2D-stencil-like computation
       • WRF (Weather Research and Forecasting model): 2D stencil
       • MILC (MIMD Lattice Computation): 4D near-neighbor
     • Irregular communication:
       • Unstructured mesh computations: FLASH, CPSD code
       • Many other classes of applications

  13. Mapping Regular Graphs
     Object graph: 7 x 4, processor graph: 4 x 7
     • Maximum Overlap (MXOVLP)
     • Expand from Corner (EXCO)
     • Affine Mapping (AFFN)
     [Figure: the mapping is built up step by step over an animation]
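Of the three heuristics named above, Affine Mapping (AFFN) is the simplest to state: each object's grid coordinates are scaled onto the processor grid. The formula below is an illustrative reconstruction from the name, not the paper's code, and the function name is an assumption.

```python
def affn_map(obj_x, obj_y, obj_dims, proc_dims):
    """Scale object (obj_x, obj_y) from an obj_dims grid onto a proc_dims grid."""
    px = obj_x * proc_dims[0] // obj_dims[0]
    py = obj_y * proc_dims[1] // obj_dims[1]
    return (px, py)

# The slide's example: a 7 x 4 object graph onto a 4 x 7 processor graph.
assert affn_map(0, 0, (7, 4), (4, 7)) == (0, 0)   # origin maps to origin
assert affn_map(6, 3, (7, 4), (4, 7)) == (3, 5)   # last object lands near the far corner
```

Scaling preserves locality on average but can crowd several objects onto one processor when the grids differ in shape, which is why the suite includes alternatives such as MXOVLP and EXCO.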

  22. Example Mapping
     Object graph: 6 x 11, processor graph: 11 x 6
     [Figure: the embedding is built up step by step over an animation]
     Aleliunas, R. and Rosenberg, A. L. On Embedding Rectangular Grids in Square Grids. IEEE Trans. Comput., 31(9):907–913, 1982.

  31. Different mapping solutions
     Object graph of 14 x 6 mapped to a processor graph of 7 x 12
     Algorithms in order: MXOVLP, MXOV+AL, EXCO, COCE, AFFN, STEP
     [Figure: the six resulting mappings]

  32. Evaluation Metric: Hop-bytes
     • Weighted sum of message sizes, where the weights are the number of links traversed by each message:
       Hop-bytes = Σ (i = 1 to n) d_i × b_i
       where d_i = distance (links traversed) for message i, b_i = bytes in message i, n = number of messages
     • An indicator of the communication traffic, and hence of contention, on the network
     • Previously used metric: maximum dilation
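The metric above is straightforward to compute. The sketch below evaluates hop-bytes on a 3D torus, where the per-dimension distance is the shorter of the direct and the wrap-around route; the function names and the small example data are illustrative.

```python
def torus_distance(a, b, dims):
    """Manhattan distance between coordinates a and b on a torus with sizes dims."""
    return sum(min(abs(ai - bi), d - abs(ai - bi))
               for ai, bi, d in zip(a, b, dims))

def hop_bytes(messages, coords, dims):
    """Sum of d_i * b_i over messages (src_rank, dst_rank, nbytes);
    coords maps each rank to its (x, y, z) on the torus."""
    return sum(torus_distance(coords[s], coords[d], dims) * b
               for s, d, b in messages)

dims = (4, 4, 4)
coords = {0: (0, 0, 0), 1: (1, 0, 0), 2: (3, 0, 0)}
msgs = [(0, 1, 1024), (0, 2, 1024)]   # rank 2 is one wrap-around hop away
assert hop_bytes(msgs, coords, dims) == 2048   # 1*1024 + 1*1024
```

Dividing the total by the processor count gives the per-processor figure used on the next slide, and comparing against a lower bound shows how close each heuristic gets.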

  33. Evaluation
     [Figure: hops per processor for MXOVLP, MXOV+AL, EXCO, COCE, AFFN, STEP, and the lower bound, on three mapping configurations: 14x6 to 7x12, 16x16 to 8x32, and 27x35 to 45x21]

  34. Results: WRF
     • Performance improvement is negligible on 256 and 512 cores
     • On 1024 cores:
       • Hops reduce by 64%
       • Time spent in communication reduces by 45%
       • Overall performance improves by 17%
     [Figure: average hops per byte per core, default vs. topology-aware mapping vs. lower bound, on 256–2048 nodes]
