
Hierarchy Aware Blocking and Nonblocking Collective Communications: The Effects of Shared Memory Communications in the Cray XT Environment
Richard L. Graham, Joshua S. Ladd, Manjunath Venkata
Managed by UT-Battelle for the Department of Energy
Graham_CAC_2010

Acknowledgements
• US Department of Energy FASTOS program

Outline
• Statement of the problem
• Design Overview
• Results
• Next steps

Problems being addressed
• Optimization of collective operations
• Implementation of extensible optimized collective operations
• Implementation of nonblocking collective operations

Why Optimize Collective Communications
• Collective operations limit application scalability
• Communication pattern involving multiple processes (in MPI, all ranks in the communicator are involved)
• Optimized collectives involve a communicator-wide, data-dependent communication pattern
• Data needs to be manipulated at intermediate stages of a collective operation
• Collective operations magnify the effects of system noise

Scalability of Collective Operations
[Figure: two timeline panels, "Ideal Algorithm" and "Impact of System Noise". Axes: Time vs. Process Rank (four ranks). Legend: Work, Reduction, Communication, Use Result, Noise.]
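The effect of noise on a synchronizing collective can be illustrated with a toy timing model. The sketch below is not taken from the slides: it assumes a recursive-doubling barrier, a fixed cost per exchange step, and hypothetical arrival times, and it only shows how a delay on one rank propagates to every rank.

```python
def barrier_finish_times(arrival, step_cost=1.0):
    """Finish time of each rank in a recursive-doubling barrier.

    arrival[r] is the time rank r enters the barrier. The rank count is
    assumed to be a power of two. Each exchange costs step_cost and
    cannot complete before both partners have reached that step.
    """
    n = len(arrival)
    assert n > 0 and n & (n - 1) == 0, "power-of-two rank count assumed"
    t = list(arrival)
    dist = 1
    while dist < n:
        # Rank r exchanges with rank r XOR dist at this step.
        t = [max(t[r], t[r ^ dist]) + step_cost for r in range(n)]
        dist *= 2
    return t

ideal = barrier_finish_times([0.0, 0.0, 0.0, 0.0])
noisy = barrier_finish_times([0.0, 0.0, 3.0, 0.0])  # one rank delayed by 3
print(ideal)  # [2.0, 2.0, 2.0, 2.0]
print(noisy)  # [5.0, 5.0, 5.0, 5.0]
```

In this model a delay of 3 time units on a single rank pushes the completion time of all four ranks from 2 to 5.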

Scalability of Collective Operations - II
[Figure: two timeline panels, "Offloaded Algorithm" and "Nonblocking Algorithm". Axes: Time vs. Process Rank (four ranks). Legend: Work, Reduction, Communication, Use Result, Delegation Agent.]
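A rough way to reason about the benefit of a nonblocking collective is a two-term timing model. This is an illustrative assumption, not a result from the slides: the function names and the optional overhead term are hypothetical, and real overlap depends on how well the collective progresses in the background.

```python
def blocking_time(work, collective):
    # The collective runs to completion before independent work starts.
    return collective + work

def nonblocking_time(work, collective, overhead=0.0):
    # The collective is started, independent work runs while it
    # progresses, and the caller then waits for whichever ends last.
    return max(work, collective) + overhead

print(blocking_time(10.0, 4.0))         # 14.0
print(nonblocking_time(10.0, 4.0))      # 10.0
print(nonblocking_time(2.0, 4.0, 0.5))  # 4.5
```

When the independent work is at least as long as the collective, the model hides the communication cost entirely. Otherwise only part of it is hidden.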

Mapping the collectives onto the system
• Consider communication hierarchies
• Schedule the network

Example – 4 Process Recursive Doubling
[Figure: processes 1 and 2 on Host 1, processes 3 and 4 on Host 2. Rows: Step 1, Step 2. Label: "Inter Host Communication".]
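The exchange pattern of recursive doubling can be written down as a short sketch. It uses the standard rule that at each step a rank exchanges with the rank whose index differs in one bit. The `host_of` mapping mirrors the slide's layout (processes 1 and 2 on Host 1, 3 and 4 on Host 2), and the function name is hypothetical.

```python
def recursive_doubling_schedule(n):
    """Return, for each step, the list of (rank, partner) exchange pairs.

    Ranks are 0-based and n is assumed to be a power of two.
    """
    assert n > 0 and n & (n - 1) == 0, "power-of-two rank count assumed"
    steps = []
    dist = 1
    while dist < n:
        # List each exchanging pair once, lower rank first.
        steps.append([(r, r ^ dist) for r in range(n) if r < (r ^ dist)])
        dist *= 2
    return steps

host_of = {0: "Host 1", 1: "Host 1", 2: "Host 2", 3: "Host 2"}
for i, pairs in enumerate(recursive_doubling_schedule(4), start=1):
    for a, b in pairs:
        kind = "on-host" if host_of[a] == host_of[b] else "inter-host"
        # Print with the slide's 1-based process numbers.
        print(f"step {i}: {a + 1} <-> {b + 1} ({kind})")
```

With this placement, step 1 pairs processes on the same host (1 with 2, 3 with 4) and step 2 pairs processes across hosts (1 with 3, 2 with 4), so both exchanges in step 2 cross the network.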

Example – 4 Process Recursive Doubling – On host optimization
[Figure: processes 1 and 2 on Host 1, processes 3 and 4 on Host 2. Rows: Step 1, Step 2, Step 3. Label: "Inter Host Communication".]
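One plausible reading of the three-step figure is an on-host fan-in to a per-host leader, an inter-host exchange among the leaders, and an on-host fan-out. The sketch below follows that reading as an assumption. Choosing the lowest rank on each host as leader and the function name are illustrative choices, not details from the slides.

```python
def hierarchical_schedule(hosts):
    """hosts: list of rank lists, one per host (host count a power of two).

    Returns three phases of (sender, receiver) messages. The lowest rank
    on each host acts as that host's leader.
    """
    n = len(hosts)
    assert n > 0 and n & (n - 1) == 0, "power-of-two host count assumed"
    leaders = [min(ranks) for ranks in hosts]
    # Phase 1: every non-leader signals its on-host leader.
    fan_in = [(r, min(ranks)) for ranks in hosts for r in ranks if r != min(ranks)]
    # Phase 2: leaders run recursive doubling across hosts.
    exchange = []
    dist = 1
    while dist < n:
        exchange += [(leaders[i], leaders[i ^ dist]) for i in range(n)]
        dist *= 2
    # Phase 3: each leader releases its on-host processes.
    fan_out = [(leader, r) for (r, leader) in fan_in]
    return fan_in, exchange, fan_out

fan_in, exchange, fan_out = hierarchical_schedule([[1, 2], [3, 4]])
print(fan_in)    # [(2, 1), (4, 3)]
print(exchange)  # [(1, 3), (3, 1)]
print(fan_out)   # [(1, 2), (3, 4)]
```

Under these assumptions only two messages cross the network, compared with four in the flat recursive-doubling pattern for the same four processes. The price is one additional on-host step.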

Design strategy
• Decouple
  – Hierarchy detection
  – Network specific collective algorithm implementation (“single” level)
  – Full collective function implementation (hierarchical)
  – Basic building blocks from MPI level functions
• Share resources between levels without breaking the abstraction between layers
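Hierarchy detection, kept separate from the collective algorithms, can be sketched as a grouping step that turns per-rank locations into one subgroup list per level. This is a minimal illustration under stated assumptions, not the Open MPI implementation: the `location` mapping, the three levels, and the lowest-rank-as-leader rule are all hypothetical.

```python
from collections import defaultdict

def build_subgroups(location):
    """location: dict mapping rank -> (host, socket).

    Returns (socket_groups, host_groups, network_group): ranks sharing a
    socket, socket leaders sharing a host, and one leader per host.
    """
    by_socket = defaultdict(list)
    for rank in sorted(location):
        by_socket[location[rank]].append(rank)
    socket_groups = list(by_socket.values())

    by_host = defaultdict(list)
    for (host, _socket), ranks in by_socket.items():
        by_host[host].append(ranks[0])  # socket leader = lowest rank
    host_groups = list(by_host.values())

    # Host leader = first socket leader on that host.
    network_group = [leaders[0] for leaders in host_groups]
    return socket_groups, host_groups, network_group

# 8 ranks on 2 hosts, 2 sockets per host, 2 ranks per socket.
location = {r: (r // 4, (r % 4) // 2) for r in range(8)}
socket_groups, host_groups, network_group = build_subgroups(location)
print(socket_groups)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(host_groups)    # [[0, 2], [4, 6]]
print(network_group)  # [0, 4]
```

A single-level collective routine could then be run on each subgroup in turn, which is what keeps the per-level algorithm independent of how the hierarchy was discovered.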

Collectives – Software Layers
[Diagram labels]
• OMPI
• Modular Component Architecture
• Frameworks: Collective Framework, Basic Collectives (bcol) Framework, Subgroup Framework
• Components: ML – Hierarchical Collectives Comp., Tuned (pt2pt) Collectives Comp., SM, NUMA, MUMA, IBNET, Pt2Pt, IB OFFLOAD
• MLNX OFED

Benchmarks

System setup
• Jaguar
  – 2.6 GHz Istanbul processor
  – Dual socket
  – Hex-core
• Smoky
  – 2.0 GHz Opteron
  – Quad socket
  – Quad core

Barrier as a function of Process count – Jaguar – 2 Level hierarchy
[Figure: latency of the barrier (usecs) vs. number of processes (2 to 12). Series: Shared Memory, pt-2-pt.]

Barrier as a function of Process count – Smoky – 2 Level hierarchy
[Figure: latency of the barrier (usecs) vs. number of processes (2 to 16). Series: Shared Memory, pt-2-pt.]

Barrier as a function of number of sockets - Jaguar
[Figure: latency of the barrier (usecs) for 2 and 4 processes. Series: Processes on Same Socket, Processes on Different Sockets.]

Barrier as a function of number of sockets (1,2) – Smoky
[Figure: latency of the barrier (usecs) for 2 and 4 processes. Series: Processes on Same Socket, Processes on Different Sockets.]

Barrier as a function of number of sockets (1,4) – Smoky
[Figure: latency of the barrier (usecs) for 4 processes. Series: Message Traffic within Socket, Message Traffic between Sockets.]

Summary
• Added hardware support for offloading collective operations
• Developed MPI-level support for asynchronous collectives
• Good barrier performance
• Good overlap capabilities
• Work is continuing
