Task mapping, job placements and routing strategies: Abhinav Bhatele (PowerPoint presentation)


  1. Task mapping, job placements and routing strategies
     Abhinav Bhatele, Center for Applied Scientific Computing
     Charm++ Workshop, April 30, 2014
     LLNL: Peer-Timo Bremer, Todd Gamblin, Katherine E. Isaacs, Steven H. Langer, Kathryn Mohror, Martin Schulz
     Illinois: Ronak Buch, Nikhil Jain, Harshitha Menon, Laxmikant V. Kale, Michael Robson
     Utah: Amey Desai, Aaditya G. Landge, Valerio Pascucci
     Purdue: Ahmed Abdel-Gawad, Mithuna Thottethodi
     LBL: Brian Austin, Nicholas J. Wright
     This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551. (LLNL-PRES-654602)

  2-5. Communication: the bottleneck at extreme scale
     • High costs for data movement in terms of time and energy
     • Newer platforms stress communication further (more cores, bigger networks)
     • Imperative to minimize data movement and maximize locality
     Data movement costs:
       Operation                     Time (ns)   Energy spent (pJ)
       Floating point operation      < 0.25      30-45
       Access to DRAM                50          128
       Get data from another node    > 1000      128-576
     Network bytes-to-flop ratios:
       IBM:  Blue Gene/L 0.375,  Blue Gene/P 0.375,  Blue Gene/Q 0.117
       Cray: XT3 8.77,  XT4 1.36,  XT5 0.23
     P. Kogge et al., "Exascale computing study: Technology challenges in achieving exascale systems," Technical Report, 2008.
     A. Bhatele et al., "Automated mapping of regular communication graphs on mesh interconnects," Intl. Conf. on High Performance Computing (HiPC), 2010.

  6. TASK MAPPING

  7-11. Topology aware task mapping
     • What is mapping: the placement (layout) of an application's tasks/processes on the physical interconnect
     • Does not require any changes to the application
     • Goals:
       • Balance computational load
       • Minimize contention (optimize latency or bandwidth); see the hop-bytes sketch below
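
     A common proxy for the contention goal is the hop-bytes metric: the sum over all messages of message size times the number of network hops between sender and receiver. The minimal Python sketch below (illustrative only, not tooling from this talk) scores candidate mappings of ranks onto a 3D torus by this metric; the torus shape, communication pattern and mappings are made-up examples.

       import random
       from itertools import product

       def torus_hops(a, b, dims):
           # Minimum hop count between two coordinates on a wrap-around torus.
           return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

       def hop_bytes(comm, mapping, dims):
           # Sum over messages of (bytes sent) * (hops traveled), a contention proxy.
           return sum(nbytes * torus_hops(mapping[s], mapping[d], dims)
                      for (s, d), nbytes in comm.items())

       dims = (4, 4, 4)                              # hypothetical 64-node 3D torus
       coords = list(product(range(4), repeat=3))    # node coordinates, row-major order

       # Hypothetical communication pattern: each rank sends 1 MB to rank+1 (a ring).
       comm = {(r, (r + 1) % 64): 1 << 20 for r in range(64)}

       # Two candidate mappings: ranks placed in row-major order vs. at random.
       linear = {r: coords[r] for r in range(64)}
       random.seed(0)
       shuffled = dict(zip(range(64), random.sample(coords, 64)))
       print("hop-bytes, row-major mapping:", hop_bytes(comm, linear, dims))
       print("hop-bytes, random mapping   :", hop_bytes(comm, shuffled, dims))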

  12-15. Maximize bandwidth?
     • Traditionally, research has focused on bringing tasks closer together to reduce the number of hops
     • This minimizes latency and, more importantly, link contention
     • For applications that send large messages, this might not be optimal (a toy link-load comparison follows below)
     (Figure: 1D, 2D, 3D and 4D placements of the same set of tasks)
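
     To make the shape argument concrete, the toy sketch below (not from the talk) places a 4-task all-to-all on a 2D mesh first as a 1x4 line and then as a 2x2 block, routes every message with X-then-Y dimension-order routing, and reports the worst-case number of messages sharing any single link. The mesh, routing scheme and placements are assumptions chosen purely for illustration; the point is only that the shape of a placement changes how the same traffic is spread over the available links.

       from collections import Counter
       from itertools import permutations

       def xy_route(src, dst):
           # Directed links used by X-then-Y dimension-order routing on a 2D mesh.
           links, (x, y) = [], src
           while x != dst[0]:
               nx = x + (1 if dst[0] > x else -1)
               links.append(((x, y), (nx, y)))
               x = nx
           while y != dst[1]:
               ny = y + (1 if dst[1] > y else -1)
               links.append(((x, y), (x, ny)))
               y = ny
           return links

       def max_link_load(placement):
           # Worst-case number of all-to-all messages crossing any directed link.
           loads = Counter()
           for src, dst in permutations(placement, 2):
               for link in xy_route(src, dst):
                   loads[link] += 1
           return max(loads.values())

       line  = [(0, 0), (1, 0), (2, 0), (3, 0)]   # four tasks in a 1x4 line
       block = [(0, 0), (1, 0), (0, 1), (1, 1)]   # the same four tasks in a 2x2 block
       print("max messages per link, 1x4 line :", max_link_load(line))    # prints 4
       print("max messages per link, 2x2 block:", max_link_load(block))   # prints 2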

  16. Rubik
     • We have developed a mapping tool focusing on:
       • structured applications that are bandwidth-bound and use collectives over sub-communicators
       • built-in operations that can increase effective bandwidth on torus networks, based on heuristics
     • Input:
       • Application topology with subsets identified
       • Processor topology
       • Set of operations to perform
     • Output: map file for the job launcher

  17. Application example
       app = box([9, 3, 8])        # create app partition tree of 27-task planes
       app.tile([9, 3, 1])
       network = box([6, 6, 6])    # create network partition tree of 27-processor cubes
       network.tile([3, 3, 3])
       network.map(app)            # map task planes into cubes
     (Figure: the 216-task application box split into eight 27-task tiles, the 216-processor network split into eight 27-processor cubes, and map() placing the application ranks onto the network)
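
     For readers who want to see what the tile-and-map example above computes, here is a plain-Python sketch of the same idea. It is not Rubik and does not reproduce Rubik's exact rank ordering: it carves the 9x3x8 application grid into 27-task planes, carves the 6x6x6 network into 27-processor cubes, pairs them up, and writes a rank-to-coordinate map file. The file name, output format and per-plane rank numbering are illustrative assumptions.

       from itertools import product

       def tiles(shape, tile_shape):
           # Yield one list of coordinates per tile, tiles taken in row-major order.
           grid = [s // t for s, t in zip(shape, tile_shape)]
           for tile_idx in product(*(range(g) for g in grid)):
               origin = [i * t for i, t in zip(tile_idx, tile_shape)]
               yield [tuple(o + d for o, d in zip(origin, off))
                      for off in product(*(range(t) for t in tile_shape))]

       app_tiles = list(tiles((9, 3, 8), (9, 3, 1)))   # 8 planes of 27 tasks each
       net_tiles = list(tiles((6, 6, 6), (3, 3, 3)))   # 8 cubes of 27 processors each

       # Pair plane i with cube i; number ranks sequentially within each plane
       # (a simplification of Rubik's actual rank ordering inside a box).
       placement = {}
       rank = 0
       for plane, cube in zip(app_tiles, net_tiles):
           for _task, proc in zip(plane, cube):
               placement[rank] = proc
               rank += 1

       # Write a simple map file: line k holds the processor coordinate of rank k.
       # The file name and format are illustrative, not Rubik's actual output.
       with open("mapfile.txt", "w") as f:
           for r in range(len(placement)):
               f.write("%d %d %d\n" % placement[r])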

  18-20. Mapping pF3D
     • A laser-plasma interaction code used at the National Ignition Facility (NIF) at LLNL
     • Three communication phases over a 3D virtual topology (a minimal communication sketch follows below):
       • Wave propagation and coupling: 2D FFTs within XY planes
       • Light advection: send-recv between consecutive XY planes
       • Hydrodynamic equations: 3D near-neighbor exchange
     Time spent in MPI calls:
       MPI call    2048 cores (Total % / MPI %)    16384 cores (Total % / MPI %)
       Send        4.90 / 28.45                    23.10 / 57.21
       Alltoall    8.10 / 46.94                    7.30 / 18.07
       Barrier     2.78 / 16.10                    8.13 / 20.15
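
     As a rough sketch of this communication structure (not pF3D source code), the mpi4py fragment below builds one sub-communicator per XY plane for the collective phase and pairs each task with its counterpart in the adjacent planes for the advection-style send-recv. The 4x4x4 decomposition and payload sizes are made-up, and the 3D near-neighbor phase is omitted.

       # Run with, e.g., mpiexec -n 64 python this_script.py
       from mpi4py import MPI
       import numpy as np

       PX, PY, PZ = 4, 4, 4                       # hypothetical 3D decomposition
       world = MPI.COMM_WORLD
       rank = world.Get_rank()
       assert world.Get_size() == PX * PY * PZ

       x = rank % PX
       y = (rank // PX) % PY
       z = rank // (PX * PY)

       # Phase 1: a sub-communicator per XY plane for the FFT/all-to-all phase.
       plane = world.Split(z, y * PX + x)
       sendbuf = np.full(PX * PY, rank, dtype='d')
       recvbuf = np.empty_like(sendbuf)
       plane.Alltoall(sendbuf, recvbuf)

       # Phase 2: light advection, a send-recv with the same (x, y) task in the
       # neighboring XY planes along Z.
       up   = rank + PX * PY if z + 1 < PZ else MPI.PROC_NULL
       down = rank - PX * PY if z > 0 else MPI.PROC_NULL
       outgoing = np.full(8, rank, dtype='d')
       incoming = np.empty(8, dtype='d')
       world.Sendrecv(outgoing, dest=up, recvbuf=incoming, source=down)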

  21-22. Performance benefits
     (Figure: comparison of different mappings of pF3D on 2,048 cores; stacked time in seconds for Receive, Send, All-to-all and Barrier under the TXYZ, XYZT, tile, tiltX and tiltXY mappings)
     (Figure: execution time per iteration for the default and best mappings of pF3D from 2,048 to 65,536 cores, with a 60% improvement annotated)
     A. Bhatele et al., "Mapping applications with collectives over sub-communicators on torus networks," in Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '12), IEEE Computer Society, November 2012.

  23. Visualizing network traffic using Boxfish
     (Figure: 3D torus views along X, Y and Z of per-link network traffic, on a color scale from 2M to 76M, for the TXYZ, XYZT, tile, tiltX and tiltXY mappings)
