An Agile Approach to Building a GPU-enabled and Performance-portable Global Cloud-resolving Atmospheric Model
Dr. Richard Loft*, Director, Technology Development, CISL/NCAR
*National Center for Atmospheric Research
GTC, San Jose, CA, March 26, 2018

Outline
• Origins Backstory
• The MPAS Model
• Team
• Tools and Design
• Status

Project began with research based on student projects
• Over two years, the Summer Internships in Parallel Computational Science (SIParCS) program at NCAR funded student projects related to architectural inter-comparison.
• Projects focused on optimizing atmospheric numerical PDE solvers for both CPUs and GPUs with performance portability in mind.
• Architectures compared:
  o Xeon Broadwell, Haswell
  o Xeon Phi KNL
  o NVIDIA Tesla P100 -> V100

Optimizing Stencils for different architectures
Benchmark Problem
• Shallow Water Equations (SWE)
  – A set of non-linear partial differential equations (PDE)
  – Capture features of atmospheric flow around the Earth
• Radial basis function-generated finite difference (RBF-FD) methods
  – Evaluate differential operator D at every point
[Figure: An example of a 75-point stencil on a sphere, distinguishing stencil points from non-stencil points [1]]
[Figure: RBF-FD solution to the SWE test case “Flow over an isolated mountain” (cone-shaped mountain) using 655,532 points, shown at Day 1 and Day 15 [1]]
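Evaluating a differential operator D at every point from its stencil neighbours amounts to one weighted sum per point. The following is a minimal Python sketch of that pattern, not the code used in the project. The names `apply_stencil`, `neighbors`, and `weights` are hypothetical, and the toy example uses ordinary three-point finite-difference weights on a 1-D line in place of real RBF-FD weights on a sphere.

```python
def apply_stencil(u, neighbors, weights):
    """Approximate (D u) at each point as a weighted sum over its stencil.

    u         : list of field values, one per grid point
    neighbors : for each evaluated point, the indices of its stencil points
    weights   : for each evaluated point, one weight per stencil point
    """
    result = []
    for idx, w in zip(neighbors, weights):
        result.append(sum(wj * u[j] for j, wj in zip(idx, w)))
    return result


# Toy example (assumption: 1-D grid, centered first-derivative weights).
n, h = 8, 0.5
u = [2.0 * i * h for i in range(n)]  # samples of u(x) = 2x

# Build stencils for interior points only, so every neighbor index is valid.
neighbors = [[i - 1, i, i + 1] for i in range(1, n - 1)]
weights = [[-1.0 / (2 * h), 0.0, 1.0 / (2 * h)] for _ in range(1, n - 1)]

du = apply_stencil(u, neighbors, weights)  # derivative of 2x at interior points
```

Because the neighbor indices are arbitrary, the same loop works for an unstructured point set; only the construction of `neighbors` and `weights` changes.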

Directive-based portability in the RBF-FD shallow water equations (2-D unstructured stencil)
• CI roofline model generally predicts performance well, even for more complicated algorithms.
• Xeon performance crashes to the DRAM BW limit when cache size is exceeded, with some state reuse.
• Xeon Phi (KNL) HBM memory is less sensitive to problem size than Xeon, and saturates at the CI figure.
• NVIDIA Pascal P100 performance fits the CI model. GPUs require higher levels of parallelism to reach saturation.
[Figure: Performance (GFLOPS, axis from 0 to 350) for Broadwell, KNL, and P100, with regions labeled “Insufficient Workload Parallelism” and “Sufficient Workload Parallelism”]
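For readers unfamiliar with roofline models, the basic bound can be written in a few lines. This is a generic sketch of the standard roofline formula, assuming "CI" is the kernel's intensity in FLOPs per byte of memory traffic. The function name and the machine numbers are placeholders for illustration only, not measurements of Broadwell, KNL, or P100.

```python
def roofline_gflops(flops_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Attainable performance: the lower of the compute and memory ceilings."""
    return min(peak_gflops, flops_per_byte * bandwidth_gb_per_s)


# Hypothetical machine (assumed numbers, for illustration only).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_PER_S = 100.0

# Intensity at which the two ceilings meet (the "ridge point").
ridge = PEAK_GFLOPS / BANDWIDTH_GB_PER_S

memory_bound = roofline_gflops(0.5, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
compute_bound = roofline_gflops(20.0, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
```

A kernel whose intensity sits below the ridge point is limited by memory bandwidth, which is consistent with the bullet about Xeon performance falling to the DRAM bandwidth limit once the working set no longer fits in cache.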

What is MPAS? – The Model for Prediction Across Scales
NCAR’s Global Meteorological/Climate Model; ~100,000 SLOC
[Figure: Simulation of 2012 Tropical Cyclones at 4 km Resolution. Courtesy of Falko Judt, NCAR]

Weather and Climate Alliance (WACA):
• NCAR
• NVIDIA Corporation
• IBM Corporation/The Weather Company
• University of Wyoming, CE&EE Department
• Korean Institute of Science and Technology Information (KISTI)

Initial Divide and Conquer Strategy
[Diagram labels: MPAS Dynamics; MPAS Physics; Ideas and Results; Problem Reports and Support]

Weather and Climate Alliance (WACA): A Collaboration for Earth System Model Acceleration
• NCAR (2+4)
  o Dr. Rich Loft, Director TDD
  o Dr. Raghu Raj Kumar, Project Scientist TDD
  o Clint Olson, TDD
  o Bill Skamarock, Senior Scientist, MMM
  o Michael Duda, Software Engineer, MMM
  o Dave Gill, Software Engineer, MMM
• KISTI (2+1)
  o Minsu Joh, KISTI Director, Disaster Management Research Center
  o Dr. Ji-Sun Kang, Senior Researcher
  o Jae-Youp Kim, GRA
• NVIDIA/PGI (1+3)
  o Greg Branch, NVIDIA, Sales
  o Dr. Carl Ponder, Senior Applications Engineer
  o Brent Leback, PGI Compiler Engineering Manager
  o Craig Tierny, Solutions Architect
• University of Wyoming (1+5)
  o Dr. Suresh Muknahallipatna, Professor E&CE, UW
  o Supreeth Suresh, Pranay Reddy, Sumathi Lakshmiranganathan, Cena Miller, Bradley Riotto - GRAs
~6 PI + 13 technical staff
Started in September 2016 (18 months)
~9 FTE-years

Since September: added IBM and The Weather Company
IBM/TWC participants (1+2)
  o Jaime Moreno
  o Todd Hutchinson
  o Constantinos Evangelinos
[Diagram label: Problem Reports and Support]

Tools for Accelerating Code Optimization
• Kernel GENerator (KGEN)
  o Extracts kernels from Fortran applications
  o Creates:
    • Standalone source code
    • Input and output state for verification
    • Added support for code coverage and representation
  o Broad user community
    • 8 domestic institutions
    • 5 international institutions
    • 1 company
  o Available on GitHub: https://github.com/NCAR/KGen
KGEN is a useful tool for accelerating code porting and optimization

MPAS Synchronous and Asynchronous Execution
[Diagram: execution timeline per time step Δt, with labels Dynamics and Physics; Land Surface; LW and SW Radiation; Asynch I/O; Disk]

Phase 2: pushing on to a full MPAS port
• Status of GPU-based model components
  o Ported, optimized, verified
    • Dry dynamical core
    • GPU-direct implementation of MPAS halo exchanges
  o Ported, optimized
    • Moist dynamics (tracer transport)
    • Xu-Randall Cloud fraction
  o Ported, undergoing optimization
    • WSM6 Microphysics
    • YSU Boundary layer scheme
  o Awaiting Porting
    • Scale Insensitive Tiedtke convection scheme
    • Monin-Obukhov surface layer scheme
• CPU-based components
  o Overlapping SW and LW RRTMG Radiation (lagged radiation)
  o NOAH Land Surface Model (synchronous, remains on CPU)
  o SIONlib I/O subsystem

IBM/TWC MPAS Objectives
• 24-hour global forecasts
• MPAS grid with local refinement
  • 12 km global grid
  • 3 km refinement over selected regions
  • 32.8 M horizontal points, 56 layers
• Forecast requirement
  • Complete 20-hour simulation in 45 minutes
  • xRe = 26.7
  • For Δt = 18 sec, timestep budget is 0.674 seconds
[Figure caption: Refined grids can be generated anywhere desired.]
Dr. Kumar will show next that as few as 800 V100s could achieve this goal…
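The forecast requirement can be checked with a few lines of arithmetic. This sketch assumes that "xRe" is the ratio of simulated time to wall-clock time and that the timestep budget is the wall-clock time available per model time step; the function and variable names are hypothetical.

```python
def timestep_budget(simulated_hours, wallclock_minutes, dt_seconds):
    """Return (speedup over real time, wall-clock seconds allowed per step)."""
    simulated_seconds = simulated_hours * 3600.0
    wallclock_seconds = wallclock_minutes * 60.0
    speedup = simulated_seconds / wallclock_seconds
    n_steps = simulated_seconds / dt_seconds
    return speedup, wallclock_seconds / n_steps


# Values taken from the slide: 20-hour simulation, 45 minutes, 18 s time step.
speedup, budget = timestep_budget(20.0, 45.0, 18.0)
print(f"speedup = {speedup:.1f}x, budget = {budget:.3f} s per step")
```

Unrounded, the ratio is about 26.67 and the budget is 0.675 s per step. The slide's 0.674 s follows from dividing 18 s by the rounded ratio 26.7.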
