Combining Machine Learning and Numerical Modeling to Transform Atmospheric Science
Dr. Richard Loft*
Director, Technology Development
Computational and Information Systems Laboratory
National Center for Atmospheric Research
*With special thanks to Dr. Raghu Kumar, NVIDIA; Supreeth Suresh, NCAR; the PGI team; and students and faculty at the University of Wyoming
GTC, San Jose, CA, March 19, 2019
Talk Summary
• Science 3.0: HPC + ML
  – Apply GPUs to accelerate models where the physics is rigorous.
  – Replace parameterizations with machine-learning emulators where the physics is phenomenological.
• Initial results are encouraging…
• But much more work needs to be done to prove these ideas out!
What’s driving the future of prediction? ESP!
Then:
• Weather prediction (5-10 days)
• (a GAP)
• Climate projections (decades to centuries)
Divisions between meteorology and climate are breaking down!
• Discoveries of predictability driven by the ocean and land surface
Now: Earth System Prediction (ESP) is filling that gap:
• Sub-seasonal (weeks)
• Seasonal (months)
• Climate predictions (years to decades)
Making these predictions will require significantly more computing power.
Earth System Modeling Catch-22
• Due to insufficient computing power, ESMs can’t resolve key phenomena.
• Scientists try to describe the unresolved scales using human-crafted physics parameterizations.
• ESM software complexity grows, driven by the increasing complexity of these parameterizations.
• Growing architectural complexity hinders the ability to port and optimize ESM codes on new architectures.
• Due to insufficient computing power, ESMs can’t resolve key phenomena. (And the cycle repeats.)
Model for Prediction Across Scales - Atmosphere (MPAS-A)
A global meteorological model & future ESP component
Simulation of 2012 tropical cyclones at 4 km resolution (courtesy of Falko Judt, NCAR)
MPAS: the algorithmic description
• Fully compressible non-hydrostatic equations written in flux form
• Finite-volume method on a staggered grid
  – The horizontal momentum normal to the cell edge (u) sits at the cell edges.
  – Scalars sit at the cell centers.
• Split-explicit timestepping scheme
  – Time integration: 3rd-order Runge-Kutta
  – Fast horizontal waves are sub-cycled
MPAS Grids
[Figures: horizontal and vertical grid structure, with sneaky local refinement and the occasional pentagon.]
Parallel Decomposition via Metis
MPAS Time-Integration Design
There are ~350 halo exchanges per timestep!
Physics (called before dynamics)
Microphysics (called after dynamics)
MPAS: The Code Inventory

MPAS Component        SLOC      Where it runs
Dynamics              10,000    GPU
Radiative transport   37,000    CPU
Land surface model    21,000    CPU
Other physics         42,000    GPU
Total                 110,000
Goals of the MPAS-GPU Portability Project
• Achieve portability across CPU and GPU architectures without sacrificing CPU performance
• Minimize use of architecture-specific code: #ifdef _GPU_ … #endif
• Manage porting/optimization costs
  – Use OpenACC to enable CPU-GPU portability
• Use all the hardware (CPU & GPU) available
  – After all, we paid for it!
Part of our team: UW students and PGI experts.
Scaling Benchmark Test Systems
Test case: MPAS-A dry dynamical core
• System 1: IBM "WSC" supercomputer
  – AC922 nodes with 6 16-GB V100 GPUs
  – 2x 22-core IBM POWER9 CPUs
  – Compiler: PGI 18.10
  – 2x IB interconnect; IBM Spectrum MPI
• System 2: NVIDIA "Prometheus" supercomputer
  – DGX-1 nodes with 8 16-GB V100 GPUs
  – 2x 18-core Intel Xeon v4 (Broadwell) CPUs
  – Compiler: PGI 18.10
  – 4x IB interconnect; OpenMPI 3.1.3
• System 3: NCAR Cheyenne supercomputer
  – 2x 18-core Intel Xeon v4 (Broadwell)
  – Intel compiler 17.0.1
  – 1x EDR IB interconnect; HPE MPT 2.16 MPI
Strong Scaling: V100 vs. Xeon v4 at 10 km and 15 km
[Figure: strong scaling of the MPAS-A dynamical core (56 levels, SP) at 10 km and 15 km; sec/step vs. number of GPUs or dual-socket CPU nodes (8-256), for Xeon v4 nodes, 8xV100 DGX-1, and 6xV100 AC922.]
GPU speed relative to dual-socket Intel Xeon v4 nodes
[Figure: 8xV100 DGX-1 performance relative to a Xeon v4 node at 10 km and 15 km; ratio of CPU to GPU time per step vs. number of GPUs or dual-socket CPU nodes.]
Weak scaling of the MPAS-A dry dycore (56 levels, SP) on GPUs
[Figure: weak scaling at 40k and 80k points per GPU on 6xV100 AC922 and 8xV100 DGX-1; seconds per timestep vs. number of GPUs, with ~0.09 sec of MPI overhead indicated.]
Optimizing the MPAS-A dynamical core: Lessons Learned
• Twenty module-level allocatable variables were unnecessarily being copied by the compiler from host to device to initialize them with zeroes. Moved the initialization to the GPUs.
• dyn_tend: eliminated dynamic allocation and deallocation of variables that introduced H<->D data copies; the storage is now statically created.
• MPAS_reconstruct: originally kept on the CPU, now ported to the GPUs.
• MPAS_reconstruct: mixed F77 and F90 array syntax caused the compiler to serialize execution on the GPUs. Rewrote with F90 constructs.
• Printing summary info for every timestep (the default) consumed time. Turned it into a debug option.
Improving MPAS-A halo-exchange performance: coalescing kernels
Coalescing these 9 kernels should drop MPI overhead by 50%.
Overlapping the radiation calculation: process layout (example)
[Diagram: per-node process layout]
• Proc 0: MPI & NOAH control path
  – CPU: SW/LW radiation & NOAH
  – GPU: everything else
• Proc 1: asynchronous I/O process
• Remaining processor idle
Co-locating radiation and integration tasks
Distribution of times to transfer general physics input fields from integration tasks to radiation tasks, for the 60-km uniform mesh on Cheyenne.
• 576 total tasks (16 nodes x 36 cores)
• 352 integration tasks
• 224 radiation tasks
Projected full MPAS-A model performance
[Chart: estimated MPAS-A timestep budget for 40k points per GPU, split among dry dynamics (0.018 sec), moist dynamics, physics, radiation comms, and halo comms; the remaining slice labels read 0.06, 0.139, 0.003, 0.085, and 0.03 sec.]
Total time: 0.275 sec/step
15 km -> 64 V100 GPUs; throughput ~0.9 years/day
Debugging MPAS-A: Tools
[Diagram: a 2x2 quadrant of SLOW/FAST vs. WRONG/RIGHT; the CPU code starts in the RIGHT column and the goal is FAST and RIGHT.]
• PCAST: when do results first begin to differ between CPU and GPU?
• MPAS Validation Tool: when is "different" still right?