A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC


  1. A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC
     Hisashi Yashiro, RIKEN Advanced Institute for Computational Science, Kobe, Japan
     GTC2015, San Jose, Mar. 17-20, 2015

  2. My topic: the study of… • Cloud computing

  3. My topic: the study of… • Computing of the cloud

  4. Clouds over the globe

  5. The first global sub-km weather simulation: 20,480 nodes (163,840 cores) on the K computer. Movie by R. Yoshida (RIKEN AICS)

  6. NICAM: Non-hydrostatic Icosahedral Atmospheric Model
     • Development started in 2000: Tomita and Satoh (2005), Satoh et al. (2008, 2014)
     • First global dx = 3.5 km run in 2004 using the Earth Simulator: Tomita et al. (2005), Miura et al. (2007, Science)
     • First global dx = 0.87 km run in 2012 using the K computer: Miyamoto et al. (2014)
     • FVM with an icosahedral grid system
     • Written in Fortran 90
     • Selected as a target application in post-K computer development: system-application co-design

  7. “Dynamics” and “Physics” in a Weather/Climate Model
     • “Dynamics”: the fluid dynamics solver of the atmosphere. Grid methods (FDM, FVM, FEM) with a horizontally explicit, vertically implicit scheme, or spectral methods.
     • “Physics”: external forcing and sub-grid-scale processes. Cloud microphysics, atmospheric radiation, boundary-layer turbulence, chemistry, cumulus, etc. Parameterized, no communication, big loop bodies with “if” branches.
     [Charts: ratio of each component (cloud microphysics, numerical filter, radiation, HEVI, PBL, tracer advection, other dynamics, other physics) in the elapsed time, and efficiency relative to peak on the K computer]

  8. Issues of Weather/Climate Model & Application: The Bandwidth Eater
     • Low computational intensity: many variables, low-order schemes
     • Huge code: 10K-100K lines (without comments!)
     • Active development and integration: fully tuned code may be replaced by a student’s new scheme

  9. Issues of Weather/Climate Model & Application: The Bandwidth Eater
     • It shows a “flat profile”: no large computational hot spots
     • Frequent file I/O: requires throughput all the way from the accelerator to the storage disks
     ➡ We have to optimize everywhere in the application!

  10. Challenge to GPU computation
      • We want to…
        • Utilize the memory throughput of the GPU
        • Offload all components of the application
        • Keep portability of the application: one code for the Earth Simulator, the K computer, and GPUs
      • We don’t want to…
        • Rewrite all components of the application in a special language
      ➡ OpenACC is suitable for our application (a minimal sketch follows)
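As a concrete illustration of the “one code” point, here is a minimal sketch (not taken from NICAM; loop, array names, and sizes are made up) of offloading a Fortran loop with OpenACC. Because the directive is an ordinary comment to a compiler without OpenACC support, the same source still builds and runs on CPU-only systems.

```fortran
program openacc_portability_sketch
  implicit none
  integer, parameter :: gall = 16900, kall = 40      ! illustrative grid sizes
  real(8), allocatable :: rho(:,:), div_flux(:,:)
  real(8) :: dt
  integer :: g, k

  allocate(rho(gall,kall), div_flux(gall,kall))
  rho = 1.0d0 ; div_flux = 0.1d0 ; dt = 1.0d0

  ! The !$acc line is a plain comment to non-OpenACC compilers,
  ! so this loop runs unchanged on CPU-only machines.
  !$acc parallel loop collapse(2) copy(rho) copyin(div_flux)
  do k = 1, kall
     do g = 1, gall
        rho(g,k) = rho(g,k) - dt * div_flux(g,k)
     end do
  end do

  print *, 'rho(1,1) =', rho(1,1)
end program openacc_portability_sketch
```

Built with an OpenACC-enabled compiler (e.g. PGI’s `-acc` option at the time), the loop runs on the GPU; built without it, the program is ordinary Fortran.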

  11. NICAM-DC with OpenACC
      • NICAM-DC: the dynamical core package of NICAM
        • BSD 2-clause license
        • Available from the website (http://scale.aics.riken.jp/nicamdc/) or GitHub
        • Basic test cases are prepared
      • OpenACC implementation
        • With the support of an NVIDIA specialist (Mr. Naruse)
      • Performance evaluation on TSUBAME 2.5 (Tokyo Tech)
        • Largest GPU supercomputer in Japan: 1300+ nodes, 3 GPUs per node
        • We used 2560 GPUs (1280 nodes x 2 GPUs) for the grand-challenge run

  12. NICAM-DC with OpenACC
      • Strategy (a sketch follows below)
        • Transfer common variables to the GPU using the “data pcopyin” clause: after the setup (memory allocation), the arrays used in the dynamical step (e.g. stencil operator coefficients) are transferred all at once
        • Data layout: several loop kernels are converted from Array of Structures (AoS) to Structure of Arrays (SoA), which is better suited to GPU computing
        • Asynchronous execution of loop kernels: the “async” clause is used as much as possible
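A hedged sketch of the first and third points (array names and sizes are illustrative, not NICAM-DC’s actual ones): read-only coefficients are placed on the device once by a `data pcopyin` region wrapped around the time loop, and two independent kernels are launched on separate `async` queues inside it.

```fortran
program data_region_async_sketch
  implicit none
  integer, parameter :: gall = 16900, kall = 40, nsteps = 10   ! illustrative sizes
  real(8), allocatable :: coef_grad(:,:), coef_div(:,:)        ! stencil coefficients
  real(8), allocatable :: q1(:,:), q2(:,:)                     ! prognostic variables
  integer :: step, g, k

  ! setup: allocate and fill everything the dynamical step needs
  allocate(coef_grad(gall,kall), coef_div(gall,kall), q1(gall,kall), q2(gall,kall))
  coef_grad = 1.0d0 ; coef_div = 1.0d0 ; q1 = 1.0d0 ; q2 = 1.0d0

  ! after setup, transfer the arrays used by the time loop all at once;
  ! they stay resident on the GPU for the whole run
  !$acc data pcopyin(coef_grad, coef_div) pcopy(q1, q2)
  do step = 1, nsteps

     ! two independent kernels on different async queues, so their launches
     ! and execution can overlap
     !$acc parallel loop collapse(2) async(1)
     do k = 1, kall
        do g = 1, gall
           q1(g,k) = q1(g,k) * coef_grad(g,k)
        end do
     end do

     !$acc parallel loop collapse(2) async(2)
     do k = 1, kall
        do g = 1, gall
           q2(g,k) = q2(g,k) * coef_div(g,k)
        end do
     end do

     !$acc wait   ! synchronize both queues before the next step
  end do
  !$acc end data

  print *, 'q1(1,1) =', q1(1,1), ' q2(1,1) =', q2(1,1)
end program data_region_async_sketch
```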

  13. NICAM-DC with OpenACC
      • Strategy (continued; a sketch follows below)
        • Ignore irregular, small computations: the pole points are calculated on the host CPU of the master rank. We do not have to split off a separate kernel for this: an advantage of OpenACC.
        • MPI communication: packing/unpacking of the halo grid data is done on the GPU to reduce the size of data transfers between host and device
        • File I/O: variables for output are updated on the GPU at each time step; at file-write time, the data is transferred from the device
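A hedged sketch of the MPI and file-I/O handling just described (a toy self-exchange with made-up names, not NICAM-DC’s actual communication layer): the halo points are packed into a small buffer by a GPU kernel, only that packed buffer crosses the host-device link for MPI, and the output field is copied back to the host only when it is time to write.

```fortran
program halo_and_output_sketch
  use mpi
  implicit none
  integer, parameter :: gall = 100, kall = 10, nhalo = 8   ! illustrative sizes
  real(8) :: var(gall, kall)                 ! prognostic field kept on the GPU
  real(8) :: sendbuf(nhalo, kall), recvbuf(nhalo, kall)
  integer :: halo_idx(nhalo)                 ! indices of the halo grid points
  integer :: n, k, ierr, myrank, req(2)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
  var = 1.0d0
  halo_idx = (/ (n, n = 1, nhalo) /)         ! pretend the first points are the halo

  !$acc data copy(var) create(sendbuf, recvbuf) copyin(halo_idx)

  ! pack the halo on the GPU so only the small packed buffer crosses the bus
  !$acc parallel loop collapse(2)
  do k = 1, kall
     do n = 1, nhalo
        sendbuf(n,k) = var(halo_idx(n), k)
     end do
  end do
  !$acc update host(sendbuf)

  ! exchange the packed halo (self-exchange here; a real run targets neighbour ranks)
  call MPI_Isend(sendbuf, nhalo*kall, MPI_DOUBLE_PRECISION, myrank, 1, MPI_COMM_WORLD, req(1), ierr)
  call MPI_Irecv(recvbuf, nhalo*kall, MPI_DOUBLE_PRECISION, myrank, 1, MPI_COMM_WORLD, req(2), ierr)
  call MPI_Waitall(2, req, MPI_STATUSES_IGNORE, ierr)

  ! unpack the received halo back on the GPU
  !$acc update device(recvbuf)
  !$acc parallel loop collapse(2)
  do k = 1, kall
     do n = 1, nhalo
        var(halo_idx(n), k) = recvbuf(n,k)
     end do
  end do

  ! output variables stay on the device; copy back only when it is time to write
  !$acc update host(var)
  ! ... the host-side file write (e.g. NetCDF) would happen here ...

  !$acc end data
  call MPI_Finalize(ierr)
end program halo_and_output_sketch
```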

  14. Node-to-node comparison
      TSUBAME2.5 GPU : NVIDIA K20X,     2 MPI/node, 1 GPU/MPI,        2620 GFLOPS, 500 GB/s, B/F = 0.2, Fat-tree InfiniBand
      TSUBAME2.5 CPU : Intel Westmere,  8 MPI/node,                    102 GFLOPS,  64 GB/s, B/F = 0.6, Fat-tree InfiniBand
      K computer     : SPARC64 VIIIfx,  1 MPI/node, 8 threads/MPI,     128 GFLOPS,  64 GB/s, B/F = 0.5, Tofu

  15. Node-to-node comparison
      • The GPU run is 7-8x faster than the CPU runs: consistent with the ratio of memory bandwidth
      • We achieved good performance without writing any CUDA kernels
      • Modified/added lines of code were only ~5% (~2000 lines)
      TSUBAME (ACC)  : 5 nodes x 2 PE (2 GPUs/node), 500 GB/s,  1.8 sec/step
      TSUBAME (HOST) : 5 nodes x 8 PE,                64 GB/s, 15.1 sec/step (GPU run is x8.3 faster)
      K              : 5 nodes x 1 PE, 8 threads,     64 GB/s, 12.2 sec/step (GPU run is x6.8 faster)

  16. Node-to-node comparison
      Computational efficiency [% of peak] : TSUBAME2.5 GPU 1.7, TSUBAME2.5 CPU 4.4, K computer 5.3
      Power efficiency [MFLOPS/W]          : TSUBAME2.5 GPU 109, TSUBAME2.5 CPU 13,  K computer 42

  17. Weak scaling test
      [Log-log plot: performance (GFLOPS) vs. number of nodes for TSUBAME2.5 GPU (MPI = GPU = nodes x 2), TSUBAME2.5 CPU (MPI = CPU = nodes x 8), and K (MPI = nodes, CPU = nodes x 8); the GPU runs reach 47 TFLOPS]

  18. Weak scaling test
      • 47 TFLOPS for the largest problem size
        • In this case, diagnostic variables were written every 15 min of simulation time
        • By selecting a typical output interval (every 3 hours = 720 steps), we achieved 60 TFLOPS
      • File I/O is critical in production runs
      • We could compress the output data on the GPU
      ➡ We really need a GPU-optimized, popular compression library: cuHDF?
      [Diagram, current pipeline: GPU memory -> (transfer: bottleneck) -> CPU memory -> (file write) -> storage; compression on the CPU with gzip/szip in the HDF5 library; output format NetCDF. A sketch of this CPU-side compression follows.]
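For reference, this is roughly what the CPU-side compression mentioned above looks like when done directly through the HDF5 Fortran API (NICAM’s actual output path goes through NetCDF on top of HDF5; the file and dataset names here are made up): a chunked dataset with the gzip (deflate) filter enabled, all executed on the host.

```fortran
program hdf5_gzip_sketch
  use hdf5
  implicit none
  integer, parameter :: nx = 128, ny = 128
  integer(hsize_t) :: dims(2)  = (/ nx, ny /)
  integer(hsize_t) :: chunk(2) = (/ 32, 32 /)
  real(8) :: field(nx, ny)
  integer(hid_t) :: file_id, space_id, dset_id, dcpl_id
  integer :: ierr

  field = 0.0d0                      ! an output variable copied back from the GPU

  call h5open_f(ierr)
  call h5fcreate_f('output.h5', H5F_ACC_TRUNC_F, file_id, ierr)
  call h5screate_simple_f(2, dims, space_id, ierr)

  ! dataset creation property list: chunking is required for compression,
  ! then the gzip (deflate) filter is attached; all of this runs on the CPU
  call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, ierr)
  call h5pset_chunk_f(dcpl_id, 2, chunk, ierr)
  call h5pset_deflate_f(dcpl_id, 6, ierr)       ! gzip compression level 6

  call h5dcreate_f(file_id, 'field', H5T_NATIVE_DOUBLE, space_id, dset_id, ierr, dcpl_id)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, field, dims, ierr)

  call h5dclose_f(dset_id, ierr)
  call h5pclose_f(dcpl_id, ierr)
  call h5sclose_f(space_id, ierr)
  call h5fclose_f(file_id, ierr)
  call h5close_f(ierr)
end program hdf5_gzip_sketch
```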

  19. Weak scaling test (continued)
      [Diagram, proposed pipeline: GPU memory -> (transfer: reduced) -> CPU memory -> (file write) -> storage; compression on the GPU by a “cuHDF”-like library; output format NetCDF]

  20. Strong scaling test
      [Log-log plot: performance (GFLOPS) vs. number of nodes for TSUBAME2.5 GPU (MPI = GPU = nodes x 2), TSUBAME2.5 CPU (MPI = CPU = nodes x 8), and K (MPI = nodes, CPU = nodes x 8); curves labeled by the number of horizontal grid points (16900, 4356, 1156, 324, 100); at the smallest sizes, ~50% of the elapsed time is communication]

  21. Summary
      • OpenACC enables easy porting of a weather/climate model to the GPU
        • We achieved good performance and scalability with small modifications
      • The performance of data transfer limits application performance
        • “Pinned memory” is effective for host-device transfer
        • In the near future, NVLink and HBM are expected to help
      • The file I/O issue is critical
      • More effort on the application side is necessary
      ➡ “Precision-aware” coding, from both the scientific and the computational viewpoint
      • Ongoing effort: OpenACC for all physics components
      Thank you for your attention!
