
Piz Daint & Piz Kesch: from general-purpose supercomputing to an appliance for weather forecasting



  1. “Piz Daint” & “Piz Kesch”: from general-purpose supercomputing to an appliance for weather forecasting. Thomas C. Schulthess. GTC 2016, San Jose, Wednesday April 6, 2016.

  2. Cray XC30 “Piz Daint”: 5,272 hybrid, GPU-accelerated compute nodes
     • Host: Intel Xeon E5-2670 (Sandy Bridge, 8 cores)
     • Accelerator: NVIDIA K20X GPU (GK110)

  3. “Piz Kesch” in the news: “Today’s Outlook: GPU-accelerated Weather Forecasting”, by John Russell, September 15, 2015.

  4. Swiss High-Performance Computing & Networking Initiative (HPCN): a three-pronged approach
     1. A new, flexible, and efficient building
     2. Efficient supercomputers
     3. Efficient applications: high-risk & high-impact projects (www.hp2c.ch); Phase II: application-driven co-design of a pre-exascale supercomputing ecosystem
     Timeline, 2009-2017: begin construction of new building (2009); new building complete (2010); development & procurement of petaflop/s-scale supercomputer(s); Monte Rosa Cray XT5, 14’762 cores; hex-core upgrade; Cray XE6 upgrade, 47,200 cores; Phase I: Aries network & multi-core, 22’128 cores; upgrade to K20X-based hybrid; Phase II; Pascal-based hybrid (2017)

  5. Platform for Advanced Scientific Computing (PASC): structuring project of the Swiss University Conference (swissuniversities). Five domain science networks (Climate, Materials simulations, Physics, Solid Earth Dynamics, Life Sciences), distributed application support, and >20 projects (see www.pasc-ch.org): 1. ANSWERS, 2. Angiogenesis, 3. AV-FLOPW, 4. CodeWave, 5. Coupled Cardiac Simulations, 6. DIAPHANE, 7. Direct GPU to GPU com., 8. Electronic Structure Calc., 9. ENVIRON, 10. Genomic Data Processing, 11. GeoPC, 12. GeoScale, 13. Grid Tools, 14. Heterogen. Compiler Platform, 15. HPC-ABGEM, 16. MD-based drug design, 17. Multiscale applications, 18. Multiscale economical data, 19. Particles and fields, 20. Snowball sampling.

  6. (figure slide)

  7. Leutwyler, D., O. Fuhrer, X. Lapillonne, D. Lüthi, and C. Schär, 2015: Continental-scale climate simulation at kilometer resolution. ETH Zurich Online Resource, DOI: http://dx.doi.org/10.3929/ethz-a-010483656; online video: http://vimeo.com/136588806

  8. Meteo Swiss production suite until March 30, 2016
     • ECMWF: 2x per day; 16 km lateral grid, 91 layers
     • COSMO-7: 3x per day, 72 h forecast; 6.6 km lateral grid, 60 layers
     • COSMO-2: 8x per day, 24 h forecast; 2.2 km lateral grid, 60 layers
     Some of the products generated from these simulations:
     ‣ Daily weather forecast on TV / radio
     ‣ Forecasting for air traffic control (Sky Guide)
     ‣ Safety management in the event of nuclear incidents

  9. “Albis” & “Lema”: CSCS production systems for Meteo Swiss until March 2016. Cray XE6, procured in spring 2012, based on 12-core AMD Opteron multi-core processors.

  10. Improving simulation quality requires higher performance: what exactly, and by how much? Resource-determining factors for Meteo Swiss’ simulations:
     • Current model, running through spring 2016: COSMO-2, 24 h forecast running in 30 min., 8x per day
     • New models, starting operation in spring 2016:
       • COSMO-1: 24 h forecast running in 30 min., 8x per day (~10x COSMO-2)
       • COSMO-2E: 21-member ensemble, 120 h forecast in 150 min., 2x per day (~26x COSMO-2)
       • KENDA: 40-member ensemble, 1 h forecast in 15 min., 24x per day (~5x COSMO-2)
     The new production system must deliver ~40x the simulation performance of “Albis” and “Lema” (roughly the sum of the three factors above; a quick check follows below).
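
A minimal sketch of that check, assuming the three per-suite factors quoted on the slide simply add up over one production day:

```cpp
// Relative compute demand of the new Meteo Swiss suites, expressed as
// multiples of a COSMO-2 production day (factors quoted on the slide).
#include <cstdio>

int main() {
    const double cosmo1  = 10.0;  // COSMO-1, 8x per day        (~10x COSMO-2)
    const double cosmo2e = 26.0;  // COSMO-2E ensemble, 2x/day  (~26x COSMO-2)
    const double kenda   =  5.0;  // KENDA ensemble, 24x/day    (~5x COSMO-2)

    const double total = cosmo1 + cosmo2e + kenda;
    std::printf("New suites vs. one COSMO-2 production day: ~%.0fx\n", total);  // ~41x, quoted as ~40x
    return 0;
}
```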

  11. State-of-the-art implementation of a new system for Meteo Swiss
     • Albis & Lema: 3 cabinets of Cray XE6, installed Q2/2012
     • The new system needs to be installed Q2-3/2015
     • Assuming a 2x improvement in per-socket performance, ~20x more x86 sockets would be needed, i.e. roughly 30 Cray XC cabinets (a back-of-envelope version follows below)
     Built the way the German Weather Service (DWD), the UK Met Office, or ECMWF built theirs, the new Meteo Swiss system would occupy 30 XC racks next to the current Cray XC30/XC40 platform, in a CSCS machine room with space for about 40 XC racks. Thinking inside the box is not a good option!
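
A back-of-envelope version of this sizing argument: the required ~40x and the assumed 2x per-socket gain come from the slides, while the roughly 2x socket density of an XC cabinet relative to an XE6 cabinet is an assumption added here.

```cpp
// Back-of-envelope sizing behind "30 Cray XC cabinets". The 40x requirement
// and the 2x per-socket gain are from the slides; the assumption that an XC
// cabinet packs about twice the x86 sockets of an XE6 cabinet is added here.
#include <cstdio>

int main() {
    const double required_speedup  = 40.0;  // vs. Albis & Lema (slide 10)
    const double per_socket_gain   = 2.0;   // assumed generational improvement
    const double xe6_cabinets      = 3.0;   // Albis & Lema footprint
    const double xc_density_vs_xe6 = 2.0;   // assumption: ~2x sockets per XC cabinet

    const double socket_factor = required_speedup / per_socket_gain;          // ~20x more sockets
    const double xc_cabinets   = xe6_cabinets * socket_factor / xc_density_vs_xe6;
    std::printf("~%.0fx more sockets -> ~%.0f XC cabinets\n", socket_factor, xc_cabinets);  // 20x -> 30
    return 0;
}
```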

  12. COSMO: old and new (refactored) code
     • Current implementation: main, physics, dynamics, and boundary conditions & halo exchange all in Fortran, on top of MPI, targeting x86. Used by most weather services (including MeteoSwiss until 3/2016) as well as most HPC centres.
     • New implementation: main and physics in Fortran with OpenMP / OpenACC; dynamics rewritten in C++ on top of a stencil library; boundary conditions & halo exchange through a generic communication library (MPI or whatever); shared infrastructure; targets both x86 and GPU. An HP2C/PASC development, in production on “Piz Daint” since 01/2014 and for Meteo Swiss since 04/2016. (An illustrative stencil sketch follows below.)
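
The dynamics rewrite targets a C++ stencil library (the stencil/Grid Tools line of work listed among the PASC projects) so that one stencil definition can be compiled for either an x86 or a GPU backend. The sketch below is not that library's API; it is a hypothetical plain-loop version of a typical horizontal stencil (a 5-point Laplacian) with an arbitrary storage order, just to show the access pattern such a library abstracts and retargets per architecture.

```cpp
// Illustrative plain-loop 5-point horizontal Laplacian: the kind of i/j/k
// stencil the refactored COSMO dynamics expresses through a stencil library.
// Field layout and halo width here are arbitrary choices for the sketch.
#include <cstddef>
#include <vector>

struct Field {
    std::size_t ni, nj, nk;            // grid dimensions including a 1-point halo in i and j
    std::vector<double> data;          // k varies slowest in this (arbitrary) layout
    Field(std::size_t ni_, std::size_t nj_, std::size_t nk_)
        : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data[(k * nj + j) * ni + i];
    }
};

// lap(i,j,k) = in(i+1,j) + in(i-1,j) + in(i,j+1) + in(i,j-1) - 4*in(i,j), per level k
void horizontal_laplacian(Field& in, Field& lap) {
    for (std::size_t k = 0; k < in.nk; ++k)
        for (std::size_t j = 1; j + 1 < in.nj; ++j)
            for (std::size_t i = 1; i + 1 < in.ni; ++i)
                lap(i, j, k) = in(i + 1, j, k) + in(i - 1, j, k)
                             + in(i, j + 1, k) + in(i, j - 1, k)
                             - 4.0 * in(i, j, k);
}

int main() {
    Field in(64, 64, 80), lap(64, 64, 80);  // grid sizes here are arbitrary
    in(32, 32, 0) = 1.0;                    // a single perturbation
    horizontal_laplacian(in, lap);
    return 0;
}
```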

  13. Piz Kesch / Piz Escha: an appliance for meteorology
     • Water-cooled rack (48U)
     • 12 compute nodes, each with 2 Intel Xeon E5-2690v3 (12 cores @ 2.6 GHz), 256 GB of 2133 MHz DDR4 memory, and 8 NVIDIA Tesla K80 GPUs
     • 3 login nodes
     • 5 post-processing nodes
     • Mellanox FDR InfiniBand
     • Cray CLFS Lustre storage
     • Cray Programming Environment

  14. Origin of the factor-40 performance improvement. Performance of COSMO running on the new “Piz Kesch”, compared (in Sept. 2015) to (1) the previous production system, a Cray XE6 with AMD Barcelona, and (2) “Piz Dora”, a Cray XC40 with Intel Haswell (E5-2690v3).
     • Current production system installed in 2012; new Piz Kesch / Escha installed in 2015
     • Processor performance (Moore’s Law): 2.8x
     • Improved system utilisation: 2.8x
     • Software refactoring: general software performance 1.7x; port to GPU architecture 2.3x
     • Increase in number of processors: 1.3x
     • Total performance improvement: ~40x (the product of the factors above; see below)
     • Bonus: the simulation running on GPUs is 3x more energy-efficient than on conventional state-of-the-art CPUs
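
The listed factors compose multiplicatively; a minimal check using the numbers quoted on the slide:

```cpp
// The ~40x total is the product of the individual improvement factors
// quoted on the slide (they compose multiplicatively, not additively).
#include <cstdio>

int main() {
    const double moores_law   = 2.8;  // processor performance, 2012 -> 2015
    const double utilisation  = 2.8;  // improved system utilisation
    const double sw_general   = 1.7;  // general software performance (refactoring)
    const double gpu_port     = 2.3;  // port to GPU architecture (refactoring)
    const double more_sockets = 1.3;  // increase in number of processors

    const double total = moores_law * utilisation * sw_general * gpu_port * more_sockets;
    std::printf("Total improvement: ~%.0fx\n", total);  // ~40x
    return 0;
}
```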

  15. A factor 40 improvement with the same footprint. Current production system: Albis & Lema. New system: Kesch & Escha.

  16. Processor architecture trends from the DARPA HPCS systems (2011) to 2017+: three tracks, GPU-accelerated hybrid, Xeon Phi (accelerated), and multi-core. Milestones on the chart include MeteoSwiss (2015), U. Tokyo (2016), and, for 2017+, Summit, Aurora, post-K, and TSUBAME 3.0. Both accelerated architectures have heterogeneous memory.
