Reflecting on the Goal and Baseline of Exascale Computing
Thomas C. Schulthess

Tracking supercomputer performance over time?
Linpack benchmark solves: Ax = b
• 1,000-fold performance improvement per decade
• Plot annotations: 1st application at > 1 TFLOP/s sustained; 1st application at > 1 PFLOP/s sustained
• Codes marked on the plot: KKR-CPA (MST), LSMS (MST), WL-LSMS (MST)
• The 1,000x performance improvement per decade seems to hold for multiple-scattering-theory (MST)-based electronic structure codes for materials science

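As a rough check of what a 1,000-fold improvement per decade means year over year, the sketch below converts a per-decade factor into a constant annual growth factor and a doubling time. The function names are illustrative and not from the talk; the only input taken from the slide is the factor of 1,000 per decade, and constant exponential growth is an assumption.

```python
import math

def annual_factor(per_decade: float) -> float:
    """Constant yearly growth factor that compounds to `per_decade` over 10 years."""
    return per_decade ** (1.0 / 10.0)

def doubling_time_years(per_decade: float) -> float:
    """Years needed to double performance at that constant growth rate."""
    return 10.0 * math.log(2.0) / math.log(per_decade)

# Factor quoted on the slide for Linpack and the MST-based codes.
per_decade = 1000.0
print(f"annual factor: {annual_factor(per_decade):.3f}")              # about 2x per year
print(f"doubling time: {doubling_time_years(per_decade):.2f} years")  # about one year
```

Under this assumption, 1,000-fold per decade is equivalent to performance roughly doubling every year.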
“Only” 100-fold performance improvement in climate codes
Source: Peter Bauer, ECMWF

Has the efficiency of weather & climate codes dropped 10-fold every decade?

Floating-point efficiency dropped from 50% on Cray Y-MP to 5% on today’s Cray XC (10x in 2 decades)
• System power labels on the plot: Cray Y-MP @ 300 kW, IBM P5 @ 400 kW, IBM P6 @ 1.3 MW, Cray XT5 @ 1.8 MW, Cray XT5 @ 7 MW
• Codes marked on the plot: KKR-CPA (MST), LSMS (MST), WL-LSMS (MST)
• System size (in energy footprint) grew much faster on “Top500” systems
Source: Peter Bauer, ECMWF

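The numbers on these slides can be put side by side with a few lines of arithmetic. The sketch below is a back-of-the-envelope reading, not an analysis from the talk: it assumes the 1,000-fold and 100-fold per-decade rates both hold over the same two decades as the efficiency drop, and it simply takes ratios of the power labels relative to the Cray Y-MP without assigning them to a particular curve. All variable names are illustrative.

```python
DECADES = 2

# Per-decade improvement factors quoted on the earlier slides.
linpack_gain = 1000 ** DECADES      # Linpack / MST codes
climate_gain = 100 ** DECADES       # weather and climate codes
gap = linpack_gain / climate_gain   # how far climate codes fall behind over two decades

# Floating-point efficiency: 50% on Cray Y-MP, 5% on Cray XC.
efficiency_drop = 50 / 5
efficiency_drop_per_decade = efficiency_drop ** (1 / DECADES)

# Part of the gap that the efficiency drop alone does not account for.
not_explained_by_efficiency = gap / efficiency_drop

# Power labels from the plot, in kW, expressed relative to the Cray Y-MP.
power_kw = {
    "Cray Y-MP": 300,
    "IBM P5": 400,
    "IBM P6": 1300,
    "Cray XT5 (1.8 MW)": 1800,
    "Cray XT5 (7 MW)": 7000,
}
growth = {name: kw / power_kw["Cray Y-MP"] for name, kw in power_kw.items()}

print(f"gap over {DECADES} decades: {gap:.0f}x")
print(f"efficiency drop: {efficiency_drop:.0f}x total, "
      f"{efficiency_drop_per_decade:.2f}x per decade")
print(f"not explained by efficiency: {not_explained_by_efficiency:.0f}x")
for name, ratio in growth.items():
    print(f"{name}: {ratio:.1f}x the Cray Y-MP power")
```

Read this way, a 10x efficiency drop over two decades corresponds to roughly 3.2x per decade rather than 10x per decade, and the largest power label is about 23 times the Cray Y-MP figure while the 1.8 MW label is 6 times that figure.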
Can the delivery of a 1km-scale capability be pulled in by a decade?
Source: Christoph Schär, ETH Zurich, & Nils Wedi, ECMWF

Leadership in weather and climate
European model may be the best, but it is far from sufficient accuracy and reliability!
Source: Peter Bauer, ECMWF

The impact of resolution: simulated tropical cyclones
[Figure panels: 130 km, 60 km, 25 km, Observations]
HADGEM3 PRACE UPSCALE, P.L. Vidale (NCAS) and M. Roberts (MO/HC)

Resolving convective clouds (convergence?)
• Bulk convergence: area-averaged bulk effects upon ambient flow, e.g., heating and moistening of cloud layer
• Structural convergence: statistics of cloud ensemble, e.g., spacing and size of convective clouds
Source: Christoph Schär, ETH Zurich

Structural and bulk convergence (Panosetti et al. 2018)
[Figure: two log-scale panels of relative frequency for grid spacings 8 km, 4 km, 2 km, 1 km, 500 m]
• Statistics of cloud area: relative frequency vs. cloud area [km^2]. No structural convergence.
• Statistics of up- & downdrafts: relative frequency vs. convective mass flux [kg m^-2 s^-1]. Bulk statistics of updrafts converge.
• Other annotations in the figure: "Factor 4"; grid-scale clouds [%]: 71, 64, 54, 47, 43
Source: Christoph Schär, ETH Zurich

What resolution is needed?
• There are threshold scales in the atmosphere and ocean: going from 100 km to 10 km is incremental, 10 km to 1 km is a leap. At 1 km
  • it is no longer necessary to parametrise precipitating convection, ocean eddies, or orographic wave drag and its effect on extratropical storms;
  • ocean bathymetry, overflows and mixing, as well as regional orographic circulation in the atmosphere become resolved;
  • the connections between the remaining parametrisations are now on a physical footing.
• We spent the last five decades in a paradigm of incremental advances, in which we incrementally improved the resolution of models from 200 to 20 km.
• Exascale allows us to make the leap to 1 km. This fundamentally changes the structure of our models. We move from crude parametric representations to an explicit, physics-based description of essential processes.
• The last such step change was fifty years ago. This was when, in the late 1960s, climate scientists first introduced global climate models, which were distinguished by their ability to explicitly represent extra-tropical storms, ocean gyres and boundary currents.
Source: Bjorn Stevens, MPI-M

Simulation throughput: Simulated Years Per Day (SYPD)

                          NWP      Climate in production   Climate spinup
Simulation                10 d     100 y                   5'000 y
Desired wall clock time   0.1 d    0.1 y                   0.5 y
Ratio                     100      1'000                   10'000
SYPD                      0.27     2.7                     27

Minimal throughput: 1 SYPD; preferred: 5 SYPD

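The SYPD row follows from the ratio row: a ratio of simulated time to wall-clock time, divided by the number of days in a year, gives simulated years per wall-clock day. The sketch below reproduces the table values under the assumption of a 365-day year; the function and dictionary names are illustrative, not from the talk.

```python
DAYS_PER_YEAR = 365.0  # assumption: calendar year without leap days

def sypd(simulated_days: float, wall_clock_days: float) -> float:
    """Simulated years per wall-clock day."""
    ratio = simulated_days / wall_clock_days  # simulated time per unit wall-clock time
    return ratio / DAYS_PER_YEAR

# (simulated length, desired wall-clock time), both converted to days.
cases = {
    "NWP": (10.0, 0.1),
    "Climate in production": (100 * DAYS_PER_YEAR, 0.1 * DAYS_PER_YEAR),
    "Climate spinup": (5000 * DAYS_PER_YEAR, 0.5 * DAYS_PER_YEAR),
}

for name, (sim_days, wall_days) in cases.items():
    print(f"{name}: ratio {sim_days / wall_days:.0f}, SYPD {sypd(sim_days, wall_days):.2f}")
```

The computed values are close to 0.274, 2.74 and 27.4, which the slide lists as 0.27, 2.7 and 27.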
Summary of intermediate goal (reach by 2021?)

Horizontal resolution   1 km (globally quasi-uniform)
Vertical resolution     180 levels (surface to ~100 km)
Time resolution         Less than 1 minute
Coupled                 Land-surface/ocean/ocean-waves/sea-ice
Atmosphere              Non-hydrostatic
Precision               Single (32-bit) or mixed precision
Compute rate            1 SYPD (simulated years per wall-clock day)

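To get a feel for the size of this goal, the sketch below estimates the number of grid points and the wall-clock budget per time step. It is an order-of-magnitude estimate under assumptions that are not on the slide: a spherical Earth with a mean radius of 6371 km, one grid column per square kilometre, a 365-day year, and a time step of exactly 60 s (the slide only says less than 1 minute, so the per-step budget is an upper bound).

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumption: mean Earth radius
HORIZONTAL_RES_KM = 1.0    # from the slide
VERTICAL_LEVELS = 180      # from the slide
TIME_STEP_S = 60.0         # assumption: upper bound of "less than 1 minute"
TARGET_SYPD = 1.0          # from the slide

# Horizontal columns: one per grid cell of area (1 km)^2 on the sphere.
surface_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
columns = surface_km2 / HORIZONTAL_RES_KM ** 2
grid_points = columns * VERTICAL_LEVELS

# Time steps in one simulated year and the wall-clock time allowed for each.
steps_per_year = 365 * 24 * 3600 / TIME_STEP_S
seconds_per_step = 24 * 3600 / (TARGET_SYPD * steps_per_year)

print(f"columns:          {columns:.2e}")
print(f"grid points:      {grid_points:.2e}")
print(f"steps per year:   {steps_per_year:.0f}")
print(f"budget per step:  {seconds_per_step:.3f} s wall-clock")
```

With these assumptions the model has roughly 5 x 10^8 columns and about 9 x 10^10 grid points, and each time step for the full coupled system has to finish in well under a fifth of a second to reach 1 SYPD.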
Running COSMO 5.0 at global scale on Piz Daint
Scaling to full system size: ~5300 GPU-accelerated nodes available
Running a near-global (±80° covering 97% of Earth's surface) COSMO 5.0 simulation & IFS
> Either on the host processors: Intel Xeon E5-2690 v3 (Haswell 12c).
> Or on the GPU accelerator: PCIe version of NVIDIA GP100 (Pascal) GPU

“Piz Kesch”
Today’s Outlook: GPU-accelerated Weather Forecasting (John Russell, September 15, 2015)

Requirements from MeteoSwiss
• Data assimilation: 6x
• Ensemble with multiple forecasts: 24x
• Grid 2.2 km → 1.1 km: 10x
• Constant budget for investments and operations