Piz Daint: a modern research infrastructure for computational science


  1. Piz Daint: a modern research infrastructure for computational science. Thomas C. Schulthess, NorduGrid 2014, Helsinki, May 20, 2014

  2. (Image slide.) Source: A. Fichtner, ETH Zurich

  3. Data from many stations and earthquakes. Source: A. Fichtner, ETH Zurich

  4. Very large simulations allow inverting large data sets to generate a high-resolution model of the Earth's mantle. (Figure: shear-wave velocity v_s [km/s] at 20 km and 70 km depth, colour scales 2.80 to 4.05 and 3.55 to 4.60 km/s.) Source: A. Fichtner, ETH Zurich

  5. Ice floats! First early-science result on “Piz Daint” (Joost VandeVondele, Jan. 2014): high-accuracy quantum simulations produce correct results for water.

  6. Jacob's ladder of chemical accuracy (J. Perdew). The full many-body Schrödinger equation reads
     $(H - E)\,\Psi(\xi_1, \ldots, \xi_N) = 0$, with $\xi = \{\vec r, \sigma\}$ and
     $H = \sum_{i=1}^{N} \Bigl( -\frac{\hbar^2}{2m} \nabla_i^2 + v(\xi_i) \Bigr) + \frac{1}{2} \sum_{i \neq j}^{N} w(\xi_i, \xi_j), \qquad w(\xi_i, \xi_j) = \frac{e^2}{|\vec r_i - \vec r_j|}$,
     while the Kohn-Sham equation with the local density approximation is
     $\Bigl( -\frac{\hbar^2}{2m} \nabla^2 + v_{\mathrm{LDA}}(\vec r) \Bigr) \varphi_i(\vec r) = \varepsilon_i \varphi_i(\vec r)$.
     Ice floats in the new “rung 5” MP2-based simulations with CP2K (VandeVondele & Hutter); it did not float in previous “rung 4” simulations using hybrid functionals.

  7. Modelling the interactions between water molecules requires quantum simulations of extreme accuracy. Energy scales: total energy -76.438… a.u.; hydrogen bonds ~0.0001 a.u.; required accuracy ~99.9999%. Source: J. VandeVondele, ETH Zurich
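The quoted accuracy follows directly from the ratio of the two energy scales on the slide:

\[
  \frac{E_{\text{H-bond}}}{|E_{\text{total}}|} \approx \frac{10^{-4}\ \text{a.u.}}{76.4\ \text{a.u.}} \approx 1.3 \times 10^{-6},
\]

so the total energy must be converged to roughly six significant digits, which is the ~99.9999% accuracy quoted above.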

  8. Pillars of the scientific method: Theory (models), Experiment (data), and Mathematics / Simulation.
     (1) Synthesis of models and data: recognising characteristic features of complex systems with calculations of limited accuracy (e.g. inverse problems).
     (2) Solving theoretical problems with high precision: complex structures emerge from simple rules (natural laws); more accurate predictions from “beautiful” theory (in the Penrose sense).

  9. Pillars of the scientific method (same diagram as the previous slide), with a note on the changing role of high-performance computing: HPC is now an essential tool for science, used by all scientists (for better or worse), rather than being limited to the domain of applied mathematics and providing numerical solutions to theoretical problems that only a few understand.

  10. CSCS’s new flagship system “Piz Daint” is one of Europe’s most powerful petascale supercomputers, and presently the world’s most energy-efficient petascale supercomputer.

  11. Hierarchical setup of Cray’s Cascade architecture: a blade carries 4 dual-socket nodes, a chassis holds 16 blades, an electrical group (2 cabinets) contains 6 chassis, and the full Cascade system has 8 electrical groups (16 cabinets). The dragonfly cabling obeys
      $\text{inter-group cables (between any two groups)} \le \dfrac{\text{global cables per group}}{\text{groups} - 1}$
      and
      $\text{total global cables} = \text{inter-group cables} \times \dfrac{\text{groups} \times (\text{groups} - 1)}{2}$.
      Source: G. Faanes et al., SC’12 proceedings (2012).
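Plugging in the 8 electrical groups quoted on the slide makes the second relation concrete; the per-pair cable count $c$ is left symbolic because the slide does not state it:

\[
  \text{total global cables} = c \times \frac{8 \times (8 - 1)}{2} = 28\,c ,
\]

i.e. with 8 groups there are 28 distinct group pairs, each wired with $c$ optical cables.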

  12. Regular multi-core vs. hybrid multi-core blades. Both blade types attach four nodes to a 48-port router through four NICs.
      Initial multi-core blade: 4 nodes, each with 2 Intel SandyBridge CPUs and 32 GB DDR3-1600 RAM; peak performance of the blade: 1.3 TFlops.
      Final hybrid CPU-GPU blade: 4 nodes, each with 1 Intel SandyBridge CPU, 32 GB DDR3-1600 RAM, and 1 NVIDIA K20X GPU (GK110) with 6 GB GDDR5 memory; peak performance of the blade: 5.9 TFlops.
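The two blade peaks can be sanity-checked from per-device numbers; the estimate below assumes the 8-core, 2.6 GHz Xeon E5-2670 (about 0.166 TFlops per socket with AVX) and the K20X double-precision peak of 1.31 TFlops, neither of which is spelled out on the slide:

\[
  \text{multi-core blade: } 4 \times 2 \times 0.166 \approx 1.3\ \text{TFlops}, \qquad
  \text{hybrid blade: } 4 \times (0.166 + 1.31) \approx 5.9\ \text{TFlops}.
\]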

  13. A brief history of “Piz Daint”
      • Installed 12 cabinets with dual-SandyBridge nodes in Oct./Nov. 2012
        - ~50% of the final system size, to test the network at scale (lessons learned from Gemini & XE6)
      • Rigorous evaluation of three node architectures based on 9 applications
        - dual-Xeon vs. Xeon/Xeon-Phi vs. Xeon/Kepler
        - joint study with Cray from Dec. 2011 through Nov. 2012
      • Five applications were used for system design
        - CP2K, COSMO, SPECFEM-3D, GROMACS, Quantum ESPRESSO
        - CP2K & COSMO-OPCODE were co-developed with the system
      • Moving performance goal: the hybrid Cray XC30 had to beat the regular XC30 by 1.5x
      • Upgrade to 28 cabinets with hybrid CPU-GPU nodes in Oct./Nov. 2013
        - Accepted in Dec. 2013
      • Early science with fully performant hybrid nodes: January through March 2014

  14. (Full-slide image, no caption.)

  15. (Image slide.) Source: David Leutwyler

  16. Speedup of the full COSMO-2 production problem (apples-to-apples with the 33 h forecast of MeteoSwiss). Bar chart comparing the current production code with the new HP2C-funded code on Monte Rosa (Cray XE6, Nov. 2011), Tödi (Cray XK7, Nov. 2012), Piz Daint (Cray XC30, Nov. 2012), and Piz Daint hybrid/GPU (Cray XC30, Nov. 2013); the annotated speedup factors range from 1.35x to 3.36x.

  17. Energy to solution (kWh per ensemble member). Bar chart for the current production code and the new HP2C-funded code on the same four systems: Cray XE6 (Nov. 2011), Cray XK7 (Nov. 2012), Cray XC30 (Nov. 2012), and Cray XC30 hybrid/GPU (Nov. 2013); vertical axis from 1.5 to 6.0 kWh, with annotated factors between 1.41x and 6.89x.

  18. COSMO: current and new (HP2C-developed) code.
      Current code: Fortran main, with physics, dynamics, boundary conditions and halo exchange all in Fortran, communicating directly over MPI on the system.
      New code: Fortran main; physics in Fortran with OpenMP / OpenACC; dynamics rewritten in C++ on top of a stencil library with x86 and GPU back ends; shared infrastructure, with boundary conditions and halo exchange going through a generic communication library layered on MPI (or whatever the system provides).
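As a rough illustration of the “write the stencil once, pick the back end separately” idea behind the rewritten dynamical core, here is a toy C++ sketch; it is not the API of the actual stencil library used in COSMO, and the names (X86Backend, diffuse) are invented for this example:

    #include <cstddef>
    #include <vector>

    // Toy back end: executes a 1D stencil serially on the CPU. A real library
    // would offer further back ends (e.g. OpenMP or CUDA) behind the same interface.
    struct X86Backend {
        template <typename Stencil>
        static void apply(std::size_t n, Stencil s) {
            for (std::size_t i = 1; i + 1 < n; ++i) s(i);
        }
    };

    // The stencil is written once; the back end is chosen as a template parameter.
    // A 1D Laplacian stands in for the full 3D dynamics stencils.
    template <typename Backend>
    void diffuse(std::vector<double>& out, const std::vector<double>& in) {
        Backend::apply(in.size(), [&](std::size_t i) {
            out[i] = -2.0 * in[i] + in[i - 1] + in[i + 1];
        });
    }

    int main() {
        std::vector<double> in(100, 1.0), out(100, 0.0);
        diffuse<X86Backend>(out, in);   // swap the template argument to retarget
        return 0;
    }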

  19. The layers between a physical model and a given supercomputer:
      Physical model (velocities, pressure, temperature, water, turbulence)
      → Mathematical description
      → Discretization / algorithm
      → Code / implementation, e.g.
          lap(i,j,k) = -4.0 * data(i,j,k)
                     + data(i+1,j,k) + data(i-1,j,k)
                     + data(i,j+1,k) + data(i,j-1,k);
      → “Port” the serial code to supercomputers: vectorize, parallelize, petascaling, exascaling, ...
      → Code compilation
      → A given supercomputer.
      The upper layers belong to domain science (incl. applied mathematics), the lower ones to computer engineering (& computer science).
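The assignment on the slide is the horizontal Laplacian at a single grid point. A minimal, self-contained C++ sketch of how such a stencil is swept over a field might look as follows; the grid dimensions, flat storage layout, and boundary handling are illustrative choices, not taken from the talk:

    #include <cstddef>
    #include <vector>

    int main() {
        const std::size_t ni = 64, nj = 64, nk = 32;   // illustrative grid size
        auto idx = [=](std::size_t i, std::size_t j, std::size_t k) {
            return (k * nj + j) * ni + i;               // flat storage, i fastest
        };

        std::vector<double> data(ni * nj * nk, 1.0);    // input field
        std::vector<double> lap(ni * nj * nk, 0.0);     // horizontal Laplacian

        // Interior points only; boundaries are left untouched in this sketch.
        for (std::size_t k = 0; k < nk; ++k)
            for (std::size_t j = 1; j + 1 < nj; ++j)
                for (std::size_t i = 1; i + 1 < ni; ++i)
                    lap[idx(i, j, k)] = -4.0 * data[idx(i, j, k)]
                                      + data[idx(i + 1, j, k)] + data[idx(i - 1, j, k)]
                                      + data[idx(i, j + 1, k)] + data[idx(i, j - 1, k)];
        return 0;
    }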

  20. The same layers as the previous slide, with one box added on the computer-engineering side: “Architectural options / design”.

  21. The final version of the layer diagram: the domain-science layers (physical model, mathematical description, discretization / algorithm, code / implementation) now connect to the computer-engineering layers (code compilation, architectural options / design) through tools & libraries, optimal algorithms, and auto tuning.
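As a small illustration of the “auto tuning” box, the sketch below times one placeholder kernel for several candidate block sizes and keeps the fastest; both the kernel and the candidate sizes are invented for this example:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 22;
        std::vector<double> a(n, 1.0), b(n, 2.0);
        const std::size_t blocks[] = {64, 256, 1024, 4096};   // candidate block sizes
        std::size_t best_block = 0;
        double best_time = 1e30;

        for (std::size_t block : blocks) {
            auto t0 = std::chrono::steady_clock::now();
            for (std::size_t s = 0; s < n; s += block)          // blocked traversal
                for (std::size_t i = s; i < s + block && i < n; ++i)
                    a[i] += 0.5 * b[i];
            auto t1 = std::chrono::steady_clock::now();
            double dt = std::chrono::duration<double>(t1 - t0).count();
            if (dt < best_time) { best_time = dt; best_block = block; }
        }
        std::printf("best block size: %zu (%.3f ms)\n", best_block, best_time * 1e3);
        return 0;
    }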
