Lawrence Livermore National Laboratory
Sequoia and the Petascale Era
SCICOMP 15, May 20, 2009
Thomas Spelce, Development Environment Group
Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-411030

The Advanced Simulation and Computing (ASC) Program delivers high-confidence prediction of weapons behavior
[Diagram: the elements of the ASC program – Integrated Codes (codes to predict safety and reliability), Physics and Engineering Models (models and understanding), Verification and Validation, and NNSA Science Campaigns; experiments, including legacy underground tests (UGTs), provide critical validation data.]
ASC integrates all of the science and engineering that makes stewardship successful
ASC pursued three classes of systems to cost-effectively meet current (and anticipated future) compute requirements
• Capability systems ==> the most challenging integrated design calculations; more costly, but proven (original concept: develop capability). Examples: Red, Blue, White, Q, Purple – and now Sequoia, with higher performance and lower power consumption.
• Capacity systems ==> day-to-day production workload; less costly, somewhat less reliable; throughput for less demanding problems. Examples: MCR, Thunder, Peloton, TLCC (Juno).
• Advanced Architectures ==> performance, power consumption, etc.; low-cost capacity for a targeted but demanding workload; tomorrow’s mainstream solutions? Examples: BlueGene/L, Roadrunner.
[Chart: performance vs. time (FY01, FY05, …), showing the three curves rising past the fading “Mainframes (RIP)” curve.]
The “three curves” (Capability, Capacity and Advanced Architectures) approach has been successful in delivering good cost performance across the spectrum of need…

Sequoia represents the largest increase in computational power ever delivered for NNSA Stockpile Stewardship
[Timeline, 2006–2012, with a five-year planned lifetime through CY17:]
• 2006: market survey; write RFP; CD0 approved
• 2007: vendor response; contract package; CD1 approved; selection; Sequoia plan review
• 2008: Sequoia contract award; Dawn LA; CD2/3 approved; Dawn early science; transition to classified; Dawn GA
• 2009: Dawn Phase 1; Dawn Phase 2; Dawn system acceptance; Sequoia build decision
• 2010: Sequoia parts commit & option; Sequoia parts build; phased system deliveries; Sequoia demo
• 2011: transition to classified; CD4 approved; Sequoia operational readiness; Sequoia early science
• 2012: Sequoia final system acceptance
“Dawn speeds a man on his journey, and speeds him too in his work” …Hesiod (~700 B.C.E.)

Dawn Specifications
• IBM BG/P architecture
• 36,864 compute nodes (500 TF)
• 147,456 PPC 450 cores
• 4 GB memory per node (147.5 TB total)
• 128-to-1 compute-to-I/O node ratio
• 288 10GbE links to the file system

Dawn Installation
• Feb 27th – final rack delivery
• March 5th – 36-rack integration complete
• March 15–24th – Synthetic WorkLoad start
• End of March – acceptance (planned)
ibm.com/systems/deepcomputing/bluegene/

Dawn – the Sequoia Initial Delivery System – from chip to system:
• Chip: 850 MHz PPC 450, 4 cores/4 threads, 13.6 GF/s peak, 8 MB EDRAM
• Compute Card: 13.6 GF/s, 4.0 GB DDR2, 13.6 GB/s memory BW, 0.75 GB/s 3D torus BW
• Node Card: 435 GF/s, 128 GB
• Rack: 14 TF/s, 4 TB, 36 kW
• System: 36 racks, 0.5 PF/s, 144 TB, 1.3 MW, >8-day MTBF
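As a consistency check, a short sketch of the arithmetic behind this build-up (the packaging multipliers – 4 flops/cycle per core, 32 compute cards per node card, 32 node cards per rack – are standard BG/P design figures assumed here, not stated on the slide):

```c
#include <stdio.h>

int main(void) {
    /* BG/P build-up from the slide's figures. The multipliers are
     * assumed from the standard BG/P packaging. */
    double chip_gf   = 0.850 * 4 * 4;      /* 850 MHz x 4 cores x 4 flops/cycle = 13.6 GF/s */
    double card_gf   = chip_gf;            /* one chip per compute card */
    double ncard_gf  = card_gf * 32;       /* 32 compute cards -> ~435 GF/s */
    double rack_tf   = ncard_gf * 32/1e3;  /* 32 node cards -> ~13.9 ("14") TF/s */
    double sys_pf    = rack_tf * 36/1e3;   /* 36 racks -> ~0.5 PF/s */
    double io_nodes  = 36864.0 / 128;      /* 128:1 ratio -> 288 I/O nodes,
                                              matching the 288 10GbE links */
    printf("chip %.1f GF/s, node card %.1f GF/s, rack %.1f TF/s\n",
           chip_gf, ncard_gf, rack_tf);
    printf("system %.2f PF/s, I/O nodes %.0f\n", sys_pf, io_nodes);
    return 0;
}
```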
DAWN Initial Delivery Infrastructure
[Diagram: the Dawn core (9 x 4 BG/P racks) with 288 10GbE links to the file system and the LLNL 10GbE networks, 144 1GbE links to primary and backup E-net management switches, login and service nodes (each with paired 10GbE, 1GbE, and FC4 connections, the FC4 to local disk), an HMC, and a separate 10GbE link for HTC.]

Sequoia Target Architecture and Infrastructure
Production operation FY12–FY17
• 20 PF/s, 1.6 PB memory
• 96 racks, 98,304 nodes
• 1.6 M cores (1 GB/core)
• 50 PB Lustre file system
• 6.0 MW power (160 times more efficient than Purple)
• Will be used as a 2D ultra-res and 3D high-res Uncertainty Quantification (UQ) engine
• Will be used for 3D science capability runs exploring key materials science problems
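The headline numbers hang together, as this sketch of the derived figures shows (only the slide's inputs – 20 PF/s, 6.0 MW, 98,304 nodes, 1.6 M cores, 1 GB/core – are givens; the per-node, per-PB, and per-watt values are computed here):

```c
#include <stdio.h>

int main(void) {
    double cores_per_node = 1.6e6 / 98304;      /* ~16 cores per node */
    double memory_pb      = 1.6e6 * 1.0 / 1e6;  /* 1.6 M cores x 1 GB = 1.6 PB */
    double gf_per_watt    = 20.0e6 / 6.0e6;     /* ~3.3 GF/s per watt */
    /* "160x more efficient than Purple" implies Purple ran at about
     * 3.3/160 ~= 0.02 GF/s per watt, i.e. 100 TF/s at roughly 5 MW. */
    printf("cores/node %.1f, memory %.1f PB, efficiency %.2f GF/W\n",
           cores_per_node, memory_pb, gf_per_watt);
    return 0;
}
```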
High performance materials science simulations will contribute directly to ASC programmatic success
Six physics/materials science applications targeted for early implementation on the Sequoia infrastructure:
• Qbox – quantum molecular dynamics for determination of material equations of state
• ddcMD – molecular dynamics for material dynamics
• Miranda – 3D continuum fluid dynamics for interfacial mixing
• ALE3D – 3D continuum mechanics for ignition and detonation propagation of explosives
• LAMMPS – molecular dynamics for shock initiation in high explosives
• ParaDiS – dislocation dynamics for high-pressure strength in materials

Single Sequoia Platform
Mandatory requirement: P ≥ 20, where P is the “peak” of the machine measured in petaFLOP/s
Target requirement: P + S ≥ 40
• S is a weighted average of five “marquee” benchmark codes
• Four code-package benchmarks
− UMT, IRS, AMG, and SPhot
− Program goal is 24x the Purple capability throughput (Purple – 100 TF/s)
• One “science workload” benchmark from SNL
− LAMMPS (molecular dynamics)
− Program goal is 20x–50x BlueGene/L for science capability (BlueGene/L – 367 TF/s)
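A minimal sketch of how the target figure of merit might be evaluated. The per-code speedups and the equal weighting below are hypothetical placeholders; the actual weights and measured values are not given on the slide:

```c
#include <stdio.h>

/* Hypothetical evaluation of the Sequoia target P + S >= 40, with
 * S a weighted average over the five marquee benchmarks. */
int main(void) {
    const char  *code[]    = { "UMT", "IRS", "AMG", "SPhot", "LAMMPS" };
    const double speedup[] = { 24.0, 24.0, 24.0, 24.0, 20.0 };  /* placeholder values */
    const double weight[]  = { 0.2, 0.2, 0.2, 0.2, 0.2 };       /* placeholder weights */
    const double P = 20.0;  /* peak, petaFLOP/s (mandatory: P >= 20) */

    double S = 0.0;
    for (int i = 0; i < 5; i++) {
        S += weight[i] * speedup[i];
        printf("%-6s weighted contribution %.1f\n", code[i], weight[i] * speedup[i]);
    }
    printf("P = %.1f, S = %.1f, P + S = %.1f (target >= 40: %s)\n",
           P, S, P + S, (P + S >= 40.0) ? "met" : "not met");
    return 0;
}
```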
Sequoia Operating System Perspective
Lightweight kernel on compute nodes (CN 1–N)
• Optimized for scalability and reliability; as simple as possible
• Extremely low OS noise
• Direct access to interconnect hardware
• Linux/Unix syscall compatible, with I/O syscalls function-shipped to the I/O nodes
• POSIX threads (NPTL), OpenMP, SE/TM
• Support for dynamic library runtime loading (glibc)
• Shared memory regions
• Per-node stack: Application → MPI (ADI) → glibc/NPTL → shared memory, futex, RAS, function-shipped syscalls → Sequoia CN and interconnect hardware transport

Open-source Linux/Unix OS on I/O nodes
• Leverage the large Linux/Unix base and community
• Enhance TCP offload, PCIe, I/O
• Standard file systems – Lustre client, NFSv4, etc. (over LNet, TCP/IP, UDP)
• Each I/O node aggregates N compute nodes for I/O and administration
• Hosts FSD, SLURMD, performance tools, and TotalView components

Sequoia Software Stack – Applications Perspective
• User space: the application on top of C/C++/Fortran compilers and Python; code development tools and infrastructure; optimized and parallel math libraries; MPI2 over ADI to the interconnect interface; OpenMP, threads, SE/TM; C library and Fortran 2003 runtime; sockets (TCP, UDP, IP) and the Lustre client (LNet) reaching the external network
• Kernel space: LWK on compute nodes and Linux/Unix on I/O nodes; function-shipped syscalls; RAS and control system; SLURM/Moab resource management
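To illustrate what function-shipped I/O syscalls mean, here is a conceptual sketch, not the actual CNK implementation: the message layout, the ion_daemon_handle() stand-in, and the chunking policy are invented for illustration. The lightweight kernel packs a syscall's arguments into a message, ships it to a daemon on its I/O node, and blocks until the result returns; here the daemon is a local function so the sketch runs as an ordinary program.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Invented message format for a shipped syscall. */
struct io_request {
    uint32_t syscall_no;   /* e.g. 4 = SYS_write on 32-bit PPC Linux */
    int32_t  fd;
    uint64_t len;
    char     payload[4096];
};

/* Stand-in for the I/O-node daemon: in the real system this runs on
 * the ION, receives the message over the network, dispatches on
 * syscall_no, and issues the real write() against Lustre/NFSv4. */
static int64_t ion_daemon_handle(const struct io_request *req) {
    return write(req->fd, req->payload, req->len);
}

/* The lightweight kernel's write() path: no local file system, so
 * every write is forwarded to the I/O node. */
static int64_t lwk_sys_write(int fd, const void *buf, uint64_t len) {
    struct io_request req;
    if (len > sizeof req.payload)
        len = sizeof req.payload;   /* a real kernel would chunk large writes */
    req.syscall_no = 4;
    req.fd  = fd;
    req.len = len;
    memcpy(req.payload, buf, len);
    return ion_daemon_handle(&req); /* in reality: shipped over the network */
}

int main(void) {
    const char msg[] = "hello from the compute node\n";
    lwk_sys_write(STDOUT_FILENO, msg, sizeof msg - 1);
    return 0;
}
```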
The tools that users know and love will be available on Sequoia with improvements and additions as needed
[Chart: tools arranged by feature area (debugging, performance, infrastructure) against operational scale (1 to 10^7 processes).]
• Existing: TotalView, Valgrind MemCheck and ThreadCheck, gprof, STAT, memlight, mpiP, memP, O|SS, OTF, TAU, PAPI, APAI, DPCL, Dyninst, StackWalker, MRNet, LaunchMON
• New focus tools: lightweight PMPI tools, OpenMP analyzer, SE/TM analyzer, OpenMP profiling interface, SE/TM monitor, SE/TM debugger

Application programming requirements and challenges
• MPI scaling: availability of 1.6 M cores pushes all codes to extreme concurrency
• SMP threads: availability of many threads on many SMP cores encourages low-level parallelism for higher performance
• Hybrid models: the mixed MPI/SMP programming environment, and the possibility of heterogeneous compute distribution, bring load imbalance to the fore (see the sketch below)
• I/O & visualization: I/O and visualization requirements encourage innovative strategies to minimize memory and bandwidth bottlenecks
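As a concrete illustration of the hybrid model these challenges describe, a minimal MPI-plus-OpenMP sketch; this is generic MPI-2/OpenMP code, not drawn from any of the applications above:

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Minimal hybrid pattern: MPI ranks across nodes, OpenMP threads
 * across the cores of each node's SMP. */
int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls,
     * a common choice for hybrid codes. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    /* Thread-level parallelism within the node: each thread sums a
     * strip of a (here trivial) workload. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0e-6;

    /* Process-level parallelism across nodes. Load imbalance between
     * ranks or threads shows up as waiting time in this reduction. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d global=%f\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```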