Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Buddy Bland, Project Director, Oak Ridge Leadership Computing Facility
November 13, 2012
Office of Science
ORNL's "Titan" Hybrid System: Cray XK7 with AMD Opteron and NVIDIA Tesla Processors

SYSTEM SPECIFICATIONS:
• Peak performance of 27.1 PF (24.5 PF GPU + 2.6 PF CPU)
• 18,688 compute nodes, each with:
  – 16-core AMD Opteron CPU (32 GB)
  – NVIDIA Tesla "K20x" GPU (6 GB)
• 512 service and I/O nodes
• 200 cabinets occupying 4,352 ft² (404 m²)
• 710 TB total system memory
• Cray Gemini 3D torus interconnect
• 8.9 MW peak power (8.3 MW average)
The x86 processor provides fast single-thread performance for control and communications

AMD Opteron 6274:
• 16 cores
• 141 GFLOPS peak
GPUs are designed for extreme parallelism, performance, and power efficiency

NVIDIA Tesla K20x:
• 14 streaming multiprocessors (SMX)
• 2,688 CUDA cores
• 1.31 TFLOPS peak (double precision)
• 6 GB GDDR5 memory
• HPL: >2.0 GFLOPS per watt (measured power on the full Titan system)
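As a quick sanity check on the double-precision figure (assuming the K20x's published 732 MHz core clock and 64 FP64 units per SMX, neither of which appears on the slide):

    14 SMX × 64 FP64 units × 2 FLOPs/cycle (fused multiply-add) × 0.732 GHz ≈ 1.31 TFLOPS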
Cray XK7 Compute Node

XK7 compute node characteristics:
• AMD Opteron 6274 16-core processor @ 141 GF
• NVIDIA Tesla K20x @ 1,311 GF
• Host memory: 32 GB 1600 MHz DDR3
• Tesla K20x memory: 6 GB GDDR5
• Gemini high-speed interconnect (3D torus links in X, Y, and Z)

Slide courtesy of Cray, Inc.
Titan: Cray XK7 System

• Compute node: 1.45 TF, 38 GB
• Board: 4 compute nodes, 5.8 TF, 152 GB
• Cabinet: 24 boards (96 nodes), 139 TF, 3.6 TB
• System: 200 cabinets, 18,688 nodes, 27 PF, 710 TB
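These per-level figures are mutually consistent; a quick check using only numbers from the slides:

    4 nodes/board × 1.45 TF = 5.8 TF;  4 × 38 GB = 152 GB
    24 boards/cabinet × 5.8 TF ≈ 139 TF;  24 × 152 GB ≈ 3.6 TB
    18,688 nodes × 1.45 TF ≈ 27.1 PF;  18,688 × 38 GB ≈ 710 TB
    (38 GB/node = 32 GB DDR3 + 6 GB GDDR5; 200 cabinets × 96 = 19,200 node slots = 18,688 compute + 512 service/I-O nodes)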
Why GPUs? High Performance and Power Efficiency on a Path to Exascale

• Hierarchical parallelism – improves scalability of applications by exposing more parallelism through code refactoring and source-code directives
• Heterogeneous multi-core processor architecture – use the right type of processor for each task
• Data locality – keep the data near the processing; the GPU has high bandwidth to its local memory for rapid access, plus a large internal cache
• Explicit data management – explicitly manage data movement between CPU and GPU memories (see the sketch below)
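A minimal sketch of what explicit data management looks like in OpenACC C code (the routine, array names, and sizes are illustrative assumptions, not from the talk):

    #include <stdlib.h>

    /* The acc data region copies x to GPU memory once on entry and back
     * once on exit, so both kernels below reuse the resident copy
     * instead of re-transferring it over PCIe between loops. */
    void scale_and_shift(double *x, int n, double a)
    {
        #pragma acc data copy(x[0:n])
        {
            #pragma acc parallel loop
            for (int i = 0; i < n; i++)
                x[i] *= a;

            #pragma acc parallel loop
            for (int i = 0; i < n; i++)
                x[i] += 1.0;
        }
    }

    int main(void)
    {
        int n = 1 << 20;
        double *x = malloc(n * sizeof *x);
        for (int i = 0; i < n; i++) x[i] = (double)i;
        scale_and_shift(x, n, 2.0);
        free(x);
        return 0;
    }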
Hybrid Programming Model

• On Jaguar, with 299,008 cores, we were seeing the limits of a single level of MPI scaling for most applications
• To take advantage of the vastly larger parallelism in Titan, users need to use hierarchical parallelism in their codes (a sketch follows below):
  – Distributed memory: MPI, SHMEM, PGAS
  – Node local: OpenMP, Pthreads, local MPI communicators
  – Within threads: vector constructs on the GPU, libraries, OpenACC
• These are the same types of constructs needed on all multi-PFLOPS computers to scale to the full size of the systems!
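A minimal sketch of that three-level hierarchy in C, combining MPI (across nodes), OpenMP (across a node's CPU cores), and OpenACC (data-parallel work on the GPU). The problem size and names are illustrative assumptions:

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    static double local[N];

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Level 1: distributed memory -- each MPI rank owns one slice.
         * Level 2: node-local OpenMP threads initialize that slice. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            local[i] = rank + i * 1e-6;

        /* Level 3: the bulk data-parallel loop runs on the GPU via OpenACC. */
        double sum = 0.0;
        #pragma acc parallel loop reduction(+:sum) copyin(local[0:N])
        for (int i = 0; i < N; i++)
            sum += local[i] * local[i];

        double global = 0.0;
        MPI_Reduce(&sum, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global sum of squares = %g\n", global);

        MPI_Finalize();
        return 0;
    }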
How do you program these nodes?

• Compilers
  – OpenACC is a set of compiler directives that lets the user express hierarchical parallelism in the source code, so that the compiler can generate parallel code for the target platform, be it GPU, MIC, or vector SIMD on a CPU (a hedged sketch follows below)
  – The Cray compiler supports XK7 nodes and is OpenACC compatible
  – The CAPS HMPP compiler supports C, C++, and Fortran compilation for heterogeneous nodes, with OpenACC support
  – The PGI compiler supports OpenACC and CUDA Fortran
• Tools
  – The Allinea DDT debugger scales to full system size and, with ORNL support, will be able to debug heterogeneous (x86/GPU) applications
  – ORNL has worked with the Vampir team at TU Dresden to add support for profiling codes on heterogeneous nodes
  – CrayPAT and Cray Apprentice support XK6 programming
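As an illustration of expressing two levels of parallelism with OpenACC directives (gangs for coarse parallelism, vectors for fine parallelism; the compiler maps these onto GPU thread blocks/threads or CPU cores/SIMD lanes), here is a matrix-vector sketch with illustrative names and sizes:

    #include <stdio.h>

    #define N 512

    static double A[N][N], x[N], y[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            for (int j = 0; j < N; j++)
                A[i][j] = 1.0 / (i + j + 1);
        }

        /* Outer loop spread across gangs; inner loop is a vector
         * reduction. On a GPU target these become thread blocks and
         * threads; on a CPU, cores and SIMD lanes. */
        #pragma acc parallel loop gang copyin(A, x) copyout(y)
        for (int i = 0; i < N; i++) {
            double sum = 0.0;
            #pragma acc loop vector reduction(+:sum)
            for (int j = 0; j < N; j++)
                sum += A[i][j] * x[j];
            y[i] = sum;
        }

        printf("y[0] = %f\n", y[0]);
        return 0;
    }

With the PGI compiler of that era this would be built with something like "pgcc -acc -Minfo=accel"; the Cray and CAPS compilers enable OpenACC through their own options.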
Early Science Applications on Titan

• Material Science (WL-LSMS) – role of material disorder, statistics, and fluctuations in nanoscale materials and systems
• Climate Change (CAM-SE) – answer questions about specific climate change adaptation and mitigation scenarios; realistically represent features like precipitation patterns/statistics and tropical storms
• Biofuels (LAMMPS) – a multiple-capability molecular dynamics code
• Astrophysics (NRDF) – radiation transport, critical to astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging
• Combustion (S3D) – combustion simulations to enable the next generation of diesel/bio-fuels to burn more efficiently
• Nuclear Energy (Denovo) – unprecedented high-fidelity radiation transport calculations that can be used in a variety of nuclear energy and technology applications
How Effective are GPUs on Scalable Applications?

Very early performance measurements of the OLCF-3 Early Science codes on Titan:
• Cray XK7: K20x GPU plus AMD Opteron 6274 CPU
• Cray XE6: dual AMD Opteron 6274, no GPU
• Cray XK6 w/o GPU: single AMD Opteron 6274, no GPU

Performance ratio, XK7 (w/ K20x) vs. XE6:

Application   | Ratio | Comments
S3D           | 1.8   | Turbulent combustion; 6% of Jaguar workload
Denovo sweep  | 3.8   | Sweep kernel of 3D neutron transport for nuclear reactors; 2% of Jaguar workload
LAMMPS        | 7.4*  | High-performance molecular dynamics; 1% of Jaguar workload (*mixed precision)
WL-LSMS       | 3.8   | Statistical mechanics of magnetic materials; 2% of Jaguar workload; 2009 Gordon Bell winner
CAM-SE        | 1.8*  | Community atmosphere model; 1% of Jaguar workload (*estimate)
Want to join our team? ORNL is hiring. Contact us at http://jobs.ornl.gov

Questions? BlandAS@ornl.gov

The research and activities described in this presentation were performed using the resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.