Application Characteristics and Performance on a Cray XE6 - PowerPoint PPT Presentation

Application Characteristics and Performance on a Cray XE6. Courtenay T. Vaughan, Sandia National Laboratories. Cray User Group, May 2011. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Cielo • Cray XE6 with 6654 compute nodes • dual-socket oct-core AMD Magny-Cours nodes • clocked at 2.4 GHz • 32 GB of 1.333 GHz DDR3 memory per node • 3D torus with Gemini interconnect • have large machine and smaller machines • were configured briefly as XT6 with same nodes and SeaStar interconnect
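The per-node figures above imply a total core count and a memory-per-core ratio. The following is a minimal sketch of that arithmetic using only the numbers on the slide; the variable names are illustrative, and the binary conversion to terabytes is an assumption.

```python
# Figures taken from the Cielo slide.
COMPUTE_NODES = 6654
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8          # "oct-core" Magny-Cours
MEMORY_PER_NODE_GB = 32

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = COMPUTE_NODES * cores_per_node
memory_per_core_gb = MEMORY_PER_NODE_GB / cores_per_node

# Assumption: 1 TB = 1024 GB for the aggregate figure.
total_memory_tb = COMPUTE_NODES * MEMORY_PER_NODE_GB / 1024

print(f"{cores_per_node} cores/node, {total_cores} cores total")
print(f"{memory_per_core_gb:.1f} GB/core, {total_memory_tb:.1f} TB total")
```

This gives 16 cores per node, 106,464 compute cores, and 2 GB of memory per core.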

XT5 • Cray XT5 with 160 compute nodes • dual socket with 6 core AMD Istanbul processors • 2.4 GHz processors • 32 GB of 800 MHz DDR2 memory per node • 6 x 4 x 8 3D torus with SeaStar 2.2

XE6 node Image courtesy of Cray, Inc.

CTH • Three-dimensional shock hydrodynamics code • Ran in flat mesh mode, with no AMR (Adaptive Mesh Refinement) • Several points in each timestep where each processor sends a few large messages to up to six neighbors • Messages are aggregated from several variables per cell • Code is mostly FORTRAN with a little C
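The "up to six neighbors" pattern is what a block decomposition of a 3D mesh produces: each processor exchanges data across its faces, and processors on the domain boundary have fewer neighbors. The sketch below illustrates this for a hypothetical 4 x 4 x 4 processor grid (64 cores). The grid shape, the x-fastest rank ordering, and the function name `face_neighbors` are assumptions made for illustration, not CTH's actual decomposition.

```python
def face_neighbors(rank, dims):
    """Return the ranks of the face neighbors of `rank` in a non-periodic
    px x py x pz processor grid (assumed ordering: x varies fastest)."""
    px, py, pz = dims
    x = rank % px
    y = (rank // px) % py
    z = rank // (px * py)

    offsets = ((-1, 0, 0), (1, 0, 0),
               (0, -1, 0), (0, 1, 0),
               (0, 0, -1), (0, 0, 1))
    neighbors = []
    for dx, dy, dz in offsets:
        nx, ny, nz = x + dx, y + dy, z + dz
        # Skip offsets that fall outside the processor grid.
        if 0 <= nx < px and 0 <= ny < py and 0 <= nz < pz:
            neighbors.append(nx + px * (ny + py * nz))
    return neighbors


dims = (4, 4, 4)                      # hypothetical layout for 64 cores
corner = face_neighbors(0, dims)      # corner processor: three neighbors
interior = face_neighbors(21, dims)   # interior processor: six neighbors
print(corner, interior)
```

In this layout a corner processor has three neighbors and an interior processor has six, which matches the "up to six" wording on the slide.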

CTH Problems • explosively formed Shaped-Charge problem with 4 materials, high explosives, and 90 x 216 x 90 cells/processor in weak scaling mode – Messages aggregate 40 variables per cell and average 5.2 MB • impact Meso-Scale problem with 11 materials and 80 x 80 x 275 cells/processor in weak scaling mode – Messages aggregate 75 variables per cell and average 10.4 MB
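The quoted average message sizes can be roughly cross-checked from the per-processor cell counts. The sketch below assumes that each message carries one layer of cells from one face of the local block, that every aggregated variable is an 8-byte value, and that all six faces are weighted equally. None of these assumptions is stated on the slide, and the function name is illustrative.

```python
def mean_face_message_mb(cells, variables, bytes_per_value=8):
    """Estimate the mean size (in MB, 10**6 bytes) of a message holding one
    layer of face cells, averaged over the six faces of an nx x ny x nz block.
    Assumes 8-byte values and one cell layer per face."""
    nx, ny, nz = cells
    face_cells = [ny * nz, ny * nz, nx * nz, nx * nz, nx * ny, nx * ny]
    mean_cells = sum(face_cells) / len(face_cells)
    return mean_cells * variables * bytes_per_value / 1e6


shaped = mean_face_message_mb((90, 216, 90), variables=40)
meso = mean_face_message_mb((80, 80, 275), variables=75)
print(f"Shaped-Charge: {shaped:.2f} MB, Meso-Scale: {meso:.2f} MB")
```

Under these assumptions the estimates come out near 5.0 MB and 10.1 MB, a few percent below the reported 5.2 MB and 10.4 MB. The gap may come from ghost cells or other per-message data that this simple model leaves out.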

Shaped Charge Problem

CTH Communication matrices on 64 cores (panels: Meso-Scale, Shaped-Charge)

CTH Communication traces from one timestep on 64 cores (panels: Shaped-Charge, Meso-Scale)

PRONTO • Structural mechanics code with contact algorithm • Communication for structural mechanics portion consists of boundary exchanges for single variables from static decomposition • Contact algorithm based on dynamic secondary decomposition which changes during calculation and requires communication from and back to the primary decomposition • Code is FORTRAN 90 with C for contact communication

PRONTO Problems • Walls problem – Two sets of two brick walls colliding – Each processor has 320 bricks, each of which has 128 elements – All communication related to contact • Can Crush problem – Cylinder crushed by block – Communication both for finite element and contact algorithms – More balanced problem

Walls Problem

Can Crush Problem

PRONTO Communication matrices on 64 cores (panels: Can Crush, Walls)

PRONTO Communication traces on 64 cores (panels: Walls, Can Crush)

CTH on XT5, XT6, and XE6 [Figure: time versus number of cores (1 to 1024); series: sc XT5, sc XT5 -S4, sc XT6, sc XE6, meso XT5, meso XT5 -S4, meso XT6, meso XE6]

PRONTO on XT5, XT6, and XE6 [Figure: time (seconds) versus number of cores (16 to 256); series: walls XT5, walls XT5 -S4, walls XT6 -SN2, walls XT6, walls XE6, can XT5, can XT5 -S4, can XE6]

Average message traffic on 256 cores [Figure: number of messages per minute by message size (< 16B, 16B - 256B, 256B - 4KB, 4KB - 64KB, 64KB - 1MB, 1MB - 16MB), plus total KB/sec; two off-scale bars are labeled 13e4 and 19e4; series: XT5 and XE6 for CTH shaped, CTH meso, P3D walls, and P3D can crush]
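The message-size bins used in this chart can be reproduced from a list of message sizes with a short script. The following is a minimal sketch, not the tool used for the presentation: it assumes binary units (1 KB = 1024 bytes) and treats each bin's lower edge as inclusive and its upper edge as exclusive, since the chart does not say how boundary values were assigned.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in bytes, matching the labels on the chart (binary units assumed).
BIN_EDGES = [16, 256, 4 * 1024, 64 * 1024, 1024 ** 2, 16 * 1024 ** 2]
BIN_LABELS = ["< 16B", "16B - 256B", "256B - 4KB",
              "4KB - 64KB", "64KB - 1MB", "1MB - 16MB"]


def bin_label(size_bytes):
    """Map a message size to its bin label (lower edge inclusive)."""
    index = bisect_right(BIN_EDGES, size_bytes)
    if index >= len(BIN_LABELS):
        raise ValueError(f"message of {size_bytes} bytes exceeds the largest bin")
    return BIN_LABELS[index]


def histogram(sizes):
    """Count messages per bin, keeping empty bins in chart order."""
    counts = Counter(bin_label(size) for size in sizes)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


# Hypothetical sizes: three small messages and one CTH-sized message.
example = histogram([8, 16, 100, 5_200_000])
for label, count in example.items():
    print(f"{label:>12}: {count}")
```

Dividing each count by the run time in minutes would give the per-minute rates plotted on the vertical axis.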

Summary of Results • Large portion of performance difference for both codes related to memory contention on XT5 when using 6 cores per NUMA region • CTH has large network bandwidth requirements and shows some performance improvement moving to the XE6 • PRONTO can send lots of small messages and shows more performance improvement moving to the XE6

Future Work • Extend results to larger number of processors • Develop mini-app for CTH to see if we can take advantage of the message injection rate of the Gemini interconnect


