Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT
HPC Challenge BOF, SC06
Class 2 Submission
November 14, 2006
Brad Chamberlain, Steve Deitz, Mary Beth Hribar, Wayne Wong
Chapel Team, Cray Inc.

Overview
• Chapel: Cray’s HPCS language
• Our approach to the HPC Challenge codes:
  ◦ performance-minded
  ◦ clear, intuitive, readable
  ◦ general across…
    - types
    - problem parameters
    - modular boundaries
HPCC BOF, SC06

Code Size Summary
[Figure: stacked bar chart of SLOC (y-axis, 0 to 1800) for the Reference and Chapel versions of STREAM Triad, Random Access, and FFT. Reference bars are split into Framework and Computation. Chapel bars are split into Problem Size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Bar labels shown: 1668, 1406, 433, 156, 124, 86.]

Chapel Code Size Summary
[Figure: stacked bar chart of SLOC (y-axis, 0 to 180) for the Chapel versions of STREAM Triad, Random Access, and FFT, split into Problem Size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Bar labels shown: 156, 124, 86.]

Chapel Code Size Summary
[Figure: stacked bar chart of Static Lexical Tokens (y-axis, 0 to 1400) for the Chapel versions of STREAM Triad, Random Access, and FFT, split into Problem Size (common), Results and output, Verification, Initialization, Kernel declarations, and Kernel computation. Bar labels shown: 1299, 863, 593.]

STREAM Triad Overview

    const ProblemSpace: domain (1) distributed (Block) = [1..m];
    var A, B, C: [ProblemSpace] elemType;

    A = B + alpha * C;

STREAM Triad Overview

    // Declare a 1D arithmetic domain (first-class index set)
    // and specify its distribution
    const ProblemSpace: domain (1) distributed (Block) = [1..m];

    // Use domain to declare distributed arrays
    var A, B, C: [ProblemSpace] elemType;

    // Express computation using promoted scalar operators and
    // whole-array references ⇒ parallel computation
    A = B + alpha * C;

[Figure: ProblemSpace and the arrays A, B, and C divided into blocks across locales L0 to L4, with A = B + alpha * C evaluated block by block.]
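To make the block-wise evaluation concrete outside Chapel, here is a minimal Python sketch. It assumes a simple even split of the indices 1..m into contiguous blocks, one per locale. The helper names `block_ranges` and `triad` are hypothetical, and Chapel's actual Block distribution may place block boundaries differently. The loop over blocks is sequential here, whereas the Chapel statement lets the blocks run in parallel.

```python
def block_ranges(m, num_locales):
    """Split the 1-based index set 1..m into contiguous blocks.

    Hypothetical stand-in for a block distribution: the first
    (m % num_locales) blocks receive one extra index.
    """
    base, extra = divmod(m, num_locales)
    ranges = []
    low = 1
    for loc in range(num_locales):
        size = base + (1 if loc < extra else 0)
        ranges.append(range(low, low + size))
        low += size
    return ranges


def triad(B, C, alpha, num_locales=5):
    """Compute A = B + alpha * C one block at a time."""
    m = len(B)
    A = [0.0] * m
    for block in block_ranges(m, num_locales):
        # Each block is independent, so each could run on its own locale.
        for i in block:
            A[i - 1] = B[i - 1] + alpha * C[i - 1]
    return A
```

For example, `triad([1.0] * 10, [2.0] * 10, 3.0)` fills every element of A with 7.0, and the result does not depend on how many blocks the index set is split into.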

Random Access Overview

    [i in TableSpace] T(i) = i;

    forall block in subBlocks(updateSpace) do
      for r in RAStream(block.numIndices, block.low) do
        T(r & indexMask) ^= r;

Random Access Overview

    // Initialize table using a forall expression
    [i in TableSpace] T(i) = i;

    // Express table updates using forall- and for-loops
    forall block in subBlocks(updateSpace) do
      for r in RAStream(block.numIndices, block.low) do
        T(r & indexMask) ^= r;

    // Random stream expressed modularly using an iterator
    iterator RAStream(numvals, start:randType = 0): randType {
      var val = getNthRandom(start);
      for i in 1..numvals {
        getNextRandom(val);
        yield val;
      }
    }
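The same structure can be sketched in Python, with a generator playing the role of the Chapel iterator. The slide does not show `getNthRandom` or `getNextRandom`, so the bodies below are assumptions: a 64-bit shift-and-XOR generator with polynomial 0x7, which is how the HPCC reference code is commonly described, and a deliberately naive nth-value routine that steps the generator n times. `sub_blocks` and `random_access` are hypothetical names, and the blocks run sequentially here rather than in parallel.

```python
POLY = 0x7                 # assumed feedback polynomial
MASK64 = (1 << 64) - 1     # keep values within 64 bits


def get_next_random(val):
    """Advance the assumed generator by one step."""
    high_bit = val >> 63
    return ((val << 1) & MASK64) ^ (POLY if high_bit else 0)


def get_nth_random(n):
    """Naive jump-ahead: step n times from the seed value 1."""
    val = 1
    for _ in range(n):
        val = get_next_random(val)
    return val


def ra_stream(numvals, start=0):
    """Yield numvals random values, beginning after position start."""
    val = get_nth_random(start)
    for _ in range(numvals):
        val = get_next_random(val)
        yield val


def sub_blocks(num_updates, num_blocks):
    """Split the update range into (low, size) blocks."""
    base, extra = divmod(num_updates, num_blocks)
    low = 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        yield low, size
        low += size


def random_access(log2_table_size, num_updates, num_blocks=4):
    """Initialize T(i) = i, then XOR each random value into the table."""
    table_size = 1 << log2_table_size
    index_mask = table_size - 1
    T = list(range(table_size))
    for low, size in sub_blocks(num_updates, num_blocks):
        for r in ra_stream(size, low):
            T[r & index_mask] ^= r
    return T
```

Because each block restarts the stream at its own offset, the final table is the same for any number of blocks. XOR is also its own inverse, so replaying the same updates a second time restores T(i) = i, which gives a simple correctness check.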

FFT Overview (radix 4)

    for i in [2..log2(numElements)) by 2 {
      const m = span*radix, m2 = 2*m;

      forall (k,k1) in (Adom by m2, 0..) {
        var wk2 = …, wk1 = …, wk3 = …;

        forall j in [k..k+span) do
          butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);

        wk1 = …; wk3 = …; wk2 *= 1.0i;

        forall j in [k+m..k+m+span) do
          butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
      }
      span *= radix;
    }

    def butterfly(wk1, wk2, wk3, inout A:[1..radix]) { … }

FFT Overview (radix 4)

    for i in [2..log2(numElements)) by 2 {
      const m = span*radix, m2 = 2*m;

      // Parallelism expressed using nested forall-loops
      forall (k,k1) in (Adom by m2, 0..) {
        var wk2 = …, wk1 = …, wk3 = …;

        forall j in [k..k+span) do
          butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);

        // Support for complex and imaginary math simplifies FFT arithmetic
        wk1 = …; wk3 = …; wk2 *= 1.0i;

        forall j in [k+m..k+m+span) do
          butterfly(wk1, wk2, wk3, A[j..j+3*span by span]);
      }
      span *= radix;
    }

    // Generic arguments allow routine to be called with
    // complex, real, or imaginary twiddle factors
    def butterfly(wk1, wk2, wk3, inout A:[1..radix]) { … }
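The slide elides the butterfly body and the twiddle-factor expressions, so the following Python sketch does not reproduce them. It shows a textbook recursive radix-4 decimation-in-time FFT instead, assuming an input length that is a power of 4. The `butterfly` here takes the same three twiddle factors and four elements as the Chapel routine's signature suggests, but its body and the function `fft_radix4` are illustrative only. Multiplying by `1j` plays the role that `1.0i` plays in the Chapel code.

```python
import cmath


def butterfly(wk1, wk2, wk3, a):
    """Textbook radix-4 butterfly: twiddle three inputs, then 4-point DFT."""
    x0 = a[0]
    x1 = wk1 * a[1]
    x2 = wk2 * a[2]
    x3 = wk3 * a[3]
    s02, d02 = x0 + x2, x0 - x2
    s13, d13 = x1 + x3, x1 - x3
    # Outputs q = 0..3 are sum over r of (-i)^(r*q) * x_r.
    return [s02 + s13, d02 - 1j * d13, s02 - s13, d02 + 1j * d13]


def fft_radix4(x):
    """Recursive radix-4 FFT for lengths that are powers of 4."""
    n = len(x)
    if n < 1 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    if n == 1:
        return list(x)
    quarter = n // 4
    # Transform the four interleaved subsequences x[r], x[r+4], ...
    parts = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(quarter):
        w1 = cmath.exp(-2j * cmath.pi * k / n)
        w2 = w1 * w1
        w3 = w2 * w1
        y = butterfly(w1, w2, w3, [parts[r][k] for r in range(4)])
        for q in range(4):
            out[k + q * quarter] = y[q]
    return out
```

Each iteration over k is independent of the others, which is the property the nested forall-loops on the slide exploit. A direct O(n²) DFT on a 16-element input is a convenient check on the result.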

Chapel Compiler Status
• All codes compile and run with our current Chapel compiler
  ◦ focus to date has been on…
    - prototyping Chapel, not performance
    - targeting a single locale
  ◦ platforms: Linux, Cygwin (Windows), Mac OS X, SunOS, …
• No meaningful performance results yet
  ◦ written report contains performance discussions for our codes
• Upcoming milestones
  ◦ December 2006: limited release to HPLS team
  ◦ 2007: work on distributed-memory execution and optimizations
  ◦ SC07: intend to have publishable performance results for HPCC’07

Summary
• Have expressed HPCC codes attractively
  ◦ clear, concise, general
  ◦ express parallelism, compile and execute correctly on one locale
  ◦ benefit from Chapel’s global-view parallelism
  ◦ utilize generic programming and modern SW Engineering principles
• Our written report contains:
  ◦ complete source listings
  ◦ detailed walkthroughs of our solutions as Chapel tutorial
  ◦ performance notes for our implementations
• Report and presentation available at our website: http://chapel.cs.washington.edu
• We’re interested in your feedback: chapel_info@cray.com

Backup Slides

Compact High-Level Code…
[Figure: five stacked bar charts, one per benchmark (EP, CG, FT, MG, IS), comparing Lines of Code for the F+MPI version (C+MPI for IS) against the ZPL version. Each bar is split into communication, declarations, and computation.]

…need not perform poorly
[Figure: performance plots for EP, CG, FT, MG, and IS comparing the C/Fortran + MPI versions with the ZPL versions.]
See also Rice University’s recent D-HPF work…
