Landing GNU-Based OpenMP on CELL: Progress Report and Perspectives
Guang R. Gao
Computer Architecture and Parallel System Laboratory
Department of Electrical & Computer Engineering
University of Delaware
ggao@capsl.udel.edu
2007/6/19 Gao-CELL-06-2007

Outline
- Background
- Why GNU OpenMP on CELL?
- Project Status Report
- Preliminary Results
- Future Perspectives

CAPSL Research Layout
[Diagram; recoverable labels: High End Computing Architecture & Programming Models; Scientific Computation Kernels; Bio-computing Kernels; Other High End Applications; Infrastructure & Tools; Base Execution Model; Fine Grained Multithreading (i.e. EARTH, CARE); Analytical Modeling; High Performance System Tools; Simulation/Emulation]

CBE Architecture Overview
- Local storage size per SPU: 256 KB
- Area: 221 mm²
- Technology: 90nm SOI
- Observed clock speed: a wide range of operating frequencies is supported to optimize for power and yield
- Total number of transistors: 234M
- Peak performance (single precision): > 256 GFlops
- Peak performance (double precision): > 26 GFlops

State of Parallel Languages (based on a recent survey by G. Pfister, IBM)
- 200+ parallel language efforts in the past.
- At first glance: most of them are not used!
- When talking about parallel languages, you usually hear MPI (90% of the time) and OpenMP (10%).
- Auto-parallelization has drifted from the general scene toward obscurity.

Why OpenMP?
- OpenMP is an industry standard for writing parallel programs on shared-memory architectures.
- OpenMP is available.
- OpenMP is being productively used.
- OpenMP is …

OpenMP Major Issues and Challenges
- For compiler writers, not users
- Pragma / directive based
- Defaults are set to make it easy to write fast (but not necessarily correct) programs.
- OpenMP does not support sequential consistency.
- Data layout and locality management
- Lack of support for OpenMP by the GCC compilers for the CBE
- It has only 10% of the parallel programming user community.
ACK: this list comes from private communication with a number of people: William Gropp, John Mellor-Crummey, Rick Stevens, Thomas Sterling, Ross Towle, Kathy Yelick, etc.
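The remark about defaults can be made concrete with a minimal sketch (not taken from the slides). In OpenMP, a variable declared outside a parallel region is shared by default, so an accumulation loop needs an explicit clause to be correct. The function name sum_reduction is hypothetical. If the file is compiled without OpenMP enabled, the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an array with an OpenMP work-sharing loop.
 * `sum` is declared outside the loop, so it would be shared by default.
 * The reduction clause gives every thread a private copy and combines
 * the copies at the end. Dropping the clause still compiles, but the
 * updates to the shared `sum` would then race. */
long sum_reduction(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i];
    return sum;
}
```

With GCC this would typically be built with -fopenmp; the result is the same with or without the flag, only the execution is parallel in the first case.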

Issue #7
“I think it's a waste of time to focus on trying to force these old broken poor parallel processing languages/protocols into the new approach.”
However, OpenMP is widely available today.
- This is evident from its inclusion in the GNU Compiler Collection in Release 4.2.0.

GOMP Status
- See http://gcc.gnu.org/projects/gomp/
- OpenMP support for C, C++, and Fortran 95
- Will support 2.5 and 3.0 soon
- Released on May 5, 2007 as part of the official release of GCC 4.2

Outline
- Background
- Why GNU OpenMP on CELL?
- Project Description
- Preliminary Results
- Future Perspectives

GNU Based OpenMP on CELL
Objectives
- A working OpenMP-CELL platform
- Has the following features:
  - Single source compilation
  - Code partition and overlay
  - Software caching
  - A simple runtime system
- Should finish in 1 year, pass a set of (non-toy) benchmarks, and publish papers
- Optimization is NOT an objective, but
  - Should propose a wish list of research topics for the next phase
- Try to leverage knowledge/experience from the Cyclops-64 project

Single Source Compilation Progress Report
[Diagram; recoverable labels: Source Code; modified compiler, assembler and linker; SPU-cc; SPU exec; Embedder; PPU-cc; PPU exec; Final exec]
SPU side
- Partition creation by clustering
- Addition of assembly directives
- Insertion of library calls
- Outlining of parallel functions
- SPU binary plus partition manager and software cache libraries
PPU side
- Insertion of library calls
- Outlining of sequential code
- PPU binary plus GOMP & SPE libraries
Final executable with all the necessary (static) libraries
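As an illustration of what "outlining of parallel functions" means, here is a minimal sketch under stated assumptions. The body of a parallel loop is moved into its own function, the variables it uses are packed into a struct, and the original site is replaced by a call into the runtime. All names (scale_args, scale_outlined, run_parallel) are hypothetical, and run_parallel is a sequential stand-in, not a libgomp or SPE library function.

```c
#include <assert.h>
#include <stddef.h>

/* Variables used by the parallel region, packed so that one pointer
 * can be handed to the outlined function. */
struct scale_args {
    float *data;
    size_t n;
    float factor;
};

/* Outlined body of:
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) data[i] *= factor;
 * Each thread handles one contiguous chunk chosen by its id. */
static void scale_outlined(void *p, unsigned tid, unsigned nthreads)
{
    struct scale_args *a = p;
    size_t chunk = (a->n + nthreads - 1) / nthreads;
    size_t lo = (size_t)tid * chunk;
    size_t hi = lo + chunk < a->n ? lo + chunk : a->n;
    for (size_t i = lo; i < hi; i++)
        a->data[i] *= a->factor;
}

/* Hypothetical stand-in for the runtime call the compiler would insert.
 * It runs every "thread" in turn instead of starting SPU workers. */
static void run_parallel(void (*fn)(void *, unsigned, unsigned),
                         void *args, unsigned nthreads)
{
    assert(fn != NULL && nthreads > 0);
    for (unsigned t = 0; t < nthreads; t++)
        fn(args, t, nthreads);
}

/* The original function after outlining: only the launch remains. */
void scale(float *data, size_t n, float factor)
{
    struct scale_args args = { data, n, factor };
    run_parallel(scale_outlined, &args, 8);
}
```

In a single-source flow of the kind shown on the slide, the outlined function would be compiled for the SPU while the launching code stays on the PPU; the sketch keeps both in one address space for simplicity.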

The Code Overlay Problem

Our Code Overlay Manager
Features
- Semi-static sub-division of buffer
- Replacement policies and buffer behaviors
  - LRU vs. other replacement policies
  - Lazy reuse [cache-like] buffer behavior
Modified Toolchain
- User-aided and automatic code partitioning
- Command line options
Remarks
- The compiler does not need to break object code into multiple files and explicitly put the names of the files into a linker script
- Simply link the partition manager library and use the default GNU linker script
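The buffer behavior listed above can be sketched as follows. This is an illustration only, assuming a buffer split into a fixed number of slots, LRU replacement, and lazy reuse (a partition that is still resident is not loaded again). The type and function names, and the slot count of 2, are hypothetical; the actual partition manager interface is not shown on the slides.

```c
#include <assert.h>

#define NUM_SLOTS 2            /* assumed number of buffer sub-divisions */
#define EMPTY (-1)

struct overlay_buffer {
    int resident[NUM_SLOTS];            /* partition id in each slot, or EMPTY */
    unsigned long last_use[NUM_SLOTS];  /* logical time of the last access */
    unsigned long clock;                /* bumped on every acquire */
    unsigned long loads;                /* number of simulated partition loads */
};

void overlay_init(struct overlay_buffer *b)
{
    for (int s = 0; s < NUM_SLOTS; s++) {
        b->resident[s] = EMPTY;
        b->last_use[s] = 0;
    }
    b->clock = 0;
    b->loads = 0;
}

/* Return the slot that holds `partition`, loading it first if needed. */
int overlay_acquire(struct overlay_buffer *b, int partition)
{
    assert(partition >= 0);
    b->clock++;

    /* Lazy reuse: a resident partition is used in place, not reloaded. */
    for (int s = 0; s < NUM_SLOTS; s++) {
        if (b->resident[s] == partition) {
            b->last_use[s] = b->clock;
            return s;
        }
    }

    /* Miss: evict the least recently used slot. Empty slots have
     * last_use == 0, so they are filled before anything is evicted. */
    int victim = 0;
    for (int s = 1; s < NUM_SLOTS; s++)
        if (b->last_use[s] < b->last_use[victim])
            victim = s;

    b->resident[victim] = partition;   /* a real manager would copy code in here */
    b->last_use[victim] = b->clock;
    b->loads++;
    return victim;
}
```

A call site would acquire the partition that contains the target function before jumping into it; the loads counter stands in for the cost of bringing a partition into local storage.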

Software Cache
Why software caching?
Features:
- Cache coherence enforced at synchronization points (e.g. barrier, lock, etc.)
- Handles false sharing at byte level
Other cache design decisions
- Cache parameters: 32-bit address, block size 128 B, 128 blocks (16 KB)
- Cache organization: set-associative (current: 4-way)
- Write back vs. write through
- Replacement policy: LRU
Remark: Only used as a backup solution
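The cache parameters above imply a particular address split, worked out here as a small sketch. With 128-byte blocks the offset takes 7 bits; 128 blocks in 4 ways give 32 sets, so the set index takes 5 bits; the remaining 20 bits of a 32-bit address form the tag. This is the conventional set-associative decomposition and is an assumption about the implementation; the names cache_addr and split_address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BLOCK_SIZE  = 128,                    /* bytes per cache block */
    NUM_BLOCKS  = 128,                    /* 128 * 128 B = 16 KB of data */
    NUM_WAYS    = 4,
    NUM_SETS    = NUM_BLOCKS / NUM_WAYS,  /* 32 sets */
    OFFSET_BITS = 7,                      /* log2(BLOCK_SIZE) */
    INDEX_BITS  = 5                       /* log2(NUM_SETS) */
};

struct cache_addr {
    uint32_t tag;      /* upper 20 bits */
    uint32_t set;      /* 5-bit set index */
    uint32_t offset;   /* 7-bit byte offset inside the block */
};

/* Split a 32-bit effective address into tag, set index, and offset. */
struct cache_addr split_address(uint32_t ea)
{
    /* Sanity check that the bit widths match the sizes above. */
    assert((1u << OFFSET_BITS) == BLOCK_SIZE && (1u << INDEX_BITS) == NUM_SETS);

    struct cache_addr a;
    a.offset = ea & (BLOCK_SIZE - 1);
    a.set    = (ea >> OFFSET_BITS) & (NUM_SETS - 1);
    a.tag    = ea >> (OFFSET_BITS + INDEX_BITS);
    return a;
}
```

For example, the address 0x12345678 splits into tag 0x12345, set 12, and offset 0x78.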

Software Cache: An Overview
[Diagram: each cache line consists of a dirty bit vector (0-16 bytes), tag & status (4 bytes), and data (128 bytes). The PPU (with cache, $) and SPU0 to SPU7 (each with local storage, LS) are connected to main memory through the Element Interconnect Bus.]
- Smooth the heterogeneity among different memory modules
- The SPEs can simultaneously source/sink 8 bytes per processor cycle (25.6+25.6 GB/s at 3.2 GHz)
- 6-cycle load latency to 256 KB local storage (LS) on SPE
- Byte-wise dirty bits, but adaptive
- Cache line fill/flush are performed via DMA transfer
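The role of the byte-wise dirty bits can be illustrated with a short sketch. It assumes a fixed 16-byte dirty vector (one bit per data byte of a 128-byte line) and uses plain memory copies where the real cache would issue DMA transfers. All names are hypothetical. On flush, only the bytes marked dirty are written back, so two SPEs that modify different bytes of the same line do not overwrite each other's updates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128

struct cache_line {
    uint8_t dirty[LINE_SIZE / 8];   /* one bit per data byte: 16 bytes */
    uint8_t data[LINE_SIZE];
};

/* Fill: copy the line from (simulated) main memory and clear dirty bits. */
void line_fill(struct cache_line *l, const uint8_t *mem)
{
    memcpy(l->data, mem, LINE_SIZE);
    memset(l->dirty, 0, sizeof l->dirty);
}

/* Write one byte into the cached copy and mark that byte dirty. */
void line_write(struct cache_line *l, unsigned off, uint8_t value)
{
    assert(off < LINE_SIZE);
    l->data[off] = value;
    l->dirty[off / 8] |= (uint8_t)(1u << (off % 8));
}

/* Flush: write back only the dirty bytes, then clear the dirty bits.
 * Clean bytes in memory are left untouched. */
void line_flush(struct cache_line *l, uint8_t *mem)
{
    for (unsigned off = 0; off < LINE_SIZE; off++)
        if (l->dirty[off / 8] & (1u << (off % 8)))
            mem[off] = l->data[off];
    memset(l->dirty, 0, sizeof l->dirty);
}
```

If two cached copies of the same line are each written at a different offset and then flushed in either order, main memory ends up with both updates, which is the byte-level handling of false sharing named on the previous slide.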

A Simple Runtime System
Why a simple runtime system?
Features of our simple runtime system
- Shadow (PPU) threads and worker (SPU) threads
Mainly used for testing the compiler and tool-chain

A Simple Runtime System: An Overview
[Diagram: PPU side and SPU side. Thread 0 serves as the master thread and creates all other threads. Legend: POSIX thread, SPU thread, communication.]

A Simple Runtime System: Threads and Communication
[Diagram; recoverable labels: initial signal; command buffer; command buffer request; command buffer reply; completion signal. Legend: POSIX thread, SPU thread, incoming signal, outgoing signal.]

Status Summary
Code partition between SPU and PPU
- Single source compilation
- Outline parallel sections for SPU
Explicit data movement between main memory and SPU
- Software cache
- Double buffering
Code overlay to support large programs
- Code partition support by the tool-chain
- Object code format changes
- Partition manager: decides when to load a new partition
OpenMP runtime
- PPU and SPU work together

Experimental Framework
Software
- Tool-chain (modified components): spu-ld v2.16.1, spu-as v2.16.1, spu-gcc v4.2.0
- Extra libraries: software cache, partition manager
- O.S.: Yellow Dog Linux v5.0
Hardware
- PS3*
*PS3 is a trademark of Sony Corporation

Benchmarks
Benchmark name: description
- huff, huff2: Huffman decoding from MPEG2
- idct, idct_2: IDCT and IQuantization from MPEG2
- resize, reside_2: YUV file resizing algorithm
- alphablend: a process of combining a translucent foreground color with a background (stream) file
- convert: YUV2RGB, convert YUV file to raw stream file
- prgb2gm: convert RGB file into BMP file
- gzip: SPEC compression utility
- OpenMP Validation Suite V1.0: OpenMP test cases from University of Houston

Preliminary Experimental Results
- Pass preliminary tests for all benchmarks
- The automatic code overlay works
  - Provides important performance gains for different applications
  - Modulus is better when the code / partitions have no re-use
  - LRU is better when the code / partitions have re-use
  - Degradation: 8% in the worst case
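The comparison between the two policies can be explored with a small simulation that counts partition loads over an access trace. This is a sketch under assumptions: "modulus" is taken to mean that partition p always maps to slot p % SLOTS, the buffer has 2 slots, and the function names are hypothetical. It counts loads only; it does not model the bookkeeping cost of each policy, which is where a simpler policy could win when there is no re-use.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 2   /* assumed number of buffer slots */

/* Assumed "modulus" policy: partition p always goes to slot p % SLOTS. */
unsigned loads_modulus(const int *trace, size_t n)
{
    int slot[SLOTS];
    unsigned loads = 0;
    for (int s = 0; s < SLOTS; s++)
        slot[s] = -1;
    for (size_t i = 0; i < n; i++) {
        assert(trace[i] >= 0);
        int s = trace[i] % SLOTS;
        if (slot[s] != trace[i]) {      /* not resident: load it */
            slot[s] = trace[i];
            loads++;
        }
    }
    return loads;
}

/* LRU policy: on a miss, replace the least recently used slot. */
unsigned loads_lru(const int *trace, size_t n)
{
    int slot[SLOTS];
    size_t stamp[SLOTS];                /* time of last use, 0 = never */
    unsigned loads = 0;
    for (int s = 0; s < SLOTS; s++) {
        slot[s] = -1;
        stamp[s] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        assert(trace[i] >= 0);
        int hit = -1, victim = 0;
        for (int s = 0; s < SLOTS; s++) {
            if (slot[s] == trace[i])
                hit = s;
            if (stamp[s] < stamp[victim])
                victim = s;
        }
        if (hit < 0) {                  /* miss: load into the LRU slot */
            slot[victim] = trace[i];
            loads++;
            hit = victim;
        }
        stamp[hit] = i + 1;
    }
    return loads;
}
```

On a trace with re-use such as 0, 2, 0, 2, the modulus mapping sends both partitions to the same slot and reloads on every access, while LRU keeps both resident. On a trace without re-use such as 0, 1, 2, 3, both policies load every partition once.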

Outline
- Background
- Project and Problem Formulation
- Status Report
- Results
- Related Work
- Future Perspectives
