P2S2-2010 Panel

Is Hybrid Programming a Bad Idea Whose Time Has Come?

Taisuke Boku
Center for Computational Sciences, University of Tsukuba
2010/09/13
Definition

- The term “Hybrid Programming” sometimes means “hybrid memory programming”, i.e. a combination of shared-memory and distributed-memory programming: e.g. MPI + OpenMP.
- The term “Heterogeneous Programming” sometimes means “hybrid programming over a heterogeneous CPU architecture”, i.e. a combination of a general-purpose CPU and a special-purpose accelerator: e.g. C + CUDA.
- In this panel, “Hybrid Programming” covers both meanings.
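As a concrete illustration of the first meaning, here is a minimal MPI + OpenMP sketch; the array size and the partial-sum kernel are illustrative assumptions, not code from the talk.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal "hybrid memory programming" sketch: MPI splits the iteration
 * space across processes (distributed memory), OpenMP shares each
 * process's block among threads (shared memory). N and the summed
 * series are illustrative only. */
#define N 1000000L

int main(int argc, char **argv)
{
    int rank, size, provided;
    /* FUNNELED is enough here: only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long begin = N * rank / size;           /* this process's block */
    long end   = N * (rank + 1) / size;

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)   /* threads within the node */
    for (long i = begin; i < end; i++)
        local += 1.0 / (double)(i + 1);

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```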
Has the time of Hybrid Programming come?

- Today’s most typical hybrid architecture is “multi-core general-purpose CPU + (multiple) GPUs”, and on this architecture we already do hybrid programming, such as C + CUDA, every day.
- Up to roughly 10+ PFLOPS it is still feasible to deliver the performance with general-purpose CPUs only (e.g. Japan’s “K” (Kei) computer, Sequoia, or Blue Waters), but beyond that it becomes much harder.
- We have to prepare now for the coming era of 100 PFLOPS to 1 EFLOPS, because productive application programming takes at least a couple of years.
Is it a good thing, or just something to be accepted?

- We have not yet been released from the curse of hybrid memory programming: MPI + OpenMP is still the most efficient way to program today’s multi-core, multi-socket nodes connected by an interconnection network.
- Regardless of the programmer’s pain, we are forced to do it, and we need strong models, languages, and tools to relieve that pain.
- Issues to be considered:
  - Memory hybridness (shared and distributed)
  - CPU hybridness (general-purpose and accelerator)
- A “flat” model is not a solution; we need to exploit the strengths of all these architectural levels, just as hybrid programming does.
Necessity of overcoming memory hybridness

- Many of today’s parallel applications are still not ready for memory hybridness; many of them are written with MPI only.
- For really large core counts, such as one million cores, it is impossible to continue MPI-only programming:
  - The cost of collective communication grows at least on the order of log(P).
  - The memory footprint needed to manage a huge number of processes is not negligible, while memory capacity per core is shrinking.
- It is relatively easy to apply automatic parallelization on a hybrid memory architecture, because such huge parallelism must come from multiple levels of nested loops.
- This means multi-level loop decomposition onto the memory hierarchy (and perhaps the network hierarchy as well), as sketched below.
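A minimal sketch of such a multi-level loop decomposition, assuming a doubly nested loop over a grid whose rows are block-distributed across MPI processes; the dimensions and the kernel are illustrative.

```c
/* Multi-level decomposition of a nested loop onto the memory hierarchy
 * (illustrative sketch; the row/column extents and the kernel are made up):
 *   level 1: the global row dimension is block-distributed across MPI
 *            processes, so this routine only ever sees its local rows,
 *   level 2: the local rows are shared among OpenMP threads on the node,
 *   level 3: the innermost column loop stays sequential per thread and
 *            is left to the compiler for vectorization. */
void update_local_block(int local_rows, int ny, double a[local_rows][ny])
{
    #pragma omp parallel for                 /* level 2: cores within the node */
    for (int i = 0; i < local_rows; i++)
        for (int j = 0; j < ny; j++)         /* level 3: per-thread / SIMD */
            a[i][j] = 0.5 * (a[i][j] + 1.0);
}
```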
An example of effort

Hybridness of CPU/GPU memory on a computation node:
- The GPU is currently attached to the CPU as a peripheral (I/O) device, with communication over the PCI-E bus.
- This creates a distributed-memory structure (separate address spaces) even within a single node.
- “Message passing” within a node must therefore be performed in addition to message passing among nodes.

XcalableMP (XMP) programming language:
- Programs are written against large data arrays distributed over multiple computation nodes, and are translated into local index accesses and message passing (similar to HPF).
- Both a “global view” (for easy access to a unified data image) and a “local view” (for performance tuning) are provided and unified.
- Data movement in the global view makes data transfer among nodes look like a simple data assignment; a small global-view sketch follows below.
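As a concrete illustration of the global-view model described above, here is a minimal XcalableMP-style sketch in C. It follows the basic XMP directive forms (nodes/template/distribute/align/loop), but the array name, size, and computation are illustrative assumptions rather than code from the talk.

```c
/* Minimal global-view sketch in XcalableMP-style C: the array a[] is
 * block-distributed over four nodes, and the loop directive makes each
 * node execute only the iterations it owns. */
#define N 1024

#pragma xmp nodes p(4)                    /* declare the executing node set */
#pragma xmp template t(0:N-1)             /* virtual index space            */
#pragma xmp distribute t(block) onto p    /* block-distribute it over p     */

double a[N];
#pragma xmp align a[i] with t(i)          /* align a[] with the template    */

int main(void)
{
    #pragma xmp loop on t(i)              /* each node runs its own block   */
    for (int i = 0; i < N; i++)
        a[i] = (double)i;                 /* global index, local access     */
    return 0;
}
```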
gmove directive

The “gmove” construct copies data between distributed arrays in the global view. When no option is specified, the copy operation is performed collectively by all nodes in the executing node set. If an “in” or “out” clause is specified, the copy operation is done by one-sided communication (“get” and “put”) for remote memory access.

```fortran
!$xmp nodes p(*)
!$xmp template t(N)
!$xmp distribute t(block) onto p
      real A(N,N), B(N,N), C(N,N)
!$xmp align A(i,*), B(i,*), C(*,i) with t(i)

      A(1) = B(20)              ! it may cause an error
!$xmp gmove
      A(1:N-2,:) = B(2:N-1,:)   ! shift operation
!$xmp gmove
      C(:,:) = A(:,:)           ! all-to-all
!$xmp gmove out
      X(1:10) = B(1:10,1)       ! done by a put operation
```

[Figure: arrays A, B, and C block-distributed over four nodes (node1–node4); gmove allows easy data movement among CPU/GPU address spaces]
CPU/GPU coordination data management

[Figure: a computation node containing a CPU (cores + memory) and a GPU (cores + memory) connected over the PCI-E bus through the driver, with message passing (MPI) among nodes]

XMP/GPU expresses all of the following in directive-based, sequential(-like) code:
- array data distribution and process assignment
- loop execution distribution and process assignment
- message passing among nodes
- CPU/GPU data copy (CUDA data copy) over PCI-E
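To make concrete what those directives hide, here is a minimal hand-written sketch of the same coordination pattern using the CUDA runtime API and MPI directly. The buffer size, the rank pairing, and the omitted kernel are illustrative assumptions, not part of XMP/GPU itself.

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Hand-coded version of the coordination that XMP/GPU is meant to generate:
 * (1) copy node-local data from CPU memory to GPU memory over PCI-E,
 * (2) compute on the GPU (kernel launch omitted here),
 * (3) copy results back to CPU memory,
 * (4) exchange them with another node by MPI.
 * Assumes at least two MPI ranks; N is illustrative only. */
#define N 1024

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *host = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        host[i] = (double)(rank + i);

    double *dev;
    cudaMalloc((void **)&dev, N * sizeof(double));

    /* (1) CPU -> GPU copy over PCI-E */
    cudaMemcpy(dev, host, N * sizeof(double), cudaMemcpyHostToDevice);

    /* (2) ... GPU kernel would be launched here by the CUDA compiler ... */

    /* (3) GPU -> CPU copy */
    cudaMemcpy(host, dev, N * sizeof(double), cudaMemcpyDeviceToHost);

    /* (4) message passing among nodes: rank 1 sends its block to rank 0 */
    if (rank == 1)
        MPI_Send(host, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    else if (rank == 0)
        MPI_Recv(host, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(dev);
    free(host);
    MPI_Finalize();
    return 0;
}
```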
XMP/GPU image (dispatch to GPU)

```c
#pragma xmp nodes p(*)                   // node declaration
#pragma xmp nodes gpu g(*)               // GPU node declaration
...
#pragma xmp distribute AP() onto p(*)    // data distribution
#pragma xmp distribute AG() onto g(*)
#pragma xmp align G[i] with AG[i]        // data alignment
#pragma xmp align P[i] with AP[i]

int main(void)
{
  ...
  // data movement by gmove (CPU ⇒ GPU)
  #pragma xmp gmove
  AG[:] = AP[:];

  #pragma xmp loop on AG(i)
  for (i = 0; ...)                       // computation on GPU (passed to the CUDA compiler)
    AG[i] = ...

  // data movement by gmove (GPU ⇒ CPU)
  #pragma xmp gmove
  AP[:] = AG[:];
}
```
What do we need?

- A unified, easy programming language and tools, with additional performance-tuning features, are required.
- At the first step of programming, easy porting from sequential or traditionally parallel code is important.
- Directive-based additional features are useful: they preserve the basic constructs of the base language while leaving room for performance tuning, as in the sketch below.
- How do we specify a reasonable and effective standard set of directives that can be applied to many heterogeneous architectures?
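A small sketch of that directive-based, incremental style: the base-language loop stays untouched and parallelism comes only from a directive, so a compiler that ignores the directive still produces the original sequential code. OpenMP is used here purely as a familiar stand-in for such a directive set; the routine is illustrative.

```c
/* The sequential C construct is preserved; a compiler without the
 * directive set simply ignores the pragma and compiles the loop as-is,
 * which leaves room for later performance tuning by adding or refining
 * directives instead of rewriting the code. */
void axpy(long n, double alpha, const double *x, double *y)
{
    #pragma omp parallel for          /* tuning knob; safe to ignore */
    for (long i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```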