
Case Studies in Asynchronous, Message-Driven Shared Memory Programming



  1. Case Studies in Asynchronous, Message-Driven Shared Memory Programming
     Pritish Jetley, Parallel Programming Laboratory, pjetley2@illinois.edu
     04/25/11

  2. Outline
     ● Shared memory programming today
     ● Charm++ on multicore systems
     ● Shared memory (SM) programming in Charm++
     ● Case studies
       ● Barnes-Hut (SPLASH)
       ● SAH-based k-d tree construction

  3. SM programming today
     ● Fork-join
       ● Amorphous, thread-based (pthreads)
       ● Data parallelism-centric (OpenMP)
     ● Tasks (TBB, Cilk)
     ● Message-driven execution (Charm++)

  4. Fork-join model
     - Forced synchrony; Low-level; Mutex; Grainsize control
     + Simple to program (?); Global view of control; Natural fit for certain problems

  5. Charm++ on multicore systems
     ● Decompose the algorithm into objects encapsulating its natural elements
     ● Objects present reactive interfaces
     ● Control flows through asynchronous entry method invocations
     ● Data flows through pointer exchange

  6. SM programming with Charm++ and MDE
     - No global view of control; MDE is low-level
     - Charm++ has no faults whatsoever
     + Natural decomposition; Dependencies = messages; Asynchrony; Dynamic load balancing; Task prioritization

  7. Performance and productivity studies
     ● How easy (or hard) is it to write SM programs in Charm++?
     ● Can we expect improvements in performance?
     ● Are there abstractions that would improve programmability in Charm++?

  8. Comparison points
     ● SPLASH2 Barnes-Hut benchmark
       ● Study evolution of self-gravitating systems
       ● Tree-based code
       ● Uses pthreads
     ● SAH-based k-d tree construction
       ● High-performance ray tracing
       ● Nested parallelism
       ● Uses TBB

  9. SPLASH Barnes-Hut
     ● Domain decomposition and tree building
       ● Partition space into compact, disjoint regions containing approximately equal numbers of particles
       ● Regions arranged in an octree
       ● Independent subtrees: task parallel
       ● Shuffle particles into child bins: data parallel
     ● Force calculation
       ● Objects own non-intersecting sets of particles, and calculate forces on them

  10. Decomposition
      ● Recursively divide a partition into quadrants if it contains more than τ particles
      ● Example: τ = 3
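The splitting rule on slide 10 can be sketched sequentially in C++. This is a minimal illustration, not the SPLASH-2 or Charm++ code: the names `Point`, `QuadNode`, `build` and `countLeaves`, the square domain, and the `maxDepth` guard are all assumptions introduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical 2D particle; the slide only speaks of "particles".
struct Point { double x, y; };

// Hypothetical quadtree node covering [x0, x0+size) x [y0, y0+size).
struct QuadNode {
    double x0, y0, size;
    std::vector<Point> particles;                    // filled only in leaves
    std::array<std::unique_ptr<QuadNode>, 4> child;  // all null in a leaf
    bool isLeaf() const { return !child[0]; }
};

// Split a region into quadrants while it holds more than tau particles.
// maxDepth (an assumption) stops the recursion when particles coincide.
std::unique_ptr<QuadNode> build(std::vector<Point> pts, double x0, double y0,
                                double size, std::size_t tau, int maxDepth) {
    assert(tau >= 1);
    auto node = std::make_unique<QuadNode>();
    node->x0 = x0; node->y0 = y0; node->size = size;
    if (pts.size() <= tau || maxDepth == 0) {
        node->particles = std::move(pts);
        return node;
    }
    const double half = size / 2;
    std::array<std::vector<Point>, 4> bins;  // shuffle particles into child bins
    for (const Point& p : pts) {
        int which = (p.x >= x0 + half ? 1 : 0) + (p.y >= y0 + half ? 2 : 0);
        bins[which].push_back(p);
    }
    for (int q = 0; q < 4; ++q) {            // subtrees are independent of each other
        node->child[q] = build(std::move(bins[q]),
                               x0 + (q & 1) * half, y0 + (q >> 1) * half,
                               half, tau, maxDepth - 1);
    }
    return node;
}

// Count the leaves of the resulting tree.
std::size_t countLeaves(const QuadNode& n) {
    if (n.isLeaf()) return 1;
    std::size_t total = 0;
    for (const auto& c : n.child) total += countLeaves(*c);
    return total;
}
```

With τ = 3, a region holding three particles stays a leaf, while a fourth particle triggers a split into four child regions. The recursive calls in the last loop are where a task-parallel version would spawn independent subtree tasks.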

  11. Domain decomposition
      [Figure labels: node task N; first particle; last particle; combined messages; child tasks]

  12. Decomposition with pthreads
      void decompose(){
        for(int I = 0; I < myNP; I++){
          Particle *p = myParticles[I];
          Cell *cell = g_root;
          while(1){
            cell->LOCK();
            if(!cell->isLeaf()){
              // descend one level, then release the parent
              Cell *save = cell;
              int which = cell->which(p->key);
              cell = cell->child(which);
              save->UNLOCK();
            }
            else{
              // insert into the leaf and split it if needed
              cell->particles.add(p);
              cell->split();
              cell->UNLOCK();
              break;
            }
          }
        }
      }

  13. Decomposition with Charm++
      void TreePiece::decompose(){
        for(int I = 0; I < myNP; I++){
          Particle *p = myParticles[I];
          int which = g_root->whichChild(p->key);
          bufferParticle(which,p);
          if(outParticles[which].size() > THRESH){
            flushParticles(which);
          }
        }
        flushAllParticles();
      }

      void TreePiece::flushParticles(int I){
        treePieceProxy[I].recvParticles(buffered[I], buffered[I].size());
      }

      void TreePiece::recvParticles(Particle *ptr, int np){
        if(myRoot->isLeaf()){
          myRoot->addParticles(ptr,np);
          if(myRoot->split()){
            forwardParticlesToChildren(myRoot->particles);
          }
        }
        else{
          forwardParticlesToChildren(ptr,np);
        }
      }

      void TreePiece::forwardParticlesToChildren(/* ... */){
        for(int I = 0; I < NUM_CHILDREN; I++){
          treePieceProxy[childIndex[I]].recvParticles(childParticles[I], childParticles[I].size());
        }
      }
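The buffer-then-flush pattern in `decompose` and `flushParticles` on slide 13 can be sketched in plain C++ without the Charm++ runtime. This is an assumption-laden illustration: `ParticleBuffer`, its `Send` callback, and the `Particle` layout are hypothetical names introduced here, and the callback merely stands in for an asynchronous entry method invocation on a proxy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a particle; only a key is needed here.
struct Particle { unsigned key; };

// Collects particles per destination and hands a whole batch to `send`
// once a buffer grows past `thresh`, so many particles share one message.
class ParticleBuffer {
public:
    using Send = std::function<void(std::size_t dest, std::vector<Particle> batch)>;

    ParticleBuffer(std::size_t numDest, std::size_t thresh, Send send)
        : buffered_(numDest), thresh_(thresh), send_(std::move(send)) {}

    // Buffer one particle; flush that destination if it exceeds the threshold.
    void add(std::size_t dest, const Particle& p) {
        assert(dest < buffered_.size());
        buffered_[dest].push_back(p);
        if (buffered_[dest].size() > thresh_) flush(dest);
    }

    // Send whatever is buffered for one destination (no-op if empty).
    void flush(std::size_t dest) {
        if (buffered_[dest].empty()) return;
        std::vector<Particle> batch;
        batch.swap(buffered_[dest]);   // leaves the buffer empty for reuse
        send_(dest, std::move(batch));
    }

    // Send all remaining partial batches, as flushAllParticles() does on the slide.
    void flushAll() {
        for (std::size_t d = 0; d < buffered_.size(); ++d) flush(d);
    }

private:
    std::vector<std::vector<Particle>> buffered_;
    std::size_t thresh_;
    Send send_;
};
```

The threshold trades message count against latency: a larger value combines more particles per message but delays the moment the receiving object can start working on them.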

  14. Tree traversal
      Traverse(Leaf b, Node n){
        if(IsLeaf(n)){
          LeafForces(b,n);
        }
        else if(Side(n)/|r(n)-r(b)| < Theta_T){
          CellForces(b,n);
        }
      }
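The opening test on slide 14 can be sketched as a complete recursive walk. The slide does not show the final branch. The sketch below assumes the usual Barnes-Hut behaviour of opening a node that fails the test and visiting its children. The `Node` and `Counts` types and the 2D setting are assumptions, and the force kernels are replaced by counters so that the control flow is easy to inspect.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tree node: a leaf has no children.
struct Node {
    double cx, cy;               // centre of mass
    double side;                 // side length of the node's bounding square
    std::vector<Node> children;  // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};

// Counters standing in for the LeafForces and CellForces kernels.
struct Counts { int leafForces = 0; int cellForces = 0; };

// Walk the tree for a target at (bx, by). A node that is far enough away
// (side / distance < theta) is treated as a single pseudo-particle;
// otherwise it is opened and its children are visited recursively.
void traverse(double bx, double by, const Node& n, double theta, Counts& out) {
    assert(theta > 0.0);
    if (n.isLeaf()) {
        ++out.leafForces;                 // LeafForces(b, n) on the slide
        return;
    }
    const double dist = std::hypot(n.cx - bx, n.cy - by);
    if (n.side < theta * dist) {          // same test, written without a division
        ++out.cellForces;                 // CellForces(b, n) on the slide
    } else {
        for (const Node& c : n.children) traverse(bx, by, c, theta, out);
    }
}
```

A distant target interacts with the root as one cell, while a target inside the root's region forces the walk down to the leaves.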

  15. Fewer barriers
      [Figures: 100k.1.comparison.eps, 10k.1.comparison.eps]

  16. Performance profile

  17. Performance profile

  18. More results
      [Figures: 100k.2.comparison.eps, 10k.2.comparison.eps]

  19. SAH-based k-d trees
      ● Used to efficiently render complex graphical scenes
      ● Task-parallel construction of independent subtrees (dynamically created chares)
      ● Data-parallel calculation of node split point (chare arrays)

  20. Binary Space Partitioning
      ● SAH decides the position of the partition based on triangle distribution and partition surface area
      [Figure labels: partitioning plane; extents]
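Slide 20 names the two inputs of the surface area heuristic: how the triangles are distributed around a candidate plane, and the surface areas of the resulting partitions. A minimal sketch of the commonly used cost form, cost = C_trav + C_isect * (SA_L/SA_P * N_L + SA_R/SA_P * N_R), follows. The deck does not show which variant or constants it uses, so `Box`, `surfaceArea`, `sahCost` and the two cost constants are assumptions introduced here.

```cpp
#include <cassert>

// Axis-aligned box given by its extents along x, y and z.
struct Box { double dx, dy, dz; };

double surfaceArea(const Box& b) {
    return 2.0 * (b.dx * b.dy + b.dy * b.dz + b.dz * b.dx);
}

// SAH cost of splitting `parent` with a plane perpendicular to x, placed at
// offset `t` from the low face, with nLeft and nRight triangles on each side.
// cTraverse and cIntersect are tunable constants (assumed, not from the deck).
double sahCost(const Box& parent, double t, int nLeft, int nRight,
               double cTraverse, double cIntersect) {
    assert(t >= 0.0 && t <= parent.dx);
    const Box left{t, parent.dy, parent.dz};
    const Box right{parent.dx - t, parent.dy, parent.dz};
    // Ratio of surface areas approximates the chance a ray hits each child.
    const double pLeft = surfaceArea(left) / surfaceArea(parent);
    const double pRight = surfaceArea(right) / surfaceArea(parent);
    return cTraverse + cIntersect * (pLeft * nLeft + pRight * nRight);
}
```

For a 2 x 1 x 1 box split in the middle with four triangles on each side and both constants set to 1, each child has 0.6 of the parent's surface area, so the cost is 1 + 0.6 * 4 + 0.6 * 4 = 5.8. A builder would evaluate this for many candidate planes and keep the cheapest.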

  21. k-d tree construction
      [Figure labels: node task N; first triangle; last triangle; particle chare array P; child tasks]

  22. Charm++ pseudocode
      ● Use SDAG to sequence events in parallel scan
      entry void Worker::scanTriangleCounts(ActivationRec ar, NodeTaskID N){
        dist = W >> 1;
        while(dist > 0){
          if(thisIdx < dist){
            ScanMsg m;
            m.NL = myNL;
            m.NR = ar.nTris-myNR;
            RefNum(m) = dist;
            workers[thisIdx+dist].recvNeighborCounts(m);
          }
          // wait for the message tagged with this round's distance
          when recvNeighborCounts[dist](ScanMsg m1){
            myNL += m1.NL;
            myNR -= m1.NR;
            dist >>= 1;
          }
        }
        Plane bestPlane = calculateSAH();
        reduce(bestPlane,N, NodeTask::getBestPlanes);
      }

  23. Charm++ implementation
      ● One chare for each node of the k-d tree (orchestrator)
      ● For data-parallel operations, the orchestrator either
        ● Fires new chares (dynamic load balance)
        ● Uses a chare array (low overhead of use)
      ● Several optimizations in place
        ● Prioritization
        ● Array-level multicasts/reductions
        ● Manual “smearing” of tasks at top level
        ● Use of chunked arrays
          – Reduces false sharing
          – Reduces amount of coordination communication
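One way to read "chunked arrays" on slide 23 is that each worker owns a contiguous block of the triangle array rather than interleaved elements, so neighbouring workers rarely write to the same cache line and need to exchange only per-chunk summaries. The deck does not show its chunking scheme. The helper below, `chunkRange`, is a hypothetical sketch of a balanced contiguous partition.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open index range [first, second) owned by chunk `i` when `n` items are
// split into `numChunks` contiguous chunks whose sizes differ by at most one.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t n,
                                               std::size_t numChunks,
                                               std::size_t i) {
    assert(numChunks > 0 && i < numChunks);
    const std::size_t base = n / numChunks;
    const std::size_t extra = n % numChunks;  // the first `extra` chunks get one more item
    const std::size_t begin = i * base + (i < extra ? i : extra);
    const std::size_t size = base + (i < extra ? 1 : 0);
    return {begin, begin + size};
}
```

Splitting 10 items into 3 chunks gives the ranges [0, 4), [4, 7) and [7, 10).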

  24. Results
      [Figures: bunny.eps, fairy.eps, angel.eps, happy.eps]

  25. Performance profile
