A UPC++ Actor Library and its Evaluation on a Shallow Water Application

  1. A UPC++ Actor Library and its Evaluation on a Shallow Water Application
     Alexander Pöppl (1), Scott Baden (2), Michael Bader (1)
     (1) Department of Informatics, Technical University of Munich
     (2) Computational Research Division, Lawrence Berkeley National Laboratory; Department of Computer Science and Engineering, University of California, San Diego
     Parallel Applications Workshop, Alternatives To MPI+X, November 18th 2019, Denver, Colorado

  2. Motivation
     Invasive Computing:
     - Dynamic resource allocation
     - Predictability through exclusive resource usage
     - Heterogeneous compute tiles
     Actor-based Modelling:
     - Good fit for architecture, enables exploration of different mappings of actors to compute tiles
     - SWE-X10 as sample application
     Transfer to larger-scale applications:
     - Is it feasible to program an actor library using standard languages and frameworks?
     - If so, how does performance compare, both to our X10-based library and to BSP?
     ✓ Tools: C++, OpenMP, UPC++
     [Figure: tiled architecture with CPU tiles, an i-Core tile, a TCPA tile, memory and I/O tiles, connected by NoC routers]
     Alexander Pöppl | A UPC++ Actor Library | PAW-ATM 2019

  3. UPC++
     Asynchronous Partitioned Global Address Space (APGAS) Model
     - Reliance on one-sided communication
     - Asynchronous, continuation-based API
     - Based on GASNet-EX, makes direct use of InfiniBand and (some) Cray interconnects
     [Figure: ranks 0 to n, each with a shared segment and a private segment. Adapted from: UPC++ Specification v1.0 Draft 10, available at https://upcxx.lbl.gov]

  6. UPC++
     RPCs
     - Executed asynchronously
     - Serialization and transfer of parameters and return value
     - Completion events available after the local part (or the overall RPC execution) is finished
     Global Pointers
     - Point to data in the shared segment
     - May be used as target for RMA operations
     Distributed Objects
     - Created collectively
     - Same handle points to different objects on each rank
     [Figure: interaction between Rank m and Rank n]

  7. UPC++ Actor Library
     Actors
     - Encapsulate specific functionality, data and behavior
     - Behavior defined through finite state machines
     - No data sharing between actors
     - Defined communication endpoints (ports)
     - Have the ability to compute whenever data in their ports (InPorts or OutPorts) changes
     → Actors are being triggered
     Application developers…
     - …subclass and implement the act() method (actor FSM)
     - …use ports as communication endpoints
     - …specify which ports are connected
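The actor model described on this slide can be illustrated with a small single-process sketch. Everything below is hypothetical and is not the library's API: `Port`, `Actor` and `PairSumActor` are names invented for the illustration, and the ports are plain local buffers rather than distributed endpoints. The sketch only shows the idea that `act()` advances a finite state machine whenever it is triggered and port contents allow progress.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical minimal port: a bounded FIFO buffer owned by one actor.
struct Port {
    std::deque<int> buffer;
    std::size_t capacity = 4;
    std::size_t available() const { return buffer.size(); }
    std::size_t freeCapacity() const { return capacity - buffer.size(); }
};

// Base class: application code subclasses it and implements act().
class Actor {
public:
    virtual ~Actor() = default;
    virtual void act() = 0;  // one step of the actor's finite state machine
};

// Example actor (assumption for illustration): reads two values, writes
// their sum, and stops after a fixed number of sums.
class PairSumActor : public Actor {
public:
    enum class State { WaitFirst, WaitSecond, Done };

    Port in;
    Port out;

    explicit PairSumActor(int sums)
        : remaining_(sums), state_(sums > 0 ? State::WaitFirst : State::Done) {}

    State state() const { return state_; }

    void act() override {
        switch (state_) {
        case State::WaitFirst:
            // Transition only when the input port holds data.
            if (in.available() > 0) {
                first_ = pop(in);
                state_ = State::WaitSecond;
            }
            break;
        case State::WaitSecond:
            // Needs both an input value and room in the output port.
            if (in.available() > 0 && out.freeCapacity() > 0) {
                out.buffer.push_back(first_ + pop(in));
                state_ = (--remaining_ == 0) ? State::Done : State::WaitFirst;
            }
            break;
        case State::Done:
            break;  // terminal state, nothing left to do
        }
    }

private:
    static int pop(Port& p) {
        assert(p.available() > 0);
        int v = p.buffer.front();
        p.buffer.pop_front();
        return v;
    }

    int remaining_;
    State state_;
    int first_ = 0;
};
```

Calling `act()` when no data is available is harmless in this sketch: the state machine simply stays where it is until the next trigger.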

  8. UPC++ Actor Library
     Channels
     - Unidirectional connection between two ports
     - FIFO semantics
     - Operations: read(), write(T), peek()
     - Guards: available(), freeCapacity()
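A minimal single-process sketch of a channel with these operations and guards might look as follows. The operation names are taken from the slide; the class layout, the `std::deque` storage and the use of assertions for violated guards are assumptions made for the illustration. The actual library moves data between ranks, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical local stand-in for a channel: a bounded FIFO queue.
template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {}

    // Guard: number of elements that can be read right now.
    std::size_t available() const { return queue_.size(); }

    // Guard: number of elements that can still be written.
    std::size_t freeCapacity() const { return capacity_ - queue_.size(); }

    // Append one element; callers are expected to check freeCapacity() first.
    void write(T value) {
        assert(freeCapacity() > 0 && "write() on a full channel");
        queue_.push_back(std::move(value));
    }

    // Look at the oldest element without removing it.
    const T& peek() const {
        assert(available() > 0 && "peek() on an empty channel");
        return queue_.front();
    }

    // Remove and return the oldest element (FIFO order).
    T read() {
        assert(available() > 0 && "read() on an empty channel");
        T value = std::move(queue_.front());
        queue_.pop_front();
        return value;
    }

private:
    std::size_t capacity_;
    std::deque<T> queue_;
};
```

The guards let an actor decide inside `act()` whether a read or write can proceed, so the operations themselves never need to block.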

  9. UPC++ Actor Library – Write
     [Figure: actors A1 and A2 on ranks N and M, connected by a channel from port A1::Out to port A2::In]
     1: RPC (insert data)
     2: LPC (trigger actor)
     3: LPC (track RPC completion)

  10. UPC++ Actor Library – Read
     [Figure: actors A1 and A2, connected by a channel from port A1::Out to port A2::In]
     1: read (dequeue data)
     2: RPC (update capacity)
     3: LPC (trigger actor)
     4: LPC (track RPC completion)

  11. UPC++ Actor Library – Actor Execution Strategies
     Rank-based Execution Strategy
     One thread per UPC++ rank, one rank per (logical) core
     One event loop:
     - Query runtime for progress
     - Execute RPCs, mark affected actors
     - Execute act() on affected actors
     May use sequential UPC++ code mode
     Low number of actors per rank
     [Figure: four ranks, each repeating Query Runtime, Perform RPCs, act()]
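The three steps of the rank-based event loop can be sketched without any communication runtime. In the code below the runtime is replaced by a hypothetical in-memory inbox of pending RPCs; `RankEventLoop`, `PendingRpc`, `deliver()` and `step()` are invented names, not part of the library. In a real UPC++ program the query step would presumably be a call into the runtime such as `upcxx::progress()`, which is not used here so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical actor: incoming messages change `value`, act() counts runs.
struct Actor {
    int value = 0;
    int activations = 0;
    void act() { ++activations; }
};

// Stand-in for an incoming RPC: which actor it targets and what it does.
struct PendingRpc {
    std::size_t target;
    std::function<void(Actor&)> body;
};

class RankEventLoop {
public:
    explicit RankEventLoop(std::size_t numActors) : actors_(numActors) {}

    // Models the arrival of an RPC from another rank.
    void deliver(std::size_t target, std::function<void(Actor&)> body) {
        assert(target < actors_.size());
        inbox_.push(PendingRpc{target, std::move(body)});
    }

    // One loop iteration; returns the number of actors that were triggered.
    std::size_t step() {
        std::set<std::size_t> affected;
        // Steps 1 and 2: drain pending RPCs and mark the affected actors.
        while (!inbox_.empty()) {
            PendingRpc rpc = std::move(inbox_.front());
            inbox_.pop();
            rpc.body(actors_[rpc.target]);
            affected.insert(rpc.target);
        }
        // Step 3: run act() once on every marked actor.
        for (std::size_t i : affected) {
            actors_[i].act();
        }
        return affected.size();
    }

    const Actor& actor(std::size_t i) const { return actors_[i]; }

private:
    std::vector<Actor> actors_;
    std::queue<PendingRpc> inbox_;
};
```

Because a single thread both performs the RPCs and runs `act()`, no locking is needed in this strategy, which matches the slide's remark that sequential code mode may be used.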

  12. UPC++ Actor Library – Actor Execution Strategies
     Thread-based Execution Strategy
     One thread per actor, and one communication thread, low number of ranks per node
     Two event loops:
     - Communication thread queries runtime and executes RPCs
     - Actor threads query runtime for progress and execute LPCs, execute act()
     Requires balancing of communication thread against number of actors
     [Figure: one communication thread repeating Query Runtime, Perform RPCs; actor threads repeating Query Runtime, Perform LPCs, act()]

  13. UPC++ Actor Library – Actor Execution Strategies
     Task-based Execution Strategy
     Map act() executions on OpenMP tasks
     One event loop:
     - Master thread queries runtime
     - Performs any incoming RPCs and triggers affected actors
     - Schedules an OpenMP task for each invocation of act(), with dependencies between act() invocations of the same actor
     Large number of actors per rank possible
     [Figure: master thread repeating Query Runtime, Perform RPCs, Schedule; worker threads executing act() tasks]

  14. Pond – A Shallow Water Proxy Application
     Based on prior applications:
     - SWE, a BSP-based code written using MPI and OpenMP
     - SWE-X10, an actor-based X10 application written using the actorX10 library
     Parallelized using our actor library
     Possible to auto-vectorize with AVX-512 using the Intel Compiler (v18.0)

  15. Pond – A Shallow Water Proxy Application
     \[
     \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}_t
     + \begin{pmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{pmatrix}_x
     + \begin{pmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{pmatrix}_y
     = S(t, x, y)
     \]
     Image: Bachelor-Lab Tsunami Simulation http://www5.in.tum.de/wiki/index.php/Tsunami_Simulation_-_Winter_15

  16. Pond – A Shallow Water Proxy Application
     Finite volume scheme on a Cartesian grid with piecewise constant unknown quantities and Euler time step
     Numerical approach based on LeVeque (R. J. LeVeque, D. L. George, and M. J. Berger. Tsunami modelling with adaptively refined finite volume methods. Acta Numerica, 20:211–289, 2011)
     Image: Bachelor-Lab Tsunami Simulation http://www5.in.tum.de/wiki/index.php/Tsunami_Simulation_-_Winter_15
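To make the combination of a finite volume scheme with an explicit Euler time step concrete, here is a reduced sketch. It is not Pond's solver: it is one-dimensional, assumes flat bathymetry (source term S = 0) and strictly positive water height, and swaps in a simple Rusanov (local Lax-Friedrichs) numerical flux instead of the Riemann solvers discussed in the cited LeVeque et al. paper. All names (`State`, `physicalFlux`, `numericalFlux`, `eulerStep`) are chosen for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kGravity = 9.81;

// Piecewise constant cell unknowns: water height h and momentum hu.
struct State {
    double h;
    double hu;
};

// Physical flux of the 1D shallow water equations: (hu, hu^2 + g h^2 / 2).
State physicalFlux(const State& q) {
    assert(q.h > 0.0);  // assumption: no dry cells
    const double u = q.hu / q.h;
    return {q.hu, q.hu * u + 0.5 * kGravity * q.h * q.h};
}

// Largest wave speed in a cell: |u| + sqrt(g h).
double maxWaveSpeed(const State& q) {
    return std::abs(q.hu / q.h) + std::sqrt(kGravity * q.h);
}

// Rusanov flux across the edge between a left and a right cell.
State numericalFlux(const State& l, const State& r) {
    const State fl = physicalFlux(l);
    const State fr = physicalFlux(r);
    const double a = std::max(maxWaveSpeed(l), maxWaveSpeed(r));
    return {0.5 * (fl.h + fr.h) - 0.5 * a * (r.h - l.h),
            0.5 * (fl.hu + fr.hu) - 0.5 * a * (r.hu - l.hu)};
}

// One explicit Euler step. The first and last cells are treated as fixed
// ghost cells and are copied unchanged.
std::vector<State> eulerStep(const std::vector<State>& q, double dt, double dx) {
    std::vector<State> next = q;
    for (std::size_t i = 1; i + 1 < q.size(); ++i) {
        const State left = numericalFlux(q[i - 1], q[i]);
        const State right = numericalFlux(q[i], q[i + 1]);
        next[i].h = q[i].h - dt / dx * (right.h - left.h);
        next[i].hu = q[i].hu - dt / dx * (right.hu - left.hu);
    }
    return next;
}
```

Each cell is updated only from the fluxes across its own edges, so a cell needs nothing beyond the values of its direct neighbours. The time step `dt` has to be chosen small enough relative to `dx` and the wave speeds for the explicit scheme to remain stable.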

  18. Pond – A Shallow Water Proxy Application
     Subdivision into rectangular, equally-sized patches with halo regions
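The subdivision into equally sized patches with halo regions can be sketched as a small data layout. The code below is an illustration under assumptions, not Pond's data structure: `Patch` and `makePatches` are hypothetical names, the halo width defaults to one cell, and the grid size is assumed to be divisible by the number of patches in each direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One rectangular patch: nx x ny interior cells surrounded by a halo layer.
struct Patch {
    std::size_t x0, y0;        // global index of the first interior cell
    std::size_t nx, ny;        // interior cells per direction
    std::size_t halo;          // halo width in cells
    std::vector<double> data;  // (nx + 2*halo) * (ny + 2*halo) values

    std::size_t strideX() const { return nx + 2 * halo; }

    // Storage index of interior cell (i, j), with 0 <= i < nx, 0 <= j < ny.
    std::size_t index(std::size_t i, std::size_t j) const {
        return (j + halo) * strideX() + (i + halo);
    }
};

// Split a gridX x gridY grid into patchesX x patchesY equally sized patches,
// stored row by row (patch row outer, patch column inner).
std::vector<Patch> makePatches(std::size_t gridX, std::size_t gridY,
                               std::size_t patchesX, std::size_t patchesY,
                               std::size_t halo = 1) {
    assert(patchesX > 0 && patchesY > 0);
    assert(gridX % patchesX == 0 && gridY % patchesY == 0);
    const std::size_t nx = gridX / patchesX;
    const std::size_t ny = gridY / patchesY;
    std::vector<Patch> patches;
    patches.reserve(patchesX * patchesY);
    for (std::size_t py = 0; py < patchesY; ++py) {
        for (std::size_t px = 0; px < patchesX; ++px) {
            patches.push_back(Patch{
                px * nx, py * ny, nx, ny, halo,
                std::vector<double>((nx + 2 * halo) * (ny + 2 * halo), 0.0)});
        }
    }
    return patches;
}
```

The halo cells hold copies of the neighbouring patches' boundary values, so each patch can be updated from its own storage once the halos have been exchanged.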
