Towards Jungle Computing with Ibis/Constellation
Jason Maassen, Niels Drost, Henri Bal, Frank Seinstra
Department of Computer Science, VU University, Amsterdam, The Netherlands

Introduction
● HPC is entering many domains
  ● Not just: physics / chemistry / climate modelling
  ● Also: semantic web / medical / multimedia analysis / neuroinformatics / remote sensing / astronomy / ...
● HPC is becoming more complex
  ● Not just large SMPs or clusters, instead:
  ● Clusters of SMPs / Grids / Clouds / Supers / ...
  ● Heterogeneous machines using GPU / Cell / FPGA
● "It's a jungle out there"
3DAPAS Workshop 2011

Example Domain: Computational Astrophysics (amusecode.org)

Jungle Computing
● Worst case computing ... as required by users
● Arbitrary combination of distributed, hierarchical, and heterogeneous computing

Many Task Computing
According to Raicu, Foster, et al. [SC'08]:
"High-performance computations comprising multiple distinct activities, coupled via file system operations or message passing. Tasks may be small or large, uni-processor or multi-processor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large."
● Applications are dynamic and heterogeneous workflows / DAGs of activities

MTC in the Jungle
● MTC has advantages for Jungle Computing
● Many distinct activities
  ● Each can be implemented independently, using the tools and targeting the HPC architecture that suit it best
● Reduced programming complexity
  ● Complete applications are constructed using sequences and combinations of activities

Constellation
● MTC system for Jungle Computing
● Model based on:
  ● activities (tasks)
  ● executors (resources)
  ● contexts (matchmaking)
  ● events (communication)

Constellation Model: Application
● Application: set of activities
  ● Distinct tasks
  ● Size and complexity may vary
  ● Targeted at a specific HPC platform
  ● (Loosely) coupled using events
  ● Often a wrapper around existing code
● Similar to a workflow or DAG of tasks
  ● Dynamic and unlimited in size

Constellation Model: Hardware
● Hardware: set of executors
  ● Capable of running activities
  ● May represent anything from a single core to an entire cluster, a GPU, etc.
  ● May be application specific
● Provides an application-specific heterogeneous resource pool

Constellation Model: Context
● Both activities and executors are tagged with a context
  ● Application-defined label (+ rank)
● Used to define the relationship between activities and executors, e.g.:
  ● Data dependencies, hardware requirements, ...
● May combine contexts
● Executors may have a preference for label or rank

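The context idea above can be illustrated with a minimal sketch. This is not the Constellation API: the names `Context`, `Activity`, `Executor`, and `can_run`, and the labels "cpu" and "gpu", are hypothetical, and the matching rule (an executor may run an activity when they share at least one label) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical application-defined tag: a label plus an optional rank."""
    label: str
    rank: int = 0


@dataclass(frozen=True)
class Activity:
    name: str
    contexts: Tuple[Context, ...]  # an activity may combine several contexts


@dataclass(frozen=True)
class Executor:
    name: str
    labels: Tuple[str, ...]  # labels this executor is willing to run


def can_run(executor: Executor, activity: Activity) -> bool:
    """Assumed rule: a match needs at least one label in common."""
    return any(c.label in executor.labels for c in activity.contexts)


gpu_kernel = Activity("kernel", (Context("gpu"),))
stage_in = Activity("stage-in", (Context("cpu"), Context("gpu")))
cpu_core = Executor("core-0", ("cpu",))

print(can_run(cpu_core, gpu_kernel))  # False
print(can_run(cpu_core, stage_in))    # True
```

Because the labels are plain application-defined strings, the same mechanism can express a hardware requirement ("gpu") or a data location (a cluster name) without changing the matching code.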
Constellation Model: Matchmaking
● RTS performs load-balancing and match-making
  ● Ensures activities are forwarded to a suitable executor
  ● Tries to keep all executors busy
  ● Uses context-aware work-stealing
● RTS also performs event routing
  ● Based on unique activity identifier
ComplexHPC Spring School 2011

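A rough sketch of what context-aware work-stealing could look like, under stated assumptions. The `Worker` class, the `take_matching` helper, the activity names, and the oldest-first scan order are all hypothetical and are not taken from the Constellation runtime; the point is only that an idle executor steals work it is actually able to run.

```python
from collections import deque


def take_matching(queue, labels):
    """Remove and return the first queued (label, name) pair with an accepted label."""
    for i, (label, name) in enumerate(queue):
        if label in labels:
            del queue[i]  # safe: we return immediately after mutating
            return (label, name)
    return None


class Worker:
    """Hypothetical executor with a local queue of (label, activity name) pairs."""

    def __init__(self, name, labels):
        self.name = name
        self.labels = set(labels)
        self.queue = deque()

    def next_activity(self, others):
        """Prefer local work; when idle, steal only work this worker can run."""
        item = take_matching(self.queue, self.labels)
        if item is not None:
            return item
        for victim in others:
            item = take_matching(victim.queue, self.labels)
            if item is not None:
                return item
        return None


cpu = Worker("cpu-0", ["cpu"])
gpu = Worker("gpu-0", ["gpu"])
cpu.queue.extend([("cpu", "align"), ("gpu", "convolve"), ("cpu", "detect")])

stolen = gpu.next_activity([cpu])
print(stolen)           # ('gpu', 'convolve')
print(list(cpu.queue))  # [('cpu', 'align'), ('cpu', 'detect')]
```

The idle GPU worker skips the CPU-only activities in the victim's queue and takes the one activity whose label it accepts, which keeps it busy without violating the activity's requirements.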
Constellation API

Constellation API (continued)

DACH 2008
Data Challenge in conjunction with IEEE Cluster/Grid 2008
● Supernova detection
● Analyse 1052 image pairs on 11 clusters (Intrigger)
● 'Sequential' executable provided

DACH 2008: Problem
● Main problems:
  ● Data distribution
  ● Heterogeneity of work and hardware
  ● Load balancing

DACH 2008: Workflow
● Winning approach in 2008:
  ● Parallelize workflow to improve hardware utilization
  ● Create hierarchical master-worker framework
  ● Scheduling heuristics using data location and size

Constellation Version, Option 1: Monolithic
● Wrap entire application in a single activity
  ● One activity per image pair
● Wrap each machine in one executor
  ● Multiple cores per executor
● Use context to influence order and placement of each activity

Evaluation
● Intrigger not available
● Instead we use DAS3+DAS4
  ● 5+6 clusters in the Netherlands
  ● Mix of 2/4/8/12/48 core machines
  ● Various types of GPUs
● Three scenarios
  ● Data locality
  ● (Executor granularity)
  ● Heterogeneous processing

Scenario 1: Data Locality
● Data distributed over 4 clusters of DAS3 + DAS4
● Use context to express data locality and preferred processing order
● Adapt context to tune application
  ● No change in application

Scenario 1: Results
Activity                   Executor                  Effect
"any"                      "any"                     Random order
"any", 50                  "any", biggest            Sorted by size
"VU3", "VU4", 50           "VU3", biggest            Local only; sorted by size
"VU3", "VU4", "any", 50    "VU3", "any", biggest     Preference for local, fallback to any; sorted by size

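The effect of the context settings in the table can be mimicked with a small selection sketch. Everything here is an assumption for illustration: the `pick` and `drain` functions, the pair names, the ranks, and the "OTHER" cluster label are hypothetical, and the rule (try executor labels in preference order, then take the highest rank when "biggest" is set) is one plausible reading of the table rather than the actual runtime logic.

```python
import random


def pick(pending, executor_labels, biggest=False):
    """Pick one (name, labels, rank) activity for an executor.

    executor_labels is ordered by preference; the first label with any
    matching activity wins. With biggest=True the highest rank is taken,
    otherwise the choice among candidates is random.
    """
    for label in executor_labels:
        candidates = [a for a in pending if label in a[1]]
        if candidates:
            if biggest:
                return max(candidates, key=lambda a: a[2])
            return random.choice(candidates)
    return None


def drain(activities, executor_labels, biggest):
    """Return the order in which a single executor would process activities."""
    pending = list(activities)
    order = []
    while True:
        chosen = pick(pending, executor_labels, biggest)
        if chosen is None:
            return order
        pending.remove(chosen)
        order.append(chosen[0])


# Hypothetical image pairs: (name, labels of clusters holding the data, size rank)
pairs = [
    ("pair-A", ("VU3", "VU4", "any"), 50),
    ("pair-B", ("OTHER", "any"), 80),
    ("pair-C", ("VU3", "any"), 20),
]

print(drain(pairs, ("any",), biggest=True))        # ['pair-B', 'pair-A', 'pair-C']
print(drain(pairs, ("VU3",), biggest=True))        # ['pair-A', 'pair-C']
print(drain(pairs, ("VU3", "any"), biggest=True))  # ['pair-A', 'pair-C', 'pair-B']
```

Only the executor's label list changes between the three calls, which mirrors the point of the scenario: the application is tuned by adapting contexts, not by changing application code.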
Constellation Version, Option 2: Workflow
● Wrap each stage in an activity
● Wrap each core in an executor
● Use context to influence order and placement of each of the jobs

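As a sketch of the workflow option, each stage can be modelled as an activity that forwards its result to the next stage with an event addressed by activity identifier. The `Runtime` and `Stage` classes, the stage names, and the arithmetic stand-ins for real processing are hypothetical; the sketch is also synchronous, whereas a real runtime would deliver events asynchronously across executors.

```python
import itertools


class Runtime:
    """Hypothetical stand-in for the RTS: assigns IDs and routes events by ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._activities = {}
        self.log = []

    def submit(self, activity):
        activity.id = next(self._ids)
        self._activities[activity.id] = activity
        return activity.id

    def send(self, target_id, data):
        # Event routing: look up the target by its unique identifier.
        self._activities[target_id].process(self, data)


class Stage:
    """One workflow stage wrapped as an activity."""

    def __init__(self, name, func, next_id=None):
        self.name = name
        self.func = func
        self.next_id = next_id  # identifier of the downstream stage, if any
        self.id = None

    def process(self, runtime, data):
        result = self.func(data)
        runtime.log.append((self.name, result))
        if self.next_id is not None:
            runtime.send(self.next_id, result)


runtime = Runtime()
# Submit the last stage first so its identifier is known upstream.
offset_id = runtime.submit(Stage("offset", lambda x: x + 1))
scale_id = runtime.submit(Stage("scale", lambda x: x * 2, next_id=offset_id))

runtime.send(scale_id, 5)
print(runtime.log)  # [('scale', 10), ('offset', 11)]
```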
Scenario 3: Heterogeneous System
● 18 node GPU cluster
  ● 8 cores + 1 GPU per node
● Activity: single task
● Executor: 1 core (top); 1 core or GPU (bottom)
● Replaced activity 7.2 with GPU version
  ● Label activities and executors accordingly
● Significant performance gain

Conclusions
● We think Jungle Computing is a necessity for some application areas.
● Constellation offers a suitable model (MTC) to create such applications.
● Initial experiments show that Constellation works well for a wide range of hardware configurations
  ● Easy to reconfigure applications to match resources
  ● Allows integration of specialized accelerator codes
● Suitable basis for a Jungle Computing model

Future Work
● Application development
  ● AMUSE
  ● Remote Sensing
  ● Climate modelling
● Platform improvements
  ● Easier integration of existing codes
  ● Smart/automatic deployment/tuning of executors
  ● Improve data handling
  ● Better monitoring

Questions?
jason@cs.vu.nl
www.cs.vu.nl/ibis

Scenario 2: Executor Granularity
● 30 largest images only
● Single 48 core machine
● Activity: entire application (a-c); single task (d)
● Executor: [n]-cores
● No change in application for experiment (a-c)
  ● Only change executor config.
● Completely ported application in (d)
  ● Significant performance gain!