Starting Workflow Tasks Before They’re Ready Wladislaw Gusew, Björn Scheuermann Computer Engineering Group, Humboldt University of Berlin
Agenda ◮ Introduction ◮ Execution semantics ◮ Methods and tools ◮ Simulation results ◮ Experimental results ◮ Conclusion 1 / 21
Big data in research 2 / 21
Scientific workflow example ◮ Directed Acyclic Graph (DAG) ◮ Executed on distributed systems ◮ Aggregation and broadcast types of tasks ◮ High demand for network resources 3 / 21
Execution semantics 4 / 21
Execution semantics ◮ But in reality resources are limited ◮ Execute only a subset of parent tasks concurrently (insufficient number of workers) ◮ Congestion of network (all parent tasks have the same priority) 4 / 21
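To make the effect of a limited worker pool concrete, here is a minimal sketch in Python. It assumes classic semantics (the aggregation task may start only once every parent has finished) and ignores network transfer times entirely. The function name `aggregation_start` and the greedy assignment of parents to the next free worker are illustrative assumptions, not the scheduler used in the talk.

```python
import heapq

def aggregation_start(parent_durations, workers):
    """Earliest start time of an aggregation task under classic semantics.

    Parents are assigned, in list order, to whichever worker becomes free
    first (a simplifying assumption). The aggregation task can start only
    when the last parent has finished.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    free_at = [0.0] * workers          # time at which each worker is idle again
    heapq.heapify(free_at)
    last_finish = 0.0
    for duration in parent_durations:
        start = heapq.heappop(free_at)  # next worker to become free
        end = start + duration
        heapq.heappush(free_at, end)
        last_finish = max(last_finish, end)
    return last_finish

# Four equally long parents: with 4 workers they run concurrently,
# with 2 workers they run in two waves and the aggregation starts later.
print(aggregation_start([1, 1, 1, 1], workers=4))
print(aggregation_start([1, 1, 1, 1], workers=2))
```

With fewer workers than parent tasks, the start of the aggregation task is delayed by every additional wave of parents, even before any network congestion is taken into account.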
Example execution 5 / 21
Example execution ◮ Network congestion can slow down processing even further (effects of data losses at the transport protocol layer) ◮ High delay to the start of the aggregation task ◮ Low performance and high execution costs (e.g., in computation clouds) 5 / 21
What can we do to improve this? 6 / 21
What can we do to improve this? List of actions: 1. Obtain information on the task’s input characteristics 2. Refine the workflow and inform the execution engine 3. Let the aggregation task “feel comfortable” in the changed setting 6 / 21
Obtaining input characteristics 1. Annotations to workflows 2. Manual code review 3. Automated profiling 7 / 21
Automated profiling ◮ Operating system instrumentation tool ◮ Enables interception of system calls (file open, read/write, file close) ◮ Record and evaluate logfiles with traces of conducted file accesses. [Figure, two panels: “Reads by mAdd in a small workflow” and “Reads by mAdd in a medium sized workflow”; x-axis: execution progress [10^8 CPU cycles]; y-axis: read accesses [MB]] 8 / 21
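Evaluating such a trace could look roughly like the following Python sketch. The trace format used here, tuples of `(progress, syscall, path, nbytes)`, is a hypothetical stand-in for whatever the instrumentation tool actually logs, and both function names are illustrative.

```python
def first_read_order(trace):
    """Return input paths ordered by the moment they were first read."""
    first_seen = {}
    for progress, syscall, path, _nbytes in trace:
        if syscall == "read" and path not in first_seen:
            first_seen[path] = progress
    return sorted(first_seen, key=first_seen.get)

def bytes_read_per_file(trace):
    """Sum the number of bytes read from each file."""
    totals = {}
    for _progress, syscall, path, nbytes in trace:
        if syscall == "read":
            totals[path] = totals.get(path, 0) + nbytes
    return totals

# Hypothetical trace: progress in CPU cycles, sizes in bytes.
trace = [
    (0, "open", "a.fits", 0),
    (10, "read", "a.fits", 4096),
    (20, "read", "b.fits", 2048),
    (30, "read", "a.fits", 4096),
    (40, "close", "a.fits", 0),
]
print(first_read_order(trace))
print(bytes_read_per_file(trace))
```

The order of first reads is the piece of information that matters for an early start: it tells which input the task needs first and which inputs can still be in transit.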
Refining workflow by transforming DAG 9 / 21
Realizing virtual task split ◮ Real task is transparently wrapped ◮ FUSE enables the setup of a virtual File system in USEr space ◮ Access to input files is performed through our wrapper ◮ Wrapper is responsible for maintaining the correct execution logic 10 / 21
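The core duty of such a wrapper, holding back a read until the input is actually there, can be illustrated without FUSE. The sketch below polls the regular file system instead of intercepting calls in a virtual one, so it is an analogy rather than the wrapper from the talk. The names `wait_for_input` and `read_inputs_in_order` are hypothetical, and the code assumes that producers publish a file atomically (for example by renaming it into place), so that an existing file is also a complete file.

```python
import os
import time

def wait_for_input(path, timeout=5.0, poll_interval=0.05):
    """Block until `path` exists, then return it.

    Assumes files appear atomically, so existence implies completeness.
    Raises TimeoutError if the file does not show up in time.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"input not available: {path}")
        time.sleep(poll_interval)
    return path

def read_inputs_in_order(paths, timeout=5.0):
    """Read inputs sequentially, waiting for each one only when it is needed."""
    contents = []
    for path in paths:
        with open(wait_for_input(path, timeout), "rb") as handle:
            contents.append(handle.read())
    return contents
```

A task driven this way can begin working on its first input while later inputs are still being produced or transferred.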
Evaluation with the Montage workflow 11 / 21
Simulating workflow execution ◮ Java-based simulation framework for scientific workflows ◮ Simulates an execution on a Pegasus/HTCondor stack ◮ Uses the provided Montage workflows with 25, 50, 100, and 1000 tasks ◮ A Python script performed the DAG transformation of the DAX files ◮ Network configured as the bottleneck (by bandwidth limitation) W. Chen and E. Deelman, “WorkflowSim: A toolkit for simulating scientific workflows in distributed environments,” in eScience’12. 12 / 21
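The slides do not spell out the exact rewriting rule, so the following Python sketch shows only one plausible shape of a virtual task split. It works on a plain dictionary that maps each task to its list of parents rather than on real DAX files, and it assumes the aggregation task consumes its inputs one after another in a known order. The function name `split_aggregation` and the `task#i` naming scheme are hypothetical.

```python
def split_aggregation(parents, task, input_order):
    """Split `task` into a chain of virtual parts, one per parent.

    Part i depends on parent i (in `input_order`) and on part i-1, so the
    first part becomes ready as soon as the first parent has finished.
    Children of `task` are rewired to depend on the last part.
    """
    if not input_order or sorted(input_order) != sorted(parents[task]):
        raise ValueError("input_order must list exactly the parents of task")
    result = {t: list(deps) for t, deps in parents.items() if t != task}
    parts = []
    for index, parent in enumerate(input_order):
        part = f"{task}#{index}"
        result[part] = [parent] if not parts else [parts[-1], parent]
        parts.append(part)
    for t in list(result):
        if t not in parts:
            result[t] = [parts[-1] if dep == task else dep for dep in result[t]]
    return result

# Hypothetical mini workflow: two parents feed "agg", whose output feeds "out".
dag = {"p1": [], "p2": [], "agg": ["p1", "p2"], "out": ["agg"]}
print(split_aggregation(dag, "agg", ["p1", "p2"]))
```

In the transformed graph an engine with classic semantics can schedule the first part right after the first parent completes, without any change to the engine itself.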
Simulation results 13 / 21
Variation of number of tasks [Figure: Simulation results for 50 workers and max-min. Total workflow runtime [s] (log scale) versus number of tasks (25, 50, 100, 1000); series: Normal, Split; percentage annotations in the plot: 31%, 25%, 19%, 15%] 14 / 21
Variation of workers 15 / 21
Variation of workers [Figure: Simulation results for Montage 100 and min-min. Total workflow runtime [s] versus number of workers (5, 10, 50, 100); series: Normal, Split; percentage annotations in the plot: 10%, 14%, 26%, 25%] 16 / 21
Variation of scheduling algorithms 17 / 21
Variation of scheduling algorithms [Figure: Simulation results for Montage 100 on 100 workers. Total workflow runtime [s] per scheduling algorithm (Min-min, Max-min, Round-robin, HEFT, DHEFT, Random; axis order not recoverable); series: Normal, Split; percentage annotations in the plot: 17%, 34%, 25%, 25%, 27%, 28%] 18 / 21
Evaluation in a computing cluster ◮ Small cluster of up to 10 compute nodes ◮ Intel i7 CPU @ 2.5 GHz, 8 GB RAM, connected to a common network switch at 1 Gbit/s ◮ Executed the Montage 133 workflow in Pegasus/HTCondor ◮ Network bandwidth was limited at the application layer to 10 Mbit/s ◮ 10 repetitions, mean values with 95% confidence intervals 19 / 21
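For reference, a mean with a 95% confidence interval over 10 repetitions can be computed as in the Python sketch below. It assumes a two-sided Student's t interval, which the slides do not state explicitly; the critical value 2.262 is the standard t quantile for 9 degrees of freedom, and the function name `mean_ci95` is illustrative.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for n - 1 = 9 degrees of freedom.
T_CRIT_95_DOF9 = 2.262

def mean_ci95(samples):
    """Return (mean, half_width) of a 95% confidence interval for 10 samples."""
    if len(samples) != 10:
        raise ValueError("this sketch hard-codes the t value for 10 samples")
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, T_CRIT_95_DOF9 * std_err

# Hypothetical runtimes in seconds, not measurements from the talk.
runtimes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean, half_width = mean_ci95(runtimes)
print(f"{mean:.2f} s +/- {half_width:.2f} s")
```

Two configurations whose intervals do not overlap can be distinguished with reasonable confidence; overlapping intervals call for more repetitions or a paired test.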
Measurement results [Figure: Computing cluster results for 1...10 workers. Total workflow runtime [s] versus number of computing nodes (1 to 10); series: Original Montage 133, Transformed Montage 133] 20 / 21
Conclusion ◮ Many “legacy” workflows exist that are executed with classic semantics ◮ Our approach is applicable to aggregation tasks, which are often the most time-intensive tasks in a workflow ◮ By using DAG transformation, no changes to task implementations or execution engines are required ◮ Simulation and real experiment show that performance can be improved by up to 15% ◮ The potential for outperforming the original workflow grows with an increasing number of workers and tasks 21 / 21