
Starting Workflow Tasks Before They're Ready (Wladislaw Gusew, Björn Scheuermann)



  1. Starting Workflow Tasks Before They’re Ready. Wladislaw Gusew, Björn Scheuermann. Computer Engineering Group, Humboldt University of Berlin

  2. Agenda
     - Introduction
     - Execution semantics
     - Methods and tools
     - Simulation results
     - Experimental results
     - Conclusion

  3. Big data in research

  4. Scientific workflow example
     - Modeled as a Directed Acyclic Graph (DAG)
     - Executed on distributed systems
     - Contains aggregation and broadcast types of tasks
     - Demanding on network resources

  5. Execution semantics

  7. Execution semantics
     - In reality, resources are limited
     - Only a subset of the parent tasks can execute concurrently (insufficient number of workers)
     - The network becomes congested (all parent tasks have the same priority)
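The effect of limited workers under classic semantics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the task names, runtimes, and the `simulate` helper are hypothetical, and network transfers are ignored. A task becomes ready only once all of its parents have finished, so fewer workers push back the start of the aggregation task.

```python
def ready_tasks(dag, finished, started):
    """Tasks not yet started whose parents have all finished (classic semantics)."""
    return [t for t, parents in dag.items()
            if t not in started and all(p in finished for p in parents)]


def simulate(dag, runtime, workers):
    """Greedy scheduling on identical workers; returns the start time of each task."""
    time = 0
    running = {}            # task -> finish time
    finished, start = set(), {}
    while len(finished) < len(dag):
        # Retire every task that has completed by the current time.
        for task, end in list(running.items()):
            if end <= time:
                finished.add(task)
                del running[task]
        # Start ready tasks while free workers remain.
        for task in ready_tasks(dag, finished, start):
            if len(running) >= workers:
                break
            start[task] = time
            running[task] = time + runtime[task]
        if len(finished) < len(dag):
            time = min(running.values())   # jump to the next completion
    return start


# Hypothetical workflow: four parents feeding one aggregation task.
DAG = {"p1": [], "p2": [], "p3": [], "p4": [], "agg": ["p1", "p2", "p3", "p4"]}
RUNTIME = {"p1": 10, "p2": 10, "p3": 10, "p4": 10, "agg": 5}

print(simulate(DAG, RUNTIME, workers=4)["agg"])  # 10
print(simulate(DAG, RUNTIME, workers=2)["agg"])  # 20
```

With four workers all parents run in parallel and the aggregation task starts at t = 10; with two workers the parents run in two waves and it starts at t = 20.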

  8. Example execution

  11. Example execution
      - Network congestion can slow down processing even further (effects of data losses at the transport protocol layer)
      - High delay before the aggregation task starts
      - Low performance and high execution costs (e.g., in computation clouds)

  12. What can we do to improve this?

  18. What can we do to improve this? List of actions:
      1. Obtain information on the task’s input characteristics
      2. Refine the workflow and inform the execution engine
      3. Let the aggregation task “feel comfortable” in the changed setting

  20. Obtaining input characteristics
      1. Annotations to workflows
      2. Manual code review
      3. Automated profiling

  21. Automated profiling
      - Operating system instrumentation tool
      - Enables interception of system calls (file open, read/write, file close)
      - Record and evaluate log files with traces of the conducted file accesses
      [Figures: "Reads by mAdd in a small workflow" and "Reads by mAdd in a medium sized workflow". x-axis: execution progress [10^8 CPU cycles]; y-axis: read accesses [MB].]
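Evaluating such a trace could look roughly like the sketch below. The record layout `(progress in CPU cycles, syscall, path, bytes)`, the file names, and the values are all hypothetical placeholders; the slides do not specify the instrumentation tool or its log format. The sketch extracts the two facts a scheduler would care about: the order in which inputs are first read, and how much is read from each.

```python
from collections import defaultdict

# Hypothetical trace records: (progress in CPU cycles, syscall, path, bytes).
TRACE = [
    (1.0e8, "open", "in_a.fits", 0),
    (1.1e8, "read", "in_a.fits", 1_000_000),
    (1.5e8, "close", "in_a.fits", 0),
    (2.0e8, "open", "in_b.fits", 0),
    (2.2e8, "read", "in_b.fits", 500_000),
    (2.4e8, "read", "in_b.fits", 500_000),
    (2.5e8, "close", "in_b.fits", 0),
]


def first_read_order(trace):
    """Input files in the order in which the task first reads from them."""
    order = []
    for _, call, path, _ in sorted(trace, key=lambda record: record[0]):
        if call == "read" and path not in order:
            order.append(path)
    return order


def read_megabytes(trace):
    """Total volume read from each file, in MB."""
    totals = defaultdict(int)
    for _, call, path, nbytes in trace:
        if call == "read":
            totals[path] += nbytes
    return {path: nbytes / 1e6 for path, nbytes in totals.items()}


print(first_read_order(TRACE))   # ['in_a.fits', 'in_b.fits']
print(read_megabytes(TRACE))     # {'in_a.fits': 1.0, 'in_b.fits': 1.0}
```

A task that reads its inputs one after another, as in this made-up trace, does not need all of them to be present when it starts.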

  23. Refining workflow by transforming DAG

  27. Realizing virtual task split
      - The real task is transparently wrapped
      - FUSE (Filesystem in Userspace) enables the setup of a virtual file system in user space
      - Access to input files is performed through our wrapper
      - The wrapper is responsible for maintaining the correct execution logic
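The core idea of the wrapper, delaying a file access until the input actually exists, can be illustrated without FUSE. The sketch below is a plain-Python stand-in and not the authors' implementation: `open_when_available`, `publish`, the `.part` suffix, and the file name are hypothetical. A real FUSE-based wrapper would apply the same blocking logic inside the file system's open/read handlers, so the wrapped task needs no changes.

```python
import os
import tempfile
import threading
import time


def open_when_available(path, timeout=5.0, poll=0.01):
    """Block until `path` exists, then open it for reading."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"input {path!r} did not arrive in time")
        time.sleep(poll)
    return open(path, "rb")


def publish(path, data):
    """Parent side: write under a temporary name, then rename atomically,
    so a reader never observes a half-written input file."""
    tmp = path + ".part"
    with open(tmp, "wb") as handle:
        handle.write(data)
    os.replace(tmp, path)


def demo():
    """Start the 'aggregation' read before the parent has produced its output."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "parent_output.dat")
        producer = threading.Timer(0.05, publish, args=(target, b"payload"))
        producer.start()
        with open_when_available(target) as handle:   # blocks for about 50 ms
            data = handle.read()
        producer.join()
        return data


print(demo())  # b'payload'
```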

  28. Evaluation with the Montage workflow

  29. Simulating workflow execution
      - Java-based simulation framework for scientific workflows
      - Simulates an execution on a Pegasus/HTCondor stack
      - Uses the provided Montage workflows with 25, 50, 100, and 1000 tasks
      - A Python script performs the DAG transformation of the DAX files
      - Network configured as the bottleneck (by bandwidth limitation)
      W. Chen and E. Deelman, “WorkflowSim: A toolkit for simulating scientific workflows in distributed environments,” in eScience’12.
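The slides show the DAG transformation only as figures, so the following is one plausible reading rather than the authors' script. It assumes the DAG is held as a plain dictionary mapping each task to its parents (parsing DAX files is omitted), and the function name, the `_part1`/`_part2` suffixes, and the choice of "early" parents are hypothetical. The aggregation task is split into a first virtual part that depends only on the parents whose output is needed first, and a second part that depends on the first part and the remaining parents.

```python
def split_aggregation(dag, task, early_parents):
    """Return a new DAG in which `task` is replaced by two virtual parts.

    dag: {task_name: [parent_names]}
    early_parents: parents whose outputs the task reads first.
    """
    parents = dag[task]
    first, second = f"{task}_part1", f"{task}_part2"
    new_dag = {name: list(ps) for name, ps in dag.items() if name != task}
    # The first part may start as soon as the early parents are done.
    new_dag[first] = [p for p in parents if p in early_parents]
    # The second part waits for the first part and the remaining parents.
    new_dag[second] = [first] + [p for p in parents if p not in early_parents]
    # Children of the original task now depend on the second part.
    for name in list(new_dag):
        new_dag[name] = [second if p == task else p for p in new_dag[name]]
    return new_dag


# Hypothetical workflow: three parents, one aggregation task, one child.
DAG = {"p1": [], "p2": [], "p3": [], "agg": ["p1", "p2", "p3"], "out": ["agg"]}
print(split_aggregation(DAG, "agg", {"p1"}))
```

Because only the graph changes, an execution engine that follows classic semantics can schedule the first part early without any modification to the engine itself.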

  30. Simulation results

  32. Variation of number of tasks
      [Figure: Simulation results for 50 workers and max-min. Series: Normal, Split. x-axis: number of tasks (25, 50, 100, 1000); y-axis: total workflow runtime (log scale) [s]. Annotated percentages: 31%, 25%, 19%, 15%.]

  33. Variation of workers

  34. Variation of workers
      [Figure: Simulation results for Montage 100 and min-min. Series: Normal, Split. x-axis: number of workers (5, 10, 50, 100); y-axis: total workflow runtime [s]. Annotated percentages: 10%, 14%, 26%, 25%.]

  35. Variation of scheduling algorithms

  36. Variation of scheduling algorithms
      [Figure: Simulation results for Montage 100 on 100 workers. Series: Normal, Split. x-axis: scheduling algorithm (Min-min, Max-min, Round-robin, HEFT, DHEFT, Random); y-axis: total workflow runtime [s]. Annotated percentages: 17%, 34%, 25%, 25%, 27%, 28%.]
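For readers unfamiliar with two of the heuristics compared here, the sketch below shows the textbook min-min and max-min rules for a set of independent tasks. It is a simplified illustration, not WorkflowSim's implementation: it assumes identical workers and ignores dependencies and data transfers, and the function name and task costs are hypothetical. In each round every pending task's earliest completion time is computed; min-min then picks the task with the smallest such time, max-min the one with the largest.

```python
def list_schedule(task_cost, n_workers, pick_largest=False):
    """Min-min (default) or max-min mapping of independent tasks.

    task_cost: {task: runtime}; all workers are assumed identical, so the
    earliest completion time of any task is on the worker that frees up first.
    Returns (makespan, [(task, worker, start, end), ...]).
    """
    free_at = [0.0] * n_workers
    pending = dict(task_cost)
    plan = []
    choose = max if pick_largest else min
    while pending:
        worker = min(range(n_workers), key=free_at.__getitem__)
        # Rank tasks by earliest completion time; ties are broken by name.
        task = choose(pending, key=lambda t: (free_at[worker] + pending[t], t))
        start = free_at[worker]
        end = start + pending.pop(task)
        free_at[worker] = end
        plan.append((task, worker, start, end))
    return max(free_at), plan


COSTS = {"a": 1, "b": 2, "c": 3, "d": 6}   # hypothetical runtimes
print(list_schedule(COSTS, 2)[0])                     # min-min makespan: 8.0
print(list_schedule(COSTS, 2, pick_largest=True)[0])  # max-min makespan: 6.0
```

In this toy case max-min places the long task first and ends up with the shorter makespan; which rule wins in general depends on the workload.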

  37. Evaluation in a computing cluster
      - Small cluster of up to 10 compute nodes
      - Intel i7 CPU @ 2.5 GHz, 8 GB RAM, connected to a common network switch with 1 Gbit/s
      - Execute the Montage 133 workflow in Pegasus/HTCondor
      - Network bandwidth was limited on the application layer to 10 Mbit/s
      - 10 repetitions, mean values with 95% confidence intervals
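A mean with a 95% confidence interval over 10 repetitions can be computed as in the sketch below. It assumes the usual Student-t interval, which the slides do not state explicitly; the runtimes are made-up placeholder values, not measurements from the experiment. The constant 2.262 is the two-sided 95% t quantile for 9 degrees of freedom, so it is only valid for exactly 10 samples.

```python
import math
import statistics

T_CRIT_95_DF9 = 2.262  # two-sided 95% Student-t quantile, 9 degrees of freedom


def mean_ci95(samples, t_crit=T_CRIT_95_DF9):
    """Return (mean, half-width of the 95% confidence interval)."""
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Placeholder runtimes in seconds for 10 repetitions (illustrative only).
RUNTIMES = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]

mean, half = mean_ci95(RUNTIMES)
print(f"{mean:.1f} s +/- {half:.2f} s")
```

For a different number of repetitions, the matching t quantile has to be passed as `t_crit`.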

  38. Measurement results
      [Figure: Computing cluster results for 1 to 10 workers. Series: Original Montage 133, Transformed Montage 133. x-axis: number of computing nodes (1 to 10); y-axis: total workflow runtime [s].]

  39. Conclusion
      - Many “legacy” workflows exist which are executed with classic semantics
      - Our approach is applicable to aggregation tasks, which are often the most time-intensive tasks in a workflow
      - By using DAG transformation, no changes to task implementations and execution engines are required
      - Simulation and real experiment show that performance can be improved by up to 15%
      - The potential for outperforming the original workflow grows with an increasing number of workers and tasks
