

  1. Detecting Bottlenecks in Parallel DAG-based Data Flow Programs
     Björn Lohrmann, Dominic Battré, Matthias Hovestadt, Alexander Stanik, Daniel Warneke
     Email: {firstname}.{lastname}@tu-berlin.de
     Complex and Distributed IT-Systems, Technische Universität Berlin

  2. Introduction (1)
     IaaS clouds offer virtual machines on demand. Why use clouds for data processing?
     ■ Fast and (almost) unlimited scale-out
     ■ Pricing model
       ♦ Pay-as-you-go
       ♦ 10 nodes for 1 day = 1 node for 10 days
     ■ No long-term obligations
     15.11.2010 Björn Lohrmann - Detecting Bottlenecks in Parallel DAG-based Data Flow Programs

  3. Introduction (2)
     Frameworks are required for effective use of clouds.
     [Diagram labels: Parallelization, Job Modelling, Job Scheduling, VM Management, Job Deployment, Job Monitoring; frameworks named: Eucalyptus, Hadoop, Nephele, etc.]

  4. Prerequisites
     ● Jobs modelled as directed acyclic graphs
       ■ Vertices are tasks
       ■ Edges are communication channels
     ● Each task has 1..n parallel task instances
     ● Unidirectional and blocking communication
     [Diagram: example job graph with Task 1, Task 2, Task 3 and Task 4]
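A minimal sketch of the job model described on this slide, written in Python for illustration only. The class names `Task` and `JobGraph`, their methods, and the diamond-shaped edges in the usage part are assumptions; the slide lists four tasks but its edges are not recoverable from the transcript, and none of this is the Nephele API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    instances: int = 1  # number of parallel task instances (1..n)


@dataclass
class JobGraph:
    tasks: dict = field(default_factory=dict)     # task name -> Task
    channels: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_task(self, name, instances=1):
        if instances < 1:
            raise ValueError("a task needs at least one instance")
        self.tasks[name] = Task(name, instances)

    def connect(self, producer, consumer):
        # Channels are unidirectional: data flows from producer to consumer.
        if producer not in self.tasks or consumer not in self.tasks:
            raise KeyError("both tasks must be added before connecting them")
        self.channels.append((producer, consumer))
        if self._has_cycle():
            self.channels.pop()
            raise ValueError("channel would make the job graph cyclic")

    def _has_cycle(self):
        # Kahn's algorithm: a graph is acyclic iff every vertex can be removed
        # once all of its incoming edges have been removed.
        indegree = {name: 0 for name in self.tasks}
        for _, consumer in self.channels:
            indegree[consumer] += 1
        ready = [name for name, d in indegree.items() if d == 0]
        removed = 0
        while ready:
            current = ready.pop()
            removed += 1
            for producer, consumer in self.channels:
                if producer == current:
                    indegree[consumer] -= 1
                    if indegree[consumer] == 0:
                        ready.append(consumer)
        return removed != len(self.tasks)


# Usage: four tasks wired as a diamond (the wiring is assumed, see above).
job = JobGraph()
for task_name in ["Task 1", "Task 2", "Task 3", "Task 4"]:
    job.add_task(task_name)
job.connect("Task 1", "Task 2")
job.connect("Task 1", "Task 3")
job.connect("Task 2", "Task 4")
job.connect("Task 3", "Task 4")
```

Rejecting a channel that closes a cycle keeps the graph a DAG at all times, which is what makes a topological ordering of the tasks possible later on.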

  5. Overview
     Key question of this talk:
     ● Given a DAG-shaped job, how many task instances should I assign to each task?
     Our approach:
     ● Begin with 1 instance for each task
     ● Iteratively detect bottlenecks and add instances where necessary
     [Diagram: example job graph with Task 1, Task 2, Task 3, Task 4 and two boxes labelled Task 5]
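The iterative approach can be sketched as a simple loop. This is an illustration under several assumptions, not the authors' procedure: `run_and_detect` is a hypothetical callback that runs the job with the given instance counts and returns the name of a bottleneck task (or `None`), one instance is added per round, and `max_rounds` is an arbitrary safety limit. The slides do not say how many instances to add per step.

```python
def tune_parallelism(task_names, run_and_detect, max_rounds=10):
    """Start with one instance per task and grow wherever a bottleneck is reported."""
    instances = {name: 1 for name in task_names}
    for _ in range(max_rounds):
        # Hypothetical callback: runs the job, returns a task name or None.
        bottleneck = run_and_detect(instances)
        if bottleneck is None:
            break  # no bottleneck left, keep the current configuration
        instances[bottleneck] += 1  # assumed step size: one extra instance
    return instances


# Usage with a stand-in detector that reports "Task 2" until it has 3 instances.
def fake_run_and_detect(instances):
    return "Task 2" if instances["Task 2"] < 3 else None


result = tune_parallelism(["Task 1", "Task 2"], fake_run_and_detect)
```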

  6. Bottlenecks
     Negative effects of bottlenecks:
     ■ Input starvation
     ■ Output blockage
     Low throughput of workflow
     Low resource utilization
     Time and money wasted
     [Diagram: example job graph with Task 1, Task 2, Task 3, Task 4 and two boxes labelled Task 5]

  7. Bottlenecks
     Types:
     ● CPU
       ■ Enough input available
       ■ Throughput limited by CPU
       ■ Lack of input for subsequent tasks
     ● I/O
       ■ Transport infrastructure is overloaded (NICs, switches, etc.)
       ■ Forces tasks to wait
     [Diagrams: task chains (Task 1, Task 2, Task 3), each task with a CPU label]

  8. Bottleneck Detection
     ● Monitor job at runtime:
       ■ Continuously measure CPU load and I/O wait on task instances
       ■ Aggregate to task statistics
     ● Continuously analyze task statistics:
       ■ Traverse task nodes in reverse topological order and check for CPU bottlenecks
       ■ If none found, traverse edges in reverse topological order and check for I/O bottlenecks
       ■ If bottleneck found: report it
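The analysis step can be sketched as two passes over the job graph. This is a hedged illustration, not the Nephele implementation: `detect_bottleneck`, `is_cpu_bound` and `is_io_bound` are hypothetical names, returning only the first bottleneck found is an assumption, and so is ordering edges by the position of their consuming task. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def detect_bottleneck(tasks, channels, is_cpu_bound, is_io_bound):
    """Return ("cpu", task), ("io", (producer, consumer)) or None."""
    sorter = TopologicalSorter({task: set() for task in tasks})
    for producer, consumer in channels:
        sorter.add(consumer, producer)  # the consumer depends on the producer
    order = list(sorter.static_order())  # producers come before consumers

    # Pass 1: visit tasks sinks-first and check for CPU bottlenecks.
    for task in reversed(order):
        if is_cpu_bound(task):
            return ("cpu", task)

    # Pass 2: only if no CPU bottleneck was found, visit channels sinks-first
    # (ordered by consumer position, then producer position; an assumption).
    position = {task: index for index, task in enumerate(order)}
    by_position = sorted(
        channels,
        key=lambda ch: (position[ch[1]], position[ch[0]]),
        reverse=True,
    )
    for producer, consumer in by_position:
        if is_io_bound(producer, consumer):
            return ("io", (producer, consumer))
    return None


# Usage on a three-task chain with stand-in predicates.
chain_tasks = ["Task 1", "Task 2", "Task 3"]
chain_channels = [("Task 1", "Task 2"), ("Task 2", "Task 3")]
cpu_result = detect_bottleneck(
    chain_tasks, chain_channels,
    is_cpu_bound=lambda t: t in {"Task 1", "Task 3"},
    is_io_bound=lambda p, c: False,
)
io_result = detect_bottleneck(
    chain_tasks, chain_channels,
    is_cpu_bound=lambda t: False,
    is_io_bound=lambda p, c: (p, c) == ("Task 1", "Task 2"),
)
```

In the first call both Task 1 and Task 3 are flagged as CPU-bound, and the sinks-first traversal reports Task 3, the one closest to the end of the data flow.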

  9. Implementation
     ● Based on Nephele framework
       ■ Java framework
       ■ 1 master, n workers
       ■ Task instance = Java thread
     ● Analysis of thread state statistics:
       ■ Threshold for CPU bottleneck:
         ♦ USR + SYS + BLK >= 90% of time
       ■ Threshold for I/O bottleneck:
         ♦ WAIT caused by sending on channel >= 90% of time
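The two thresholds can be sketched as a small classifier over per-task statistics. The 90% values and the USR, SYS, BLK and WAIT quantities come from the slide; everything else is assumed for illustration: the dictionary keys, the function names, and in particular averaging across task instances, since the slides do not state how instance statistics are aggregated. The sketch is in Python, although Nephele itself is a Java framework.

```python
from statistics import mean

CPU_THRESHOLD = 90.0  # percent of time in USR + SYS + BLK (from the slide)
IO_THRESHOLD = 90.0   # percent of time in WAIT caused by sending on a channel

STAT_KEYS = ("usr", "sys", "blk", "wait_send")  # hypothetical field names


def aggregate(instance_samples):
    """Combine per-instance percentages into task statistics (assumed: mean)."""
    return {key: mean(sample[key] for sample in instance_samples) for key in STAT_KEYS}


def classify(task_stats):
    """Return "cpu", "io" or None for one task's aggregated statistics."""
    busy = task_stats["usr"] + task_stats["sys"] + task_stats["blk"]
    if busy >= CPU_THRESHOLD:
        return "cpu"
    if task_stats["wait_send"] >= IO_THRESHOLD:
        return "io"
    return None


# Usage: two instances of one task, values in percent of measured time.
samples = [
    {"usr": 80.0, "sys": 10.0, "blk": 5.0, "wait_send": 2.0},
    {"usr": 70.0, "sys": 12.0, "blk": 9.0, "wait_send": 4.0},
]
task_stats = aggregate(samples)
verdict = classify(task_stats)
```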

  10. Evaluation
      Setup:
      ● Private compute cloud
      ● Hosts with two Intel Xeon 2.66 GHz, 32 GB RAM and 1 Gbit/s Ethernet
      ● KVM guests with one virtual CPU and 2 GB RAM
      ● Eucalyptus framework for VM allocation/deallocation
      [Diagram: demo job with the tasks File Reader, OCR, PDF Creator, Inverted Index, PDF Writer and Index Writer]

  11. Evaluation (2) Phase 1: Fine tuning

  12. Evaluation (1) Phase 2: Scale-out

  13. Conclusion
      ● Bottleneck detection is useful to scale out jobs in the cloud while maintaining high resource utilization
      ● We presented a simple approach to gather and analyze relevant statistics
      ● Right now, manual adaptation and job re-runs are necessary to eliminate bottlenecks
      ● Future work:
        ■ Dynamically and automatically adjust parallelization at runtime
