MTAGS 2010: 3rd IEEE Workshop on Many-Task Computing on Grids and Supercomputers
Improving Many-Task Computing in Scientific Workflows Using P2P Techniques
Jonas Dias, Eduardo Ogasawara, Daniel de Oliveira, Esther Pacitti, Marta Mattoso
COPPE, Federal University of Rio de Janeiro, Brazil
INRIA & LIRMM, Montpellier, France
Introduction
• Scientific Experiments
• Petascale Computing
  – Behavior of hundreds of thousands of processors
  – Parallel execution failures
• Scientific Workflows
  – Represent the chaining of activities of an experiment
  – Scientific Workflow Management Systems (SWfMS)
[Figure: Typical Scientific Workflow (Pre-processing, Execution Kernel, Post-processing)]
Improving Many-Task Computing in Scientific Workflows Using P2P Techniques, 11/15/2010
Experiment Execution
• The same workflow may run several times
  – 5000 parameter combinations to try
  – 3 workflow variations
  – Total of 15000 instances to be executed
• Motivation to parallelize
  – Obtain the results in a timely manner
  – Clusters, Grids and Clouds
• Utility Computing model
  – Deliver the answers while they are still needed
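The instance count on this slide is the product of the two dimensions being explored. As a minimal sketch (the function name and the placeholder variation labels are hypothetical, not from the talk), the instances can be enumerated like this:

```python
from itertools import product


def enumerate_instances(parameter_combinations, workflow_variations):
    """Pair every workflow variation with every parameter combination."""
    return list(product(workflow_variations, parameter_combinations))


# Hypothetical placeholders standing in for the real experiment inputs.
combos = range(5000)                                       # 5000 parameter combinations
variations = ["variation_a", "variation_b", "variation_c"]  # 3 workflow variations

instances = enumerate_instances(combos, variations)
print(len(instances))  # 15000
```

Each tuple is one independent workflow instance, which is what makes the workload a natural fit for many-task computing.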
Difficulties in Workflow Parallelism
• MPI
  – Complex and legacy codes
  – Dynamic resource management
  – A job's process may fail
    • Compromises the whole execution
    • Resubmission relies on the scientist's manual control
      – Not feasible for a huge number of tasks
• Grid Schedulers
  – Submit many jobs simultaneously
  – Waiting time in resource management queues
MTC Workflow Parallelism
• Many-task computing (MTC)
  – Improves Parameter Sweep and Data Parallelism
• HPC Cluster Systems
  – Jobs are not easy to set up for submission
  – Centralized control
  – Compute nodes may fail
• Open Issues
  – Best approaches to set up an experiment execution
  – Load balancing
  – Dynamic resource management
  – Controlling failures
    • What has failed and needs to be rescheduled?
MTC, Workflows and Clusters
• The Heracles Approach
  – Approach to execute workflow activities
    • More transparent setup
    • Load balancing
    • Quality of service
    • Distributed provenance gathering
  – Uses the P2P model
    • To be implemented in a cluster scheduler
    • Not a P2P infrastructure
Heracles Overview
[Figure: diagram with the labels Scientific Workflow Management System, Workflow, MTC Scheduler, Heracles, Cluster]
Heracles Structure
[Figure: diagram built up over six slides. Labels: SWfMS, Workflow, MTC Scheduler, Workflow Instances, Wrapper, Cluster Scheduling, Heracles, Executer, Distributed Table, Overlay Handler, Task Execution Monitoring, Task, Heracles Process, Process, Resource Manager, Cluster, Node]
P2P view
[Figure: Heracles virtual P2P network view. Labels: Resource Manager, Cluster, Node, Process]
Heracles
Transparency
• Set the deadline, not the number of nodes
• Heracles controls the number of nodes involved
  – Monitors the partial efficiency of the execution
  – Automatically updates the number of processors needed
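One way to picture deadline-driven sizing is the estimate below. It is only a sketch under simplifying assumptions that are not stated in the talk: tasks are independent, each takes roughly the same average time, and the function name and parameters are hypothetical.

```python
import math


def cores_needed(remaining_tasks, avg_task_hours, hours_to_deadline, max_cores):
    """Estimate how many cores are needed to finish by the deadline.

    Assumes independent tasks of roughly avg_task_hours each (an assumption
    made for illustration, not the scheduler's actual policy).
    """
    if remaining_tasks == 0:
        return 0
    if hours_to_deadline <= 0:
        return max_cores  # deadline already reached: use everything available
    total_core_hours = remaining_tasks * avg_task_hours
    needed = math.ceil(total_core_hours / hours_to_deadline)
    return max(1, min(needed, max_cores))


print(cores_needed(1000, 0.5, 10, 128))  # 50
print(cores_needed(1000, 0.5, 2, 128))   # 128 (capped by the cluster size)
```

Re-evaluating this estimate periodically, with the measured average task time, is one plausible reading of "automatically updates the number of processors needed".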
Dynamic Scheduling example
[Figure: completed tasks per hour and processing cores plotted against hours (0 to 20). Annotations: 173 tasks per hour, 64 cores]
Efficiency
[Figure: efficiency (0 to 1) plotted against hours (0 to 20)]
Load Balancing
• Clusters depend on head-node control
• Tasks can have their own autonomy
  – Like P2P dynamic control
• Hierarchical organization
  – Based on P2P hierarchical networks
  – Group leaders
  – Working nodes
Quality of Service
• Job's process failure
  – Hard to reschedule with traditional approaches
  – Manual rescheduling is not feasible
  – How should it be addressed in provenance collection?
• The P2P model can help
  – Autonomy of the nodes
  – Unfinished or failed tasks can be rescheduled
  – Provenance may register all execution attempts or only the last execution attempt
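The two provenance options mentioned on this slide can be illustrated with a small sketch. The record layout, the function name and the `keep_all` flag are hypothetical choices made for illustration; the talk does not describe how provenance is stored.

```python
def record_attempt(provenance, task_id, attempt, keep_all=True):
    """Store one execution attempt for a task.

    keep_all=True  registers every attempt;
    keep_all=False keeps only the most recent attempt.
    """
    if keep_all:
        provenance.setdefault(task_id, []).append(attempt)
    else:
        provenance[task_id] = [attempt]


full_history = {}
record_attempt(full_history, 7, {"node": "n1", "outcome": "failed"})
record_attempt(full_history, 7, {"node": "n2", "outcome": "finished"})

last_only = {}
record_attempt(last_only, 7, {"node": "n1", "outcome": "failed"}, keep_all=False)
record_attempt(last_only, 7, {"node": "n2", "outcome": "finished"}, keep_all=False)

print(len(full_history[7]), len(last_only[7]))  # 2 1
```

Keeping every attempt preserves the failure history for later analysis, while keeping only the last attempt gives a smaller record of what actually produced the result.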
When to reschedule?
• Group leaders are responsible for the decision
  – Based on the distributed table data
    • Status of the tasks in the distributed table
      – Pending, running or finished
    • Average execution time of a task
• To reschedule means to change the status of the task to pending
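The decision described on this slide can be sketched as follows. The slide only says that group leaders use the task status and the average execution time; the `tolerance` factor, the `TaskEntry` layout and the function name are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"


@dataclass
class TaskEntry:
    task_id: int
    status: Status = Status.PENDING
    started_at: Optional[float] = None  # seconds; None until the task starts


def reschedule_stalled(table, now, avg_exec_time, tolerance=2.0):
    """Set running tasks that exceeded tolerance * avg_exec_time back to pending.

    The tolerance threshold is a hypothetical rule, not taken from the talk.
    """
    rescheduled = []
    for entry in table.values():
        if entry.status is Status.RUNNING and entry.started_at is not None:
            if now - entry.started_at > tolerance * avg_exec_time:
                entry.status = Status.PENDING  # rescheduling = back to pending
                entry.started_at = None
                rescheduled.append(entry.task_id)
    return rescheduled


table = {
    1: TaskEntry(1, Status.RUNNING, started_at=0.0),
    2: TaskEntry(2, Status.RUNNING, started_at=90.0),
    3: TaskEntry(3, Status.FINISHED, started_at=0.0),
}
print(reschedule_stalled(table, now=100.0, avg_exec_time=30.0))  # [1]
```

Only task 1 has been running for longer than twice the average, so it alone returns to the pending state and becomes available to another node.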
Case Study
• Analyze the impact of churn events on task execution on clusters
  – Many workflow activities to be executed
  – Activities are decomposed into tasks
    • Tasks suffer from churn events
  – Activities producing 512, 1024, 2048 and 4096 tasks
  – Tasks are classified as small, medium and large
  – Seven simulated days
  – Calibrated using real experiment data
Rescheduling Types
• Manual Rescheduling
  – The scientist checks the activity status every twelve hours
  – If a failure happens, all the tasks of the activity are rescheduled
• Automatic Rescheduling
  – Only the task that has failed is rescheduled
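The difference between the two policies can be made concrete with a small sketch. It counts only how many tasks are submitted again and when a manual check would notice a failure; it ignores repeated failures and is not the simulator used in the case study. All function names are hypothetical.

```python
import math


def tasks_rescheduled_manual(num_tasks, failed_tasks):
    """Manual policy: any failure reschedules every task of the activity."""
    return num_tasks if failed_tasks > 0 else 0


def tasks_rescheduled_automatic(num_tasks, failed_tasks):
    """Automatic policy: only the failed tasks are rescheduled."""
    return min(failed_tasks, num_tasks)


def next_manual_check(failure_hour, interval_hours=12):
    """Hour of the first periodic check at or after the failure."""
    return math.ceil(failure_hour / interval_hours) * interval_hours


print(tasks_rescheduled_manual(1024, 3))     # 1024
print(tasks_rescheduled_automatic(1024, 3))  # 3
print(next_manual_check(13.5))               # 24
```

For an activity of 1024 tasks with 3 failures, the manual policy resubmits all 1024 tasks and may only notice the problem at the next twelve-hour check, whereas the automatic policy resubmits 3.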
Small Tasks
Medium Tasks
Big Tasks
Conclusions
• Empowering the execution of scientific experiments
  – Scientific workflow parallelization on huge clusters
  – Many-task computing
  – Process failures, poor load balancing, usability issues
• Heracles Approach
  – Transparency, load balancing and quality of service
  – Using the P2P model on clusters
• The case study showed the gains from automatic rescheduling
Future Work
• Analyze the advantages that MTC schedulers can achieve when using the full Heracles approach
• Use Heracles in real experiments
  – Implement it in real schedulers such as Hydra
• Evaluate other fault-tolerance mechanisms, such as redundant executions
Acknowledgements
A P2P Approach to Many Tasks Computing for Scientific Workflows, 6/24/2010