ComplexHPC Spring School Day 2: KOALA Tutorial
The KOALA Scheduler
Nezih Yigitbasi
Delft University of Technology
May 10, 2011
1/36
Outline
• Koala Architecture
• Job Model
• System Components
• Support for different application types
  • Parallel Applications
  • Parameter Sweep Applications (PSAs)
  • Workflows
2/36
Introduction
• Developed in the DAS system
  • Deployed on the DAS-2 in September 2005
  • Ported to DAS-3 in April 2007, and to DAS-4 in April 2011
• Independent of grid middleware such as Globus
• Runs on top of local schedulers
• Objectives:
  • Data and processor co-allocation in grids
  • Supporting different application types
  • Specialized job placement policies
3/36
Background (1): DAS-4
• Operational since Oct. 2010
• 1,600 cores (quad-core, 2.4 GHz CPUs), accelerators, 180 TB storage
• Interconnects: SURFnet6 10 Gb/s lambdas, InfiniBand, Gb Ethernet
• Clusters: VU (148 CPUs), UvA/MultimediaN (72), TU Delft (64), Astron (46), UvA (32), Leiden (32)
4/36
Background (2): Grid Applications
• Different application types with different characteristics:
  • Parallel applications
  • Parameter sweep applications
  • Workflows
  • Data-intensive applications
• Challenges:
  • Application characteristics and needs
  • Grid infrastructure is highly heterogeneous
  • Grid infrastructure configuration issues
  • Grid resources are highly dynamic
5/36
Koala Job Model
• A job consists of one or more job components
• A job component contains:
  • An executable name
  • Sufficient information necessary for scheduling
  • Sufficient information necessary for execution
• Job types:
  • Fixed job: the placement of the job components is fixed
  • Non-fixed job: the scheduler decides on component placement
  • Flexible job: same total job size; the scheduler decides on the split-up and placement of the components
6/36
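As a rough illustration of the job model above, the sketch below models fixed, non-fixed, and flexible jobs with a plain Python data structure; the class and field names are invented for this example and do not correspond to KOALA's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class JobType(Enum):
    FIXED = "fixed"          # user fixes the placement of each component
    NON_FIXED = "non-fixed"  # scheduler decides on component placement
    FLEXIBLE = "flexible"    # scheduler decides on split-up and placement

@dataclass
class JobComponent:
    executable: str                                  # executable name
    num_processors: int                              # scheduling information
    arguments: list = field(default_factory=list)    # execution information
    cluster: Optional[str] = None                    # set by the user (fixed) or by the scheduler

@dataclass
class Job:
    job_type: JobType
    components: list  # list of JobComponent; for a flexible job only the total size matters

# Example: a non-fixed job with two 8-processor components;
# the scheduler will choose a cluster for each component.
job = Job(JobType.NON_FIXED,
          [JobComponent("wave_sim", 8), JobComponent("wave_sim", 8)])
```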
Koala Architecture (1) 7/36
Koala Architecture (2): A Closer Look
• PIP/NIP: information services
• RLS: replica location service
• CO: co-allocator
• PC: processor claimer
• RM: run monitor
• RL: runners listener
• DM: data mover
• Ri: runners
8/36
Scheduler
• Enforces scheduling policies:
  • Co-allocation policies: Worst Fit, Flexible Cluster Minimization, Communication Aware, Close-to-Files
  • Malleability management policies: Favour Previously Started Malleable Applications, Equi Grow Shrink
  • Cycle scavenging policies: Equi-All, Equi-PerSite
  • Workflow scheduling policies: Single-Site, Multi-Site
9/36
Runners
• Extend support for different application types:
  • KRunner: Globus runner
  • PRunner: a simplified job runner
  • IRunner: Ibis applications
  • OMRunner: OpenMPI applications
  • MRunner: malleable applications based on the DYNACO framework
  • WRunner: for workflows (Directed Acyclic Graphs) and BoTs
10/36
The Runners Framework 11/36
Support for Different Application Types
• Parallel applications
  • MPI, Ibis, ...
  • Co-allocation
  • Malleability
• Parameter sweep applications
  • Cycle scavenging: run as low-priority jobs
• Workflows
12/36
Support for Co-Allocation
• What is co-allocation (a reminder)
• Co-allocation policies
• Experimental results
13/36
Co-Allocation
• Simultaneous allocation of resources at multiple sites
  • Higher system utilization
  • Lower queue wait times
• Co-allocated applications might be less efficient due to the relatively slow wide-area communication
• Parallel applications may have different communication characteristics
14/36
Co-Allocation Policies (1)
• Dictate where the components of a job go
• Policies for non-fixed jobs:
  • Load-aware: Worst Fit (WF), balances the load across the clusters (see the sketch after this slide)
  • Input-file-location-aware: Close-to-Files (CF), reduces file-transfer times
  • Communication-aware: Cluster Minimization (CM), reduces the number of wide-area messages
See: H.H. Mohamed and D.H.J. Epema, "An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters," IEEE Cluster 2004.
15/36
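A minimal sketch of the Worst Fit idea, assuming each cluster is described only by its number of idle processors; the function and variable names are illustrative, not KOALA's actual interfaces.

```python
def worst_fit(component_sizes, idle_processors):
    """Place each job component on the cluster with the most idle processors,
    so the load stays balanced across clusters (Worst Fit).

    component_sizes: list of processor counts, one per job component
    idle_processors: dict mapping cluster name -> idle processor count
    Returns a dict mapping component index -> cluster name, or None if a
    component does not fit anywhere.
    """
    idle = dict(idle_processors)          # local copy we can decrement
    placement = {}
    for i, size in enumerate(component_sizes):
        # pick the cluster that currently has the most idle processors
        cluster = max(idle, key=idle.get)
        if idle[cluster] < size:
            return None                   # placement fails; the job must wait
        placement[i] = cluster
        idle[cluster] -= size
    return placement

# Three 8-processor components over three 16-processor clusters
print(worst_fit([8, 8, 8], {"C1": 16, "C2": 16, "C3": 16}))
# e.g. {0: 'C1', 1: 'C2', 2: 'C3'}
```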
Co-Allocation Policies (2)
• Placement policies for flexible jobs:
  • Queue-time-aware: Flexible Cluster Minimization (FCM), CM plus reducing the queue wait time (see the sketch after this slide)
  • Communication-aware: Communication Aware (CA), decisions based on inter-cluster communication speeds
See: O.O. Sonmez, H.H. Mohamed and D.H.J. Epema, "Communication-aware Job Scheduling Policies for the Koala Grid Scheduler," IEEE e-Science 2006.
16/36
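A rough sketch of the Flexible Cluster Minimization idea: split a flexible job of a given total size over as few clusters as possible, filling the cluster with the most idle processors first. The names are illustrative and the logic is simplified (the real policy also takes queue wait times into account).

```python
def flexible_cluster_minimization(total_size, idle_processors):
    """Split a flexible job of total_size processors over as few clusters
    as possible (Flexible Cluster Minimization, simplified).

    Returns a dict mapping cluster name -> component size, or None if the
    job does not fit in the grid at all.
    """
    remaining = total_size
    placement = {}
    # Fill the emptiest clusters first so the number of components stays small
    for cluster, idle in sorted(idle_processors.items(),
                                key=lambda kv: kv[1], reverse=True):
        if remaining == 0:
            break
        take = min(idle, remaining)
        if take > 0:
            placement[cluster] = take
            remaining -= take
    return placement if remaining == 0 else None

# A flexible job of 24 processors over three 16-processor clusters
print(flexible_cluster_minimization(24, {"C1": 16, "C2": 16, "C3": 16}))
# e.g. {'C1': 16, 'C2': 8}: only two clusters are used
```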
Co-Allocation Policies (3)
Example over three 16-processor clusters C1, C2, and C3:
• WF with a non-fixed job of three 8-processor components: components I, II, and III are spread over the three clusters
• FCM with a flexible job of total size 24: the job is split into components I and II and placed on as few clusters as possible
17/36
Experimental Results: Co-Allocation vs. No Co-Allocation
• OpenMPI + DRMAA
• No co-allocation vs. co-allocation
• Workloads of real parallel applications, ranging from computation-intensive (Prime) to very communication-intensive (Wave)
• [Figure: average job response time (s) for Prime, Poisson, and Wave, without and with co-allocation]
• Conclusion: co-allocation is disadvantageous for communication-intensive applications
18/36
Experimental Results: The Performance of the Policies
• Flexible Cluster Minimization vs. Communication Aware
• Workloads of communication-intensive applications
• [Figure: average job response time (s) for FCM and CA, without and with the Delft cluster]
• Conclusion: considering the network metrics improves the co-allocation performance
19/36
Support for PSAs in Koala
• Background
• System design
• Scheduling policies
• Experimental results
20/36
Parameter Sweep Application Model
• A single executable that runs for a large set of parameters
  • E.g., Monte Carlo simulations, bioinformatics applications, ...
• PSAs may run in multiple clusters simultaneously
• We support OGF's JSDL 1.0 (XML); see the sketch after this slide
21/36
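To make the PSA model concrete, the sketch below expands one executable and a set of parameter ranges into independent tasks, which is essentially what a sweep description (e.g. in JSDL) encodes; the command-line format and function names here are invented for illustration.

```python
from itertools import product

def expand_sweep(executable, parameter_ranges):
    """Expand a parameter sweep into a list of independent task command lines.

    executable: path or name of the single sweep executable
    parameter_ranges: dict mapping parameter name -> iterable of values
    """
    names = sorted(parameter_ranges)
    tasks = []
    for values in product(*(parameter_ranges[n] for n in names)):
        args = " ".join(f"--{n}={v}" for n, v in zip(names, values))
        tasks.append(f"{executable} {args}")
    return tasks

# A toy sweep: 3 x 4 = 12 independent tasks, each a candidate grid job
tasks = expand_sweep("./mc_sim",
                     {"seed": range(3), "temperature": [280, 290, 300, 310]})
print(len(tasks), tasks[0])
```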
Motivation
• How to run thousands of tasks in the DAS?
• Issues:
  • The 15-minute rule!
  • Observational scheduling
  • Overload
• Solution: run them as cycle scavenging applications!
  • Sets priority classes implicitly
  • No need to watch for empty clusters
22/36
Cycle Scavenging
• The technology behind volunteer computing projects
• Harnessing idle CPU cycles from desktops:
  • Download software (a screen saver)
  • Receive tasks from a central server
  • Execute a task when the computer is idle
  • Immediate preemption when the user is active again
23/36
System Requirements
1. Unobtrusiveness: minimal delay for (higher-priority) local and grid jobs
2. Fairness: multiple cycle scavenging applications running concurrently should be assigned comparable CPU time
3. Dynamic resource allocation: cycle scavenging applications have to grow/shrink at runtime
4. Efficiency: as much use of dynamic resources as possible
5. Robustness and fault tolerance: in a long-running, complex system, problems will occur and must be dealt with
24/36
System Interaction
• [Architecture figure: the Scheduler on the head node and the KCM on the nodes; the scheduler monitors/informs about idle/demanded resources and sends grow/shrink messages; the user submits PSA(s) in JDL to the CS-Runner, which registers launchers; the Launcher deploys, monitors, and preempts tasks on the clusters]
• CS policies:
  • Equi-All: allocates resources on a grid-wide basis
  • Equi-PerSite: allocates resources per cluster
• Application-level scheduling:
  • Pull-based approach
  • Shrinkage policy
25/36
Cycle Scavenging Policies
1. Equipartition-All
• [Figure: the idle nodes of clusters C1 (12), C2 (12), and C3 (24) divided evenly among CS User-1, CS User-2, and CS User-3 on a grid-wide basis]
26/36
Cycle Scavenging Policies
2. Equipartition-PerSite
• [Figure: the idle nodes of each of the clusters C1 (12), C2 (12), and C3 (24) divided evenly among CS User-1, CS User-2, and CS User-3 within each cluster]
27/36
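A small sketch contrasting the two equipartition policies above, assuming idle node counts per cluster and a list of cycle scavenging users; the division rules shown are a plain equal split and ignore the grow/shrink mechanics of the real system.

```python
def equi_all(idle_per_cluster, users):
    """Equipartition-All: divide the total number of idle nodes in the grid
    evenly over the CS users (remainder nodes left unassigned here)."""
    total_idle = sum(idle_per_cluster.values())
    share = total_idle // len(users)
    return {user: share for user in users}

def equi_per_site(idle_per_cluster, users):
    """Equipartition-PerSite: divide the idle nodes of each cluster evenly
    over the CS users, then sum up each user's per-cluster shares."""
    shares = {user: 0 for user in users}
    for cluster, idle in idle_per_cluster.items():
        per_user = idle // len(users)
        for user in users:
            shares[user] += per_user
    return shares

clusters = {"C1": 12, "C2": 12, "C3": 24}
users = ["CS User-1", "CS User-2", "CS User-3"]
print(equi_all(clusters, users))       # 48 idle nodes split grid-wide: 16 each
print(equi_per_site(clusters, users))  # 4 + 4 + 8 per user: also 16 each here
```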
Experimental Results
• DAS-3
• Equi-All vs. Equi-PerSite; using launchers vs. not
• 3 CS users submit the same application: 60 s dummy tasks with the same parameter range
• Tested on a 32-node cluster
• Non-CS workloads: WBlock, WBurst
• [Figure: number of completed jobs and makespan (s) for Equi-All and Equi-PerSite under the WBlock and WBurst workloads]
• Conclusion: Equi-PerSite is fair and superior to Equi-All (the figure attributes the difference to job startup overhead and information delay)
See: O. Sonmez, B. Grundeken, H.H. Mohamed, A. Iosup, D.H.J. Epema, "Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems," CCGrid 2009.
28/36
Support for Workflows in Koala
• Applications with dependencies
  • E.g., the Montage workflow: an astronomy application to generate mosaics of the sky, with 4,500 tasks
• Dependencies are file transfers
• Experience the WRunner in the hands-on session
29/36
Workflow Scheduling Policies (1/3)
1. Round Robin: submits the eligible tasks to the clusters in round-robin order
2. Single Cluster: maps every complete workflow to the least-loaded cluster at its submission
3. All Clusters: submits each eligible task to the least-loaded cluster
(See the sketch after this slide.)
30/36
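A compact sketch of the first three policies above, dispatching each eligible (dependency-free) task either round-robin, per workflow to one cluster, or per task to the least-loaded cluster; the loads and names are illustrative only.

```python
from itertools import cycle

def round_robin(eligible_tasks, clusters):
    """Policy 1: send eligible tasks to the clusters in round-robin order."""
    rr = cycle(clusters)
    return {task: next(rr) for task in eligible_tasks}

def single_cluster(workflow_tasks, load):
    """Policy 2: map the whole workflow to the least-loaded cluster at submission."""
    target = min(load, key=load.get)
    return {task: target for task in workflow_tasks}

def all_clusters(eligible_tasks, load):
    """Policy 3: send each eligible task to the currently least-loaded cluster."""
    load = dict(load)
    placement = {}
    for task in eligible_tasks:
        target = min(load, key=load.get)
        placement[task] = target
        load[target] += 1          # one more task now queued there
    return placement

load = {"C1": 3, "C2": 1, "C3": 2}   # illustrative per-cluster loads
print(all_clusters(["t1", "t2", "t3"], load))
# e.g. {'t1': 'C2', 't2': 'C2', 't3': 'C3'}, depending on the load updates
```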
Workflow Scheduling Policies (2/3)
4. All Clusters File-Aware: submits each eligible task to the cluster that minimizes the transfer costs of the files on which it depends
5. Coarsening*: iteratively reduces the size of a graph by collapsing groups of nodes and their internal edges
  • We use the Heavy Edge Matching* technique to group tasks that are connected with heavy edges (see the sketch after this slide)
* G. Karypis and V. Kumar, "Multilevel graph partitioning schemes," in Int. Conf. on Parallel Processing, pages 113-122, 1995.
31/36
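A minimal sketch of one coarsening pass with Heavy Edge Matching, assuming the workflow is given as weighted edges between tasks (with the weight approximating the size of the transferred files); this shows the generic technique from Karypis and Kumar, not KOALA's actual implementation.

```python
def heavy_edge_matching(edges):
    """One pass of Heavy Edge Matching: visit edges from heaviest to lightest
    and match the two endpoints if neither is matched yet, so heavily
    communicating tasks end up collapsed into the same group.

    edges: list of (task_a, task_b, weight) tuples
    Returns a dict mapping each task to its match (or to itself if unmatched).
    """
    match = {}
    for a, b, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        if a not in match and b not in match and a != b:
            match[a] = b
            match[b] = a
    # Unmatched tasks stay as singletons
    for a, b, _w in edges:
        match.setdefault(a, a)
        match.setdefault(b, b)
    return match

# Toy workflow: t1-t2 exchange large files, t2-t3 and t3-t4 smaller ones
edges = [("t1", "t2", 100), ("t2", "t3", 10), ("t3", "t4", 40)]
print(heavy_edge_matching(edges))
# {'t1': 't2', 't2': 't1', 't3': 't4', 't4': 't3'}
```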