The Gain of Resource Delegation in Distributed Computing Environments
Alexander Fölling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou
Robotics Research Institute, Technische Universität Dortmund
15th Workshop on Job Scheduling for Parallel Processing
April 23, 2010 - Atlanta, GA, USA

Outline
  Motivation
  System Model
  Resource Delegation Policy
  Evaluation Setup
  Results
  Conclusion and Future Work

Motivation
Distributed computing infrastructures (DCIs) have reached production status
  More and more users draw their computing resources from Grid and Cloud infrastructures
  Many DCIs are heavily used and produce significant revenue
Cloud infrastructures allow easy on-demand provisioning of resources (enlargement of the local resource space)
  Infrastructure as a Service (IaaS) by virtualization technology
  Simple access and pricing model
The temporary extension of the local resource space allows more flexible scheduling decisions
  Locally, this is no longer a traditional parallel job scheduling problem with parallel machines (P_m, R_m, Q_m models)
  On-demand resource leasing may improve scheduling performance

System Model
[Figure: system model, built up over four slides. Recoverable labels: Resource Negotiation between the sites; Workload Analysis and Forward Jobs at each site; Remote Access to the other site's resources]

Properties of Resource Delegation
Different from centralized scheduling with multi-site execution
  No central scheduling component but independent sites
  A scheduler cedes full control to another scheduler when resource access is granted (for a certain period)
Resource leasing enlarges the local resource space
  Scheduling decisions are made exclusively by local schedulers
  Resources might be used immediately or later during the leasing period
Advanced scheduling strategies may support both
  local allocations under varying machine sizes
  planning of future resource requirements
Each participant in a DCI is both resource consumer and resource provider

Submission Triggered Resource Delegation Policy (ST-RDP)
[Figure: activity diagram of ST-RDP, built up over eight slides, for two sites (Site 1, Site 2), each with its own queue and schedule]
Example from the diagram: a job requesting 4 CPUs for 100 seconds is submitted to Site 1, where 2 CPUs are idle
  Check resource availability at the local site
  [Enough]: forward the job to the LRMS
  [Not enough]: try to lease the resource deficiency from Site 2 (here: request 2 CPUs for 100 seconds)
  [Request accepted]: prioritize the new job and forward it to the LRMS
  [Request denied]: forward the job to the LRMS
In the final build, job 4 is shown as 4a and 4b
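
The ST-RDP decision taken at job submission can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Job`, `st_rdp`, and `request_lease` are hypothetical, and how the remote site decides whether to grant a request is not stated on the slides, so it is passed in as a callback.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Job:
    cpus: int     # number of CPUs the job requests
    runtime: int  # requested runtime in seconds


def st_rdp(job: Job, idle_cpus: int,
           request_lease: Callable[[int, int], bool]) -> Tuple[str, int]:
    """Return (path, leased_cpus) for one submitted job.

    request_lease(cpus, seconds) stands in for the negotiation with a
    remote site (an assumption; the slides do not define its policy).
    """
    # Enough idle CPUs locally: hand the job to the LRMS unchanged.
    if job.cpus <= idle_cpus:
        return ("enough", 0)

    # Otherwise ask the remote site for exactly the missing CPUs,
    # for the duration of the job.
    deficiency = job.cpus - idle_cpus
    if request_lease(deficiency, job.runtime):
        # Lease granted: the new job is prioritized and can start
        # on local plus leased CPUs.
        return ("accepted", deficiency)

    # Lease denied: the job goes to the LRMS without extra resources.
    return ("denied", 0)


# Slide example: a 4-CPU, 100-second job arrives while 2 CPUs are idle.
requests = []


def always_grant(cpus: int, seconds: int) -> bool:
    requests.append((cpus, seconds))
    return True


result = st_rdp(Job(cpus=4, runtime=100), idle_cpus=2,
                request_lease=always_grant)
```

With the slide's numbers, the sketch issues a single request for 2 CPUs for 100 seconds, matching the "Request 2 CPUs for 100 seconds" label in the diagram.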

Evaluation Setup
Input data
  Real workload traces from the Parallel Workloads Archive
  KTH, CTC, SDSC05
  About 100 to 1600 CPUs, about 28000 to 74000 jobs (first 11 months)
Local Resource Management System
  EASY Backfilling
Evaluation objectives for results
  Improvements in AWRT
  Reconfiguration behavior
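
The slides use AWRT without defining it. The sketch below assumes the definition common in the job scheduling literature: the average response time of all jobs, with each job weighted by its resource consumption (runtime times number of CPUs). The function names and the tuple layout are hypothetical, and the numbers are made up for illustration only.

```python
def awrt(jobs):
    """Average weighted response time (assumed definition).

    jobs: iterable of (release, completion, runtime, cpus) tuples.
    Each response time (completion - release) is weighted by the
    job's resource consumption (runtime * cpus).
    """
    jobs = list(jobs)
    weighted = sum(p * m * (c - r) for r, c, p, m in jobs)
    total = sum(p * m for _, _, p, m in jobs)
    return weighted / total


def improvement_percent(awrt_baseline, awrt_policy):
    """Relative AWRT improvement in percent; positive means shorter AWRT."""
    return 100.0 * (awrt_baseline - awrt_policy) / awrt_baseline


# Illustrative jobs: (release, completion, runtime, cpus).
example = [(0, 100, 100, 4), (0, 300, 100, 2)]
example_awrt = awrt(example)
example_gain = improvement_percent(200.0, 150.0)
```

Weighting by resource consumption keeps the metric from being dominated by many small jobs, which is one reason this kind of measure is used when comparing schedules of the same workload.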

Results: ST-RDP Performance
[Figure: bar chart of AWRT improvements in % (axis from 0 to 25) for KTH-11, CTC-11, and SDSC05-11 in Setup 1, Setup 2, and Setup 3]

Results: Reconfiguration behavior
KTH and CTC, 11 months, with ST-RDP
[Figure: two panels, KTH and CTC, showing the number of CPUs over time in months]

Conclusion
Proposed new concept for resource delegation in DCIs
  Parallel job scheduling problems under varying machine sizes
  The resource requirements can be flexibly negotiated among participants
Evaluation of a simple resource delegation method
  Without need for further information exchange
  Robust in changing environments
Results show significant benefits for the local scheduling (improvement in AWRT)
During operation, many resources are delegated among sites

Future Work
Application to larger DCI environments
  Consider additional location policies that decide which site to ask first for delegation
Long-term planning of resource leasing/delegation
  Not only single-job decisions
  Decisions should be based on workload records (user behavior, submission patterns, etc.)
  Eventually, base decisions on predicted user behavior
Consider additional parameters such as local queue/schedule status

Thank You
Alexander Fölling, alexander.foelling@udo.edu
Joachim Lepping, joachim.lepping@udo.edu
Christian Grimme, christian.grimme@udo.edu
Alexander Papaspyrou, alexander.papaspyrou@udo.edu
Robotics Research Institute, Information Technology Section
TU Dortmund University, Germany
http://www.it.irf.de