Resource and Application Models for Advanced Grid Schedulers
Aleksandar Lazarevic, Lionel Sacks
Dept. of Electrical and Electronic Engineering, University College London

Problem
• Heterogeneous and dispersed systems
• Quest for an effective scheduling technique
• Good scheduling decisions depend on the quality and availability of information
• Importance of resource-efficient information dissemination
v0.2 May-04

Motivation - Scheduling
• Scheduling on distributed, heterogeneous and dynamic Grid resources
• Current schedulers
  • Queuing or batch: NQE, PBS, LSF, LoadLeveler
  • Application level: AppLeS, MARS, SEA, DOME
  • Dynamic, ranking: Condor ClassAd language and matchmaker

Motivation - Info Distribution
• Current Globus approach: centralized LDAP information provider (MDS)
• Little research into alternatives: MDS works for the current size of Grid clusters
• Centralized services are becoming a bottleneck
• SMP or clusters as gateways to the Grid?

Bright Ideas - Scheduling
• Advance reservation and partitioning of resources are complex and wasteful
• Low-level scheduling in a multitasking OS can distort machine loading info
• Decouple application load and node computational output
• Assign jobs based on requested turnaround and unsubscribed capacity

Subscribed Load Scheduling
[Figure: CPU usage [%] (0 to 100) against time, showing the CPU shares of Proc 1 and Proc 2 and the unsubscribed capacity at time t. A second time axis marks, for Proc 1, the projected turnaround time (T/T) at t, the estimated T/T, the safety margin, and the requested T/T.]
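
The subscribed-load idea on this slide can be illustrated with a minimal Python sketch. It is not the authors' scheduler. It assumes each job arrives with an estimated amount of CPU work and a requested turnaround time, and that a node "subscribes" just enough CPU share for the job to finish one safety margin before its deadline. The names `Job`, `Node`, `required_share` and `try_assign`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpu_seconds: float   # estimated work at 100% CPU (assumed known in advance)
    requested_tt: float  # requested turnaround time, in seconds

    def required_share(self, safety_margin: float) -> float:
        """CPU fraction needed to finish safety_margin seconds before the deadline."""
        usable = self.requested_tt - safety_margin
        if usable <= 0:
            return float("inf")  # the deadline cannot be met with this margin
        return self.cpu_seconds / usable


@dataclass
class Node:
    safety_margin: float = 10.0  # illustrative value, in seconds
    jobs: list = field(default_factory=list)

    def unsubscribed(self) -> float:
        """Fraction of the CPU not yet promised to any accepted job."""
        subscribed = sum(j.required_share(self.safety_margin) for j in self.jobs)
        return max(0.0, 1.0 - subscribed)

    def try_assign(self, job: Job) -> bool:
        """Accept the job only if its required share fits in the unsubscribed capacity."""
        if job.required_share(self.safety_margin) <= self.unsubscribed():
            self.jobs.append(job)
            return True
        return False
```

In this sketch a node reasons about promised shares rather than instantaneous load, which is one way to read "decouple application load and node computational output". A shorter safety margin lets the node accept more jobs, at the cost of a higher risk of missing a requested turnaround.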

Application & Node Profiles
• Distinction between volatile and non-volatile resources
• Profiles in XML with a modular matchmaker
• Nodes self-assess their level of fitness for a given request and return a Bid Value
• Monitoring and feedback improve confidence levels and reduce safety margins
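
As a rough illustration of the bidding step, the Python sketch below has a node compare its own XML profile with a request and return a bid. The slides do not give the profile schema or the scoring rule, so both are assumptions here: the element and attribute names (`nonvolatile`, `volatile`, `memory_mb`, `arch`, `cpu_share`, `unsubscribed`, `confidence`) are hypothetical, non-volatile resources are treated as hard constraints, and the bid is simply confidence multiplied by unsubscribed capacity.

```python
import xml.etree.ElementTree as ET

# Hypothetical profiles; the real schema is not given in the slides.
REQUEST_XML = """
<request>
  <nonvolatile memory_mb="512" arch="x86"/>
  <volatile cpu_share="0.4"/>
</request>
"""

NODE_XML = """
<node name="n1">
  <nonvolatile memory_mb="1024" arch="x86"/>
  <volatile unsubscribed="0.6" confidence="0.9"/>
</node>
"""


def bid_value(node_xml: str, request_xml: str) -> float:
    """Return the node's self-assessed bid for a request (0.0 means 'cannot serve')."""
    node = ET.fromstring(node_xml)
    req = ET.fromstring(request_xml)

    # Non-volatile resources act as hard constraints in this sketch.
    n_static, r_static = node.find("nonvolatile"), req.find("nonvolatile")
    if n_static.get("arch") != r_static.get("arch"):
        return 0.0
    if int(n_static.get("memory_mb")) < int(r_static.get("memory_mb")):
        return 0.0

    # Volatile resources determine how strong the bid is.
    n_vol = node.find("volatile")
    needed = float(req.find("volatile").get("cpu_share"))
    free = float(n_vol.get("unsubscribed"))
    if free < needed:
        return 0.0

    # Assumed scoring rule: confidence-weighted unsubscribed capacity.
    return float(n_vol.get("confidence")) * free
```

A matchmaker could collect such bids from candidate nodes and pick the highest one. The confidence attribute stands in for the slide's point that monitoring and feedback should raise confidence over time.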

Bright Ideas - Information
• Small-Worlds principle: information is shared among several neighbours and a few distant nodes
• A fuzzy picture of the Grid environment enables “good” but not necessarily “best” decisions
• Gaining credibility, good resilience to random node failures
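
The small-worlds sharing pattern can be sketched as a peer-selection routine. The Python example below is a sketch under stated assumptions, not the dissemination protocol from the talk: nodes are numbered and arranged on a ring, each node shares state with its closest ring neighbours plus a few randomly chosen distant nodes, `near` is even, and the network has more than `near` nodes. The function name `pick_peers` and its parameters are hypothetical.

```python
import random


def pick_peers(node: int, n_nodes: int, near: int = 4, far: int = 2, rng=None):
    """Choose `near` ring neighbours and `far` random distant nodes for `node`.

    Assumes nodes are numbered 0..n_nodes-1 on a ring, `near` is even,
    and n_nodes > near.
    """
    rng = rng or random.Random()
    half = near // 2
    # Closest neighbours on both sides of the ring.
    near_peers = [(node + d) % n_nodes for d in range(-half, half + 1) if d != 0]
    # Everything else is a candidate for a long-range link.
    candidates = [p for p in range(n_nodes) if p != node and p not in near_peers]
    far_peers = rng.sample(candidates, min(far, len(candidates)))
    return near_peers, far_peers
```

The few long-range links are what keep the number of hops between any two nodes low, while most traffic stays local. Because each node holds only a partial view, its picture of the Grid is fuzzy, as the slide notes.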

Information Flows
• Localised, need-to-know information flow policy
• 3-Tier Information Flow:
  • Node Current State: low-latency, short shelf life
  • Volatile Resources State: self-organized, distributed, fuzzy
  • Accounting: centralized, reliable, accurate
[Figure: architecture diagram; labels include MDS, Policy-based Management, Repository & Distribution, Accounting & Management, Self-Organised Res. Management, Resource Discovery, Security, GSI - PKI Security Infrastructure, Control, Integrity, Intelligence & Information [I3], Ganglia Monitoring, GRAM, Local Job Manager (Fork, PBS), Operating System.]
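
One way to picture the three tiers is as records whose useful lifetime depends on the tier they belong to. The Python sketch below is illustrative only. The slide says that node current state has a short shelf life, but gives no figures, so the shelf-life values, the tier keys, and the `InfoRecord` class are all assumptions.

```python
from dataclasses import dataclass

# Shelf-life values (seconds) are placeholders chosen for illustration only.
TIER_SHELF_LIFE = {
    "node_current_state": 5.0,         # low-latency, short shelf life
    "volatile_resources_state": 60.0,  # self-organized, distributed, fuzzy
    "accounting": float("inf"),        # centralized, reliable, accurate
}


@dataclass
class InfoRecord:
    tier: str
    payload: dict
    created_at: float  # timestamp in seconds

    def is_fresh(self, now: float) -> bool:
        """True while the record is still within its tier's shelf life."""
        return now - self.created_at <= TIER_SHELF_LIFE[self.tier]
```

A consumer would drop stale node-state records quickly, while accounting records never expire in this sketch.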

Conclusions & Future Work
• New approaches are needed to handle a dynamic and heterogeneous resource pool
• Reduce complexity and possible points of failure
• Develop a prototype meta-scheduler and test it on the 200 CPU UCL Grid