From Static Scheduling Towards Understanding Uncertainty
Andrei Tchernykh
CICESE Research Center, Ensenada, Baja California, México
chernykh@cicese.mx
http://usuario.cicese.mx/~chernykh/
Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems
Dagstuhl – July 7, 2015
Baja California, México
Ensenada, Baja California, México
Sonora Yaqui deer dancer
Research Areas
• HPC
• Real Time Systems
• Grid Computing
• Cloud Computing
• Resource optimization
• Scheduling (online, offline)
• Multiobjective Optimization
• List Scheduling
• Stealing
• Computational Intelligence
• Scheduling with Service Levels
• Approximation Algorithms
• Knowledge Free
• Scheduling with Uncertainty
• Workflow Scheduling
Collaboration
Mexico
• Universidad Autónoma de Baja California
• Universidad Autónoma de Nuevo León
• Tecnológico de Monterrey
• Instituto Tecnológico de Morelia
• Centro de Estudios Superiores del Estado de Sonora
Germany
• Dortmund University: Prof. Uwe Schwiegelshohn
• University of Göttingen: Prof. Ramin Yahyapour
USA
• University of Notre Dame: Dr. Jarek Nabrzyski
• University of California – Irvine, CA, USA: Prof. Isaac Scherson, Prof. Jean Luc Gaudiot
Luxembourg
• University of Luxembourg: Prof. Pascal Bouvry, Dr. Dzmitry Kliazovich
Uruguay
• Universidad de la República: Dr. Sergio Nesmachnow
Russia
• Institute for System Programming, RAS: Prof. Arutyun Avetisyan, Prof. Nikolay Kuzurin
• Moscow Institute of Physics and Technology: Prof. Alexander Drozdov
Spain
• BSC: Prof. Vassil Alexandrov
France
• Institute of Informatics and Applied Mathematics of Grenoble: Prof. Denis Trystram
• INRIA Lille - Nord Europe: Prof. El-ghazali Talbi
Team
CICESE Parallel Computing Laboratory
Towards Understanding Uncertainty in Cloud Computing Resource Provisioning
Andrei Tchernykh, CICESE Research Center, Mexico
Uwe Schwiegelshohn, University of Dortmund, Germany
El-ghazali Talbi, University of Lille, France
Vassil Alexandrov, Barcelona Supercomputing Centre, Spain
ICCS-SPU 2015. Procedia Computer Science, Elsevier, 2015
Uncertainty
Uncertainties can be classified in several ways according to their nature:
1. Long-term uncertainty arises because the object is poorly understood and inadvertent factors can influence its behavior.
2. Retrospective uncertainty is due to a lack of information about the past behavior of the object.
3. Technical uncertainty is a consequence of the impossibility of predicting the exact results of decisions.
4. Stochastic uncertainty results from the probabilistic (stochastic) nature of the studied processes and phenomena. Possible cases:
• reliable statistical information is available;
• statistical information is not available;
• the hypothesis of a stochastic nature requires verification.
Tychinsky 2006
Uncertainty
5. Constraint uncertainty: partial or complete ignorance of the conditions.
6. Participant uncertainty: conflict among the main stakeholders (cloud providers, users, and administrators).
• Each party has its own preferences and incomplete, inaccurate information about the motives and behavior of the opposing parties.
7. Goal uncertainty
• inability to select a single goal
• conflicts in building a multi-objective optimization model
• competing interests
8. Condition uncertainty occurs when information about the conditions under which decisions are made is faulty or completely lacking.
Uncertainty
9. Action uncertainty occurs when there is no unambiguous choice among solutions.
• Single-objective case
o determine the best solution among all feasible ones;
• Multiple-objective case
o there exists a (possibly infinite) number of Pareto-optimal solutions;
o the problem is to find a good element of this set.
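The multiple-objective case can be illustrated with a minimal sketch that filters a set of candidate solutions down to its Pareto-optimal subset. The objective pairs (cost, energy), the minimization convention, and the function names `dominates` and `pareto_front` are assumptions made for illustration; they do not come from the slides.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, energy) values for five candidate schedules.
candidates = [(4.0, 9.0), (5.0, 5.0), (7.0, 3.0), (6.0, 6.0), (8.0, 8.0)]
front = pareto_front(candidates)
print(front)  # [(4.0, 9.0), (5.0, 5.0), (7.0, 3.0)]
```

Three incomparable solutions remain, so a further rule (a weighting, a budget, a preference of the decision maker) is still needed to pick one element of the set.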
Uncertainty
Uncertainties can be grouped into:
Parameter (parametric) uncertainties
1. arise from incomplete knowledge of, and variation in, the parameters
2. are estimated using statistical techniques
System uncertainties
1. arise from an incomplete understanding of the processes that control service provisioning
2. arise from incomplete information about a system
Uncertainty in Clouds
Services and resources are subject to considerable uncertainty during provisioning.
Uncertainty brings additional challenges to
• End-users
• Resource providers
• Brokering
It requires
• waiving habitual computing paradigms
• adapting current computing models
• designing novel resource management strategies to handle uncertainty in an effective way
The question is: how can scalable and robust cloud behavior be delivered under uncertainty and specific constraints such as budgets, QoS, SLAs, and energy costs?
Sources of uncertainty
• dynamic elasticity
• dynamic performance changes
• virtualization and loose coupling of applications to the infrastructure
• variation in resource provisioning time
• inaccurate application runtime estimates and variation in processing times
• variation in data transmission and variable data streams
• release time and workload uncertainty
• variation in effective bandwidth, and other phenomena
• workload is not predictable and can change dramatically
• performance can change because common resources are shared with other VMs
Sources of uncertainty
Providers might not know the
• quantity of transmitted data
• amount of computation
Example: every time a user requests the status of an e-mail or bank account, the request can
• generate a different amount of data and
• take a different time to deliver.
Sources of uncertainty
It is impossible to get exact knowledge about the system.
Parameters such as
• effective processor speed,
• number of available processors,
• actual bandwidth
change over time.
The topology is unknown.
In general, the execution environment will differ for each program/service invocation.
Sources of uncertainty
[Table: sources of uncertainty (columns) marked against cloud computing parameters (rows). The individual cell marks are not recoverable from the extracted text.]
Columns (sources of uncertainty): Data (volume, variety, value); Virtualization; Jobs arrival; Scalability; Elasticity; Migration; Consolidation; Replication; Communication; Fault tolerance; Cloud infrastructure; Energy minimization; Resource availability; Cost (dynamic pricing); Resource provisioning time
Rows (cloud computing parameters): Effective performance; Effective bandwidth; Processing time; Available memory; Number of processors; Available storage; Data transfer time; Resource capacity; Network capacity
Approaches
To treat uncertainty and dynamism, we need sophisticated solutions:
• Fuzzy
• Robust
• Non-clairvoyant
• Knowledge-free
• Stochastic
• Randomized algorithms
• Dynamic priority
• Adaptive strategies (reactive)
• Dynamic load balancing
Preliminary results
Scheduling for Cloud Computing with Different Service Levels
Uwe Schwiegelshohn, University of Dortmund, Germany
Andrei Tchernykh, CICESE Research Center, Mexico
IPDPS 2012, IEEE 26th International Parallel and Distributed Processing Symposium
Quality of Service
• Response time in relation to the requested processing time
• Deadline
• Service level (slack factor)
• Execution time
• Price per time unit
• Profit
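A minimal sketch of how these quantities could fit together. It assumes that the deadline is the release time plus the slack factor times the requested processing time, and that a job earns the price per time unit multiplied by its execution time only if it completes by its deadline. Both rules, and the names `Job`, `ServiceLevel`, `deadline`, and `income`, are illustrative assumptions rather than definitions taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float      # submission time (assumed symbol r)
    processing: float   # requested processing time (assumed symbol p)


@dataclass
class ServiceLevel:
    slack_factor: float  # assumed f >= 1; larger means a looser deadline
    price: float         # price per unit of execution time


def deadline(job: Job, level: ServiceLevel) -> float:
    """Assumed rule: deadline = release + slack_factor * processing."""
    return job.release + level.slack_factor * job.processing


def income(job: Job, level: ServiceLevel, completion: float) -> float:
    """Assumed rule: the job pays price * execution time if it meets
    its deadline, and nothing otherwise."""
    if completion <= deadline(job, level):
        return level.price * job.processing
    return 0.0


job = Job(release=2.0, processing=4.0)
level = ServiceLevel(slack_factor=1.5, price=3.0)
print(deadline(job, level))      # 8.0
print(income(job, level, 7.5))   # 12.0
print(income(job, level, 9.0))   # 0.0
```

Under these assumptions, a smaller slack factor gives the user a tighter response-time guarantee and leaves the provider less freedom to delay the job.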
Competitive Factor
Competitive factor = obtained income / optimal income
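As a small illustration of the ratio, the sketch below computes it for a few instances and takes the worst case. The income figures are made up, the function names are hypothetical, and taking the minimum over instances follows the usual worst-case convention of competitive analysis, which is an assumption here rather than something stated on the slide.

```python
from typing import Iterable, Tuple


def competitive_ratio(obtained_income: float, optimal_income: float) -> float:
    """Ratio of the income obtained by an algorithm to the optimal income."""
    if optimal_income <= 0:
        raise ValueError("optimal income must be positive")
    return obtained_income / optimal_income


def worst_case_ratio(instances: Iterable[Tuple[float, float]]) -> float:
    """Smallest ratio over a collection of (obtained, optimal) pairs."""
    return min(competitive_ratio(got, best) for got, best in instances)


# Hypothetical (obtained income, optimal income) pairs for three instances.
instances = [(8.0, 10.0), (6.0, 12.0), (9.0, 9.0)]
print(worst_case_ratio(instances))  # 0.5
```

A value of 1 means the algorithm matched the optimal income on every instance; values closer to 0 indicate a larger loss in the worst case.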
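Competitive Factor
[Formulas: upper bounds on the competitive factor ρ for the SSL-SM and SSL-MM cases, expressed in terms of f, p_min, and p_max. The fraction layout of the formulas is not recoverable from the extracted text. Sources cited on the slide: Das Gupta and Palis, 2001; Schwiegelshohn, Tchernykh 2012]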
Competitive Factor
[Formulas: upper bounds on the competitive factor ρ for the MSL-SM and MSL-MM cases, expressed in terms of f_I, u_I, u_II, p_min, and p_max. The fraction layout of the formulas is not recoverable from the extracted text. Source cited on the slide: Schwiegelshohn, Tchernykh 2012]
On-line Scheduling in Distributed Systems
• Multiple strip packing
• Job stealing
• Non-clairvoyant
Uwe Schwiegelshohn, University of Dortmund, Germany
Andrei Tchernykh, CICESE Research Center, Mexico
Ramin Yahyapour, University of Göttingen, Germany
IEEE IPDPS 2008
Grid Scheduling Algorithm
Each machine applies the following priority order when selecting jobs for execution:
1. Jobs of its group A
2. Jobs of its group B
3. Jobs that are enabled for execution on its previous machine
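The priority order can be sketched as a lookup over three queues. The queue names, the first-come-first-served order inside each queue, and the function `select_next_job` are assumptions for illustration; the slides do not say how jobs are assigned to groups A and B or how ties within a group are broken.

```python
from typing import Dict, List, Optional

# Priority classes in the order a machine consults them (hypothetical keys).
PRIORITY_ORDER = ("group_a", "group_b", "from_previous_machine")


def select_next_job(queues: Dict[str, List[str]]) -> Optional[str]:
    """Return the next job for a machine, or None if nothing is waiting.

    The first non-empty queue in PRIORITY_ORDER wins; within a queue the
    oldest job is taken first (an assumed tie-breaking rule).
    """
    for name in PRIORITY_ORDER:
        queue = queues.get(name, [])
        if queue:
            return queue.pop(0)
    return None


# Hypothetical state of one machine: no group A jobs are waiting.
queues = {
    "group_a": [],
    "group_b": ["j3"],
    "from_previous_machine": ["j7", "j8"],
}
picked = [select_next_job(queues) for _ in range(4)]
print(picked)  # ['j3', 'j7', 'j8', None]
```

In this sketch, jobs enabled on the previous machine are taken only when both of the machine's own groups are empty, which is what makes the third class behave like job stealing.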