
Viktor Yarmolenko*, Rizos Sakellariou, School of Computer Science - PowerPoint PPT Presentation



  1. 3rd International Workshop on Middleware for Grid Computing, 28-29 November 2005, Grenoble, France
     Viktor Yarmolenko*, Rizos Sakellariou
     School of Computer Science, The University of Manchester, Manchester, UK
     *Corresponding author: Viktor.Yarmolenmko@manchester.ac.uk

  2. Introduction & What is Coming
     • WS-A Terms: Service Level Objectives & Business Value List
     • What are the usual terms for job submission?
     • Why does WS-Agreement need extending?
     • How do we want it to be extended?
     • Simple scenarios to demonstrate extended WS-Agreement at work
     • Simulation model used to prove the point
     • What do the results say?
     A Service Level Agreement (SLA) is nothing more than a contract between two or more parties. WS-Agreement is one of the implementations of SLA.

  3. The Usual Suspects – SLO & BVL
     [Figure: a Job of duration $t_D$ on $N_{CPU}$ nodes, placed between $T_S$ and $T_F$; axes: CPU, time.]
     SLO: $T_S$ – the earliest time the Job is allowed to start
     SLO: $T_F$ – the latest time the Job is allowed to finish
     SLO: $N_{CPU}$ – number of CPU nodes required for the Job
     SLO: $t_D$ – projected Job duration time for $N_{CPU}$ nodes
     SLO: $t_{UP}$ – uniprocessor Job duration time (CPU-hours)
     SLO: $B_{job}$ – projected traffic that the Job creates
     BVL: $V_{pr}$ – the price for executing the Job
     BVL: $V_{pn}$ – the penalty for failing the Job
     BVL: $V_{tot}$ – final value of the agreement (optional)

  4. More Flexibility!!!
     [Slide graphic: the symbols x, y, z, h, Δ, π, α, β]
     • A list of universal variables
     • A list of predefined common functions
     • Possibility to describe agreement terms as functions

  5. Universal Terms – Useful Variables & Functions
     UT: $t_{curr}$ – current wall clock time
     UT: $B_{RES}(t)$ – Resource bandwidth: nominal or at time $t$
     UT: $R_{ld}(t_{curr})$ – Resource load at a given time: current or any other
     UT: $t_S$ – actual Job execution start time
     UT: $t_{DA}$ – actual Job duration time
     UT: $B_{JA}(t_S, t_{DA})$ – actual bandwidth used by the Job
     UT: $d(n) = n + (n-1) + \dots + 2 + 1$ – triangular numbers
     UT: $f_{norm}(t, low, high)$ – binary function
     UT: $f_{tr}(t, low, \alpha, high, \beta)$ – trapezium
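
The three predefined functions on this slide could be sketched as below. Only `d(n)` is fully defined by the slide. The behaviour of `f_norm` (1 inside the interval, 0 outside) and of `f_tr` (a plateau between `low` and `high`, with α and β read as the widths of the rising and falling ramps) is an assumed interpretation, not something the slides specify.

```python
def d(n: int) -> int:
    """Triangular number d(n) = n + (n-1) + ... + 2 + 1 (as on the slide)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n + 1) // 2


def f_norm(t: float, low: float, high: float) -> int:
    """ASSUMPTION: the 'binary function' is 1 inside [low, high], 0 outside."""
    return 1 if low <= t <= high else 0


def f_tr(t: float, low: float, alpha: float, high: float, beta: float) -> float:
    """ASSUMPTION: trapezium equal to 1 on [low, high], rising linearly over
    a ramp of width alpha before low and falling over width beta after high."""
    if low <= t <= high:
        return 1.0
    if alpha > 0 and low - alpha < t < low:
        return (t - (low - alpha)) / alpha
    if beta > 0 and high < t < high + beta:
        return ((high + beta) - t) / beta
    return 0.0
```

Under these assumptions `f_norm` is the limiting case of `f_tr` with zero-width ramps.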

  6. Variable Number of CPUs per Job
     SLO: $N_{CPU} = \{2, 3, 4, \dots\}$
     SLO: $t_D = t_{UP} / N_{CPU}$
     SLO: $t_{UP} = 24$
     SLO: $X_{other} = const$
     [Figure: the same Job drawn for different CPU counts; axes: CPU, Time.]
     N_CPU | t_D
     2     | 12
     3     | 8
     4     | 6
     6     | 4
     7     | 3.43
     8     | 3
     12    | 2
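
The (N_CPU, t_D) pairs on this slide follow directly from t_D = t_UP / N_CPU with t_UP = 24. A minimal sketch that reproduces them; the function and variable names are illustrative only:

```python
T_UP = 24.0  # uniprocessor Job duration in CPU-hours, value taken from the slide


def duration(n_cpu: int, t_up: float = T_UP) -> float:
    """Projected Job duration t_D = t_UP / N_CPU."""
    if n_cpu < 1:
        raise ValueError("n_cpu must be at least 1")
    return t_up / n_cpu


# The CPU counts shown on the slide, rounded to two decimals as on the slide.
offers = {n: round(duration(n), 2) for n in (2, 3, 4, 6, 7, 8, 12)}

for n_cpu, t_d in offers.items():
    print(f"N_CPU = {n_cpu:2d}  t_D = {t_d}")
```

Every row has the same area t_D × N_CPU = 24 CPU-hours, which is what lets a single term of the form t_D = f(N_CPU) stand in for the whole list of offers.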

  7. Adding Variable Bandwidth and Traffic (All-to-All topology)
     UT: $t_{curr}$
     UT: $B_{RES}(t_{curr})$
     UT: $d(n) = n + (n-1) + \dots + 2 + 1$
     SLO: $N_{CPU} = \{2, 3, 4, \dots\}$
     SLO: $t_{UNIPROC} = 24$ (the uniprocessor duration $t_{UP}$)
     SLO: $B_{job} = B_0 \, d(N_{CPU} - 1)$
     SLO: $t_D = \dfrac{B_{job} \, t_{UP}}{B_{RES} \, N_{CPU}} = \dfrac{B_0 \, t_{UP} \, (N_{CPU} - 1)}{2 B_{RES}}$
     SLO: $X_{other} = const$
     [Figure: the N_CPU / t_D options from slide 6 on CPU vs. Time axes, and six nodes, CPU#1 to CPU#6, connected all-to-all.]
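
The step from the general expression for t_D to the all-to-all closed form relies on the triangular-number identity d(n) = n(n+1)/2. Written out, using only the symbols defined on the slides:

```latex
\begin{align*}
d(N_{CPU} - 1) &= \frac{(N_{CPU} - 1)\,N_{CPU}}{2} \\
B_{job} &= B_0 \, d(N_{CPU} - 1) = \frac{B_0 \, N_{CPU} \, (N_{CPU} - 1)}{2} \\
t_D &= \frac{B_{job} \, t_{UP}}{B_{RES} \, N_{CPU}}
     = \frac{B_0 \, t_{UP} \, (N_{CPU} - 1)}{2 \, B_{RES}}
\end{align*}
```

The factor N_CPU cancels, so in this topology the duration grows linearly with the number of CPUs.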

  8. Adding Variable Bandwidth and Traffic (Pipe topology)
     UT: $t_{curr}$
     UT: $B_{RES}(t_{curr})$
     SLO: $N_{CPU} = \{2, 3, 4, \dots\}$
     SLO: $t_{UNIPROC} = 24$ (the uniprocessor duration $t_{UP}$)
     SLO: $B_{job} = B_0 \, (N_{CPU} - 1)$
     SLO: $t_D = \dfrac{B_{job} \, t_{UP}}{B_{RES} \, N_{CPU}} = \dfrac{B_0 \, t_{UP} \, (N_{CPU} - 1)}{N_{CPU} \, B_{RES}}$
     SLO: $X_{other} = const$
     [Figure: the N_CPU / t_D options from slide 6 on CPU vs. Time axes, and six nodes, CPU#1 to CPU#6, connected as a pipe.]

  9. Comparing the Impact of Two Topologies
     [Plot: Duration of the Job, ~$t_D$ (axis ticks 0.0 to 2.5), and its dependence on $N_{CPU}$ (axis ticks 1 to 6), with one curve for the All-to-All Topology and one for the Pipe Topology; each curve is shown next to its diagram of nodes CPU#1 to CPU#6.]
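
The two curves can be recomputed from the duration formulas on the two preceding slides. The sketch below assumes the normalisation B_0 = t_UP = B_RES = 1, which the slides do not state, so the numbers are relative durations only.

```python
def d(n: int) -> int:
    """Triangular number d(n) = n + (n-1) + ... + 2 + 1."""
    return n * (n + 1) // 2


def duration_all_to_all(n_cpu: int, b0: float = 1.0,
                        t_up: float = 1.0, b_res: float = 1.0) -> float:
    """t_D with B_job = B_0 * d(N_CPU - 1) (all-to-all topology)."""
    b_job = b0 * d(n_cpu - 1)
    return b_job * t_up / (b_res * n_cpu)


def duration_pipe(n_cpu: int, b0: float = 1.0,
                  t_up: float = 1.0, b_res: float = 1.0) -> float:
    """t_D with B_job = B_0 * (N_CPU - 1) (pipe topology)."""
    b_job = b0 * (n_cpu - 1)
    return b_job * t_up / (b_res * n_cpu)


# ASSUMPTION: unit constants; N_CPU from 1 to 6 as on the plot axis.
table = [(n, duration_all_to_all(n), duration_pipe(n)) for n in range(1, 7)]

for n, all_to_all, pipe in table:
    print(f"N_CPU = {n}  all-to-all = {all_to_all:.3f}  pipe = {pipe:.3f}")
```

With these constants the all-to-all duration grows as (N_CPU − 1)/2, while the pipe duration (N_CPU − 1)/N_CPU stays below 1.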

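  10. Defining the Value of the Service
     Building the $V_{tot}$ function:
     UT: $t_{curr}$
     UT: $B_{RES}(t_{curr})$
     UT: $R_{ld}(t_{curr}) = f_{ld}$
     SLO: $B_{job}$
     SLO: $t_D = \dfrac{B_{job} \, t_{UP}}{B_{RES} \, N_{CPU}}$
     SLO: $X_{other} = const$
     BVL: $V_{tot} = f(R_{ld}, t_s, N_{CPU}, \dots)$
     [Figure: four panels plotted against Time, t, with $t_s$ and $(t_s + t_D)$ marked on the axis: (a) $f_{tr}$, (b) $f_{ld}$, (c) $f'_{tr}$, (d) $f'_{tr} f_{ld}$; labelled levels: 1, max $V_{pr}$, max $V_{pn}$.]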

  11. Suddenly life becomes more interesting

  12. The Model: Single & Multiple Negotiations
     User:
     • Set of 340 Job requests, for which a solution exists where 100% utilisation is possible on the Resource (147 hours x 64 CPUs)
     • $t_D \times N_{CPU} = A$; $\langle A \rangle = 21.85$
     Resource:
     • Capacity of 64 CPUs, available for 147 hours
     • Scheduling by the earliest deadline first (single iteration)
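
As a rough illustration of the Resource side, the sketch below considers Jobs in earliest-deadline-first order and admits each one against a coarse CPU-hour budget. It is not the authors' simulator: the `Job` class, the `edf_admit` function and the area-based admission test (which ignores T_S and the exact shape of each Job) are assumptions made for illustration. Only the capacity of 64 CPUs and the 147-hour horizon come from the slide.

```python
from dataclasses import dataclass

CAPACITY_CPUS = 64   # from the slide
HORIZON_HOURS = 147  # from the slide


@dataclass
class Job:
    name: str
    t_f: float   # latest finish time T_F, in hours
    area: float  # A = t_D * N_CPU, in CPU-hours


def edf_admit(jobs, capacity=CAPACITY_CPUS):
    """Visit Jobs by earliest deadline; accept a Job if the CPU-hours already
    accepted plus its own area fit into capacity * T_F (coarse budget check)."""
    accepted, rejected = [], []
    used = 0.0
    for job in sorted(jobs, key=lambda j: j.t_f):
        if used + job.area <= capacity * job.t_f:
            accepted.append(job.name)
            used += job.area
        else:
            rejected.append(job.name)
    return accepted, rejected


# Hypothetical requests, not data from the presentation.
requests = [Job("a", 1, 64), Job("b", 1, 1), Job("c", 2, 64)]
print(edf_admit(requests))
```

A real scheduler would also have to place each Job as a rectangle of N_CPU by t_D between T_S and T_F; the budget check is only a necessary condition for that.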

  13. Variable CPU Scenario (Original vs. Extended SLA)
     Original SLA (time runs from t = 0 towards t → ∞):
       User: How about: $N_{CPU} = 6$; $t_D = 4$; …
       Resource: No can do
       User: Then how about: $N_{CPU} = 4$; $t_D = 6$; …
       Resource: No can do
       User: Then how about: $N_{CPU} = 2$; $t_D = 12$; …
       Resource: Will do
     Extended SLA (time runs from t = 0 towards t → ∞):
       User: How about: $t_D = f(N_{CPU})$; …
       Resource: Will do
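
The difference between the two exchanges can be sketched by counting negotiation rounds. The acceptance rule used here (the Resource accepts any offer that fits its free CPUs) and all function names are hypothetical; for f the sketch borrows t_D = t_UP / N_CPU with t_UP = 24 from the earlier slide.

```python
T_UP = 24.0  # uniprocessor duration, from the variable-CPU slide


def resource_accepts(n_cpu: int, free_cpus: int) -> bool:
    """ASSUMPTION: the Resource accepts any offer that fits its free CPUs."""
    return n_cpu <= free_cpus


def negotiate_normal(offers, free_cpus):
    """Original SLA: one fixed (N_CPU, t_D) offer per round until one is accepted.
    Returns (rounds used, accepted offer or None)."""
    for rounds, (n_cpu, t_d) in enumerate(offers, start=1):
        if resource_accepts(n_cpu, free_cpus):
            return rounds, (n_cpu, t_d)
    return len(offers), None


def negotiate_extended(allowed_n_cpu, free_cpus, t_up=T_UP):
    """Extended SLA: a single message carries t_D = t_UP / N_CPU and the
    Resource picks the largest feasible N_CPU itself."""
    feasible = [n for n in allowed_n_cpu if resource_accepts(n, free_cpus)]
    if not feasible:
        return 1, None
    n_cpu = max(feasible)
    return 1, (n_cpu, t_up / n_cpu)


# The exchange from the slide, with a Resource that has 2 CPUs free.
print(negotiate_normal([(6, 4.0), (4, 6.0), (2, 12.0)], free_cpus=2))
print(negotiate_extended([2, 3, 4, 6], free_cpus=2))
```

Both variants end with the same agreement (2 CPUs for 12 hours), but the fixed-offer exchange needs three rounds where the functional term needs one.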

  14. Only Single Negotiation is Allowed
     [Plot: Rate of Rejected Jobs, using normal SLA vs. using extended SLA. Axes: The Percentage of Rejected Jobs, % (0 to 100) against The Percentage of Processed Jobs, % (84 to 100).]

  15. Multiple Negotiations Allowed
     [Plot: using normal SLA vs. using extended SLA. Axes: The Average Number of Negotiations per Job (0.0 to 5.0) against The Percentage of Processed Jobs, % (84 to 100).]

  16. Was it all worth it?
     • Reduction in traffic associated with negotiation of Resource
     • Reduction in user-service interaction
     • Extended Agreement gives more power to resource allocation, scheduling, management, aggregation of services
     • Extended Agreement is extensible and could support future demands, e.g. new optimisation algorithms, value added services, autonomous services, …
