Lottery and Stride Scheduling
Flexible Proportional-Share Resource Management
Carl A. Waldspurger
Parallel Software Group
MIT Laboratory for Computer Science
August 25, 1995

Overview

• Context
• Framework
• Mechanisms
• Prototypes
• Diverse Resources
• Conclusions

Problem

Environment
• multiplex scarce resources
• concurrently executing clients
• service requests of varying importance
Goals
• manage computation rates dynamically
• enable flexible application-level policies
• promote software engineering principles

Related Work

Priority-Based Scheduling
• operating systems
• real-time systems
Share-Based Scheduling
• fair-share
• proportional-share
• microeconomic
Rate-Based Network Flow Control
• virtual clock, WFQ
• AN2 switch

Contributions

New Framework
• simple, powerful abstractions
• modular resource management
Novel Mechanisms
• randomized and deterministic algorithms
• precise control over service rates
Resource-Specific Techniques
• proportional-share control
• locks, memory, disk I/O

Resource Management Framework

Simple
• direct control over service rates
• resource rights aggregate and vary smoothly
Modular
• powerful abstraction mechanism
• insulate concurrent modules
Flexible
• can express sophisticated policies
• adapts to dynamic changes
• general-purpose, scalable

Framework Abstractions

Tickets
• first-class objects
• encapsulate resource rights
• proportional throughput
• inversely proportional response time
Currencies
• modular abstraction mechanism
• name, share, protect sets of tickets
• flexibly group or isolate sets of clients

Dynamic Management Techniques

Ticket Transfers
• explicit transfer between clients
• useful when client blocks while waiting
• example: synchronous IPC
Ticket Inflation and Deflation
• clients create and destroy tickets
• effects locally contained by currencies
• example: progress-based allocation

Example Currency Graph

Computing Values
• currency: sum value of backing tickets
• ticket: compute share of currency value
Example
• task2 funding in base units?
• 2333 base units

[Figure: currency graph. A base currency of 3000 units backs the currencies alice (1000.base) and bob (2000.base), which in turn fund task1, task2, and task3.]

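The two valuation rules above can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's code: the class and function names are hypothetical, and because the edges of the slide's figure are not fully recoverable, the wiring below (task2 funded by 100 of alice's 300 issued tickets plus all 100 of bob's issued tickets) is an assumption chosen so that the result lands near the slide's answer of 2333 base units.

```python
from fractions import Fraction

class Currency:
    """A currency is backed by tickets and issues tickets of its own."""
    def __init__(self, name):
        self.name = name
        self.backing = []  # tickets (in other currencies) that fund this one
        self.issued = 0    # total amount of tickets issued in this currency

class Ticket:
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency
        currency.issued += amount

def ticket_value(ticket):
    """Ticket value: its share of the issuing currency's value, in base units."""
    cur = ticket.currency
    if cur.name == "base":
        return Fraction(ticket.amount)
    return currency_value(cur) * Fraction(ticket.amount, cur.issued)

def currency_value(currency):
    """Currency value: sum of the values of its backing tickets."""
    return sum((ticket_value(t) for t in currency.backing), Fraction(0))

def fund(target, amount, source):
    """Back `target` with a new ticket of `amount` issued in `source`."""
    target.backing.append(Ticket(amount, source))

# Assumed wiring (hypothetical, see the note above).
base, alice, bob = Currency("base"), Currency("alice"), Currency("bob")
task1, task2 = Currency("task1"), Currency("task2")
fund(alice, 1000, base)
fund(bob, 2000, base)
fund(task1, 200, alice)
fund(task2, 100, alice)
fund(task2, 100, bob)

task2_value = currency_value(task2)  # 100/300 * 1000 + 100/100 * 2000
```

Under these assumptions task2 is worth 7000/3 base units, which truncates to 2333.
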
Proportional-Share Mechanisms

Randomized
• lottery
• multi-winner lottery
Deterministic
• stride
• hierarchical stride
Evaluation Criteria
• throughput accuracy
• response-time variability
• algorithmic complexity

Lottery Scheduling Example

• total = 20
• random [0..19] = 13

[Figure: five clients holding 10, 2, 5, 1, and 2 tickets laid out along a ticket line numbered 0 to 19, with the winner marked.]

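A list-based lottery of this kind can be sketched as a running-sum search. The function name and the left-to-right client order (10, 2, 5, 1, 2) are assumptions made for illustration; the optional `winning_ticket` argument exists only so the slide's draw of 13 can be replayed deterministically.

```python
import random

def hold_lottery(tickets, winning_ticket=None, rng=random):
    """Return the index of the client holding the winning ticket.

    tickets: ticket counts, one per client, in list order.
    winning_ticket: a number in [0, total); drawn uniformly when omitted.
    """
    total = sum(tickets)
    if winning_ticket is None:
        winning_ticket = rng.randrange(total)
    if not 0 <= winning_ticket < total:
        raise ValueError("winning_ticket out of range")
    running = 0
    for index, count in enumerate(tickets):
        running += count  # this client owns tickets [running - count, running)
        if winning_ticket < running:
            return index
    raise AssertionError("unreachable: ticket is within [0, total)")

clients = [10, 2, 5, 1, 2]                      # assumed order, total = 20
winner = hold_lottery(clients, winning_ticket=13)
```

With this ordering, ticket 13 falls in the range 12 to 16 owned by the third client (index 2).
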
Lottery Scheduling Analysis

Strengths
• simple, stateless algorithm
• supports dynamic operations
• randomization prevents cheating
Weaknesses
• guarantees are probabilistic
• poor short-term accuracy: $O(\sqrt{n_a})$ absolute error
• high response-time variability: $\sigma / \mu = \sqrt{1 - p}$

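Both weakness bounds follow from standard distributions. As a brief sketch, assume a client wins each lottery independently with probability $p$ (its ticket share), let $n_a$ be the number of allocations, and let $w$ and $r$ (symbols introduced here for illustration) be the number of wins and the number of lotteries until the first win:

```latex
% Wins over n_a allocations are binomial:
w \sim \mathrm{Binomial}(n_a, p), \qquad
\mathrm{E}[w] = n_a p, \qquad
\sigma_w = \sqrt{n_a \, p \,(1 - p)} = O(\sqrt{n_a})

% Response time (lotteries until the first win) is geometric:
\mu_r = \frac{1}{p}, \qquad
\sigma_r^2 = \frac{1 - p}{p^2}, \qquad
\frac{\sigma_r}{\mu_r} = \sqrt{1 - p}
```

For a client with a small share $p$, the ratio $\sigma_r / \mu_r$ approaches 1, so its response time varies by roughly as much as its mean.
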
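Multi-Winner Lottery Example

• total = 20
• #win = 4
• random [0..19] = 13
• total / #win = 5

[Figure: five clients holding 10, 2, 5, 1, and 2 tickets laid out along a ticket line numbered 0 to 19, with four evenly spaced winning tickets marked winner #1 through winner #4.]
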
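The evenly spaced selection can be sketched as follows. This is an illustrative reading of the slide, assuming that the first winning ticket is drawn at random and each later winner sits total / #win tickets further along, wrapping around the ticket line; the function names and the client order are hypothetical, and the sketch assumes the total divides evenly by the number of winners.

```python
def find_holder(tickets, ticket_number):
    """Return the index of the client whose ticket range contains ticket_number."""
    running = 0
    for index, count in enumerate(tickets):
        running += count
        if ticket_number < running:
            return index
    raise ValueError("ticket_number out of range")

def multi_winner_lottery(tickets, n_winners, first_ticket):
    """Pick n_winners clients from one random draw (first_ticket)."""
    total = sum(tickets)
    if total % n_winners != 0:
        raise ValueError("sketch assumes total is a multiple of n_winners")
    spacing = total // n_winners
    return [
        find_holder(tickets, (first_ticket + k * spacing) % total)
        for k in range(n_winners)
    ]

clients = [10, 2, 5, 1, 2]  # assumed order, total = 20
winners = multi_winner_lottery(clients, n_winners=4, first_ticket=13)
```

With a first ticket of 13 the selected tickets are 13, 18, 3, and 8, so the 10-ticket client receives two of the four quanta in this superquantum.
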
Multi-Winner Lottery Analysis

Strengths
• improves accuracy for large clients
• guarantees $\lfloor n_w \, t / T \rfloor$ quanta per superquantum
• bounds worst-case response time
• improves list-based efficiency
Weaknesses
• probabilistic guarantees for small clients
• dynamic operations terminate superquantum

Stride Scheduling Example

• 3 : 2 : 1 allocation
• $\text{stride}_1 = 6$
• strides: 2, 3, 6
Initialization
• $\text{stride} = \text{stride}_1 / \text{tickets}$
• pass = stride
Allocation
• choose client C with minimum pass
• C.pass += C.stride

[Figure: Pass Value (0 to 20) versus Time (quanta, 0 to 10) for the three clients.]

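The initialization and allocation rules above can be sketched with a priority queue keyed on pass values. The function name is hypothetical, ties are broken in favour of the lower client index (an assumption; the slide does not specify a tie-break rule), and the integer division assumes stride1 is a multiple of every ticket count, as it is for the 3 : 2 : 1 example.

```python
import heapq

STRIDE1 = 6  # value used on the slide for the 3 : 2 : 1 example

def stride_schedule(tickets, quanta, stride1=STRIDE1):
    """Return the sequence of client indices chosen over `quanta` quanta."""
    # Heap entries are (pass, client index, stride); pass starts at stride.
    queue = [(stride1 // t, i, stride1 // t) for i, t in enumerate(tickets)]
    heapq.heapify(queue)
    order = []
    for _ in range(quanta):
        pass_value, client, stride = heapq.heappop(queue)  # minimum pass
        order.append(client)
        heapq.heappush(queue, (pass_value + stride, client, stride))
    return order

order = stride_schedule([3, 2, 1], quanta=6)
```

Over six quanta the three clients run 3, 2, and 1 times, and the pattern repeats every six quanta because all pass values advance by the same amount.
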
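Dynamic Stride Allocation Change

Allocation Change
• $\text{tickets} \rightarrow \text{tickets}'$
• $\text{stride}' = \text{stride}_1 / \text{tickets}'$
• $\text{remain}' = \text{remain} \times \text{stride}' / \text{stride}$
• $\text{pass}' = \text{now} + \text{remain}'$
• no updates needed for other clients

[Figure: timeline showing a client's pass, stride, and remain relative to now, before and after the allocation change.]
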
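The update rule can be sketched for a single client as below. The class and function names are hypothetical, `now` stands for the scheduler's current position on the pass axis as in the slide, and `remain` is taken to be pass minus now, which is an assumption consistent with the slide's final equation. Fractions are used so that strides need not be integers.

```python
from fractions import Fraction

STRIDE1 = 6  # hypothetical constant for this sketch

class Client:
    def __init__(self, tickets, now=0):
        self.tickets = tickets
        self.stride = Fraction(STRIDE1, tickets)
        self.pass_value = now + self.stride

def modify_tickets(client, new_tickets, now):
    """Rescale the client's remaining wait in proportion to its new stride."""
    remain = client.pass_value - now            # assumed: remain = pass - now
    new_stride = Fraction(STRIDE1, new_tickets)
    new_remain = remain * new_stride / client.stride
    client.tickets = new_tickets
    client.stride = new_stride
    client.pass_value = now + new_remain
    # No other client's state is touched.

c = Client(tickets=1)        # stride 6, pass 6
modify_tickets(c, new_tickets=3, now=3)
```

Here the client was halfway through its stride of 6 when its allocation tripled, so it is placed halfway through its new stride of 2 and its pass moves from 6 to 4.
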
Stride Scheduling Analysis

Strengths
• strong deterministic guarantees
• throughput error independent of $n_a$
• maximum relative error is one quantum
Weaknesses
• $O(n_c)$ absolute error
• poor behavior for skewed ticket allocations

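Hierarchical Stride Example

• 10 : 2 : 5 : 1 Ratio
• $\text{stride}_1 = 180$
Node Initialization
• $\text{stride} = \text{stride}_1 / \text{tickets}$
• pass = stride
Allocation
• follow child C with smaller pass value
• C.pass += C.stride

[Figure: binary tree in which each node is labeled with tickets, stride, and pass. The root holds 18 tickets (stride 10), its children hold 12 tickets (stride 15) and 6 tickets (stride 30), and the leaves hold 10, 2, 5, and 1 tickets (strides 18, 90, 36, and 180).]
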
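The tree walk can be sketched as follows. The names are hypothetical, the grouping of leaves into (10, 2) and (5, 1) follows the ticket sums in the figure, ties go to the first child (an assumption), and the sketch does not attempt to reproduce the specific pass values shown in the figure.

```python
from collections import Counter

STRIDE1 = 180  # value used on the slide

class Node:
    def __init__(self, name, tickets=0, children=()):
        self.name = name
        self.children = list(children)
        # An internal node's tickets are the sum of its children's tickets.
        self.tickets = sum(c.tickets for c in self.children) or tickets
        self.stride = STRIDE1 // self.tickets
        self.pass_value = self.stride

def build_example_tree():
    left = Node("left", children=[Node("c10", 10), Node("c2", 2)])
    right = Node("right", children=[Node("c5", 5), Node("c1", 1)])
    return Node("root", children=[left, right])

def allocate(root):
    """Walk from the root to a leaf, following the child with the smaller pass."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: c.pass_value)  # tie: first child
        node.pass_value += node.stride
    return node.name

def run(root, quanta):
    return [allocate(root) for _ in range(quanta)]

schedule = run(build_example_tree(), 18)
counts = Counter(schedule)
```

Over 18 quanta, one quantum per ticket, the leaves receive allocations in the 10 : 2 : 5 : 1 ratio.
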
Hierarchical Stride Analysis

Strengths
• $O(\lg n_c)$ absolute error
• reduces worst-case response-time variability
• avoids worst-case stride scheduling behavior
Weaknesses
• can increase response-time variability
• actual error can exceed stride scheduling error
• complex dynamic operations

Throughput Accuracy Comparison

Static Allocation
• 13 : 7 : 3 : 1 Ratio
Mechanisms
• lottery
• multi-winner (2,4,8)
• stride
• hierarchical

[Figure: Absolute Error (quanta) versus Time (quanta, 0 to 1000) in a grid of panels. Rows: Lottery, Multi-Winner, Stride, Hierarchical. Columns: 13 Tickets, 7 Tickets, 3 Tickets, 1 Ticket.]

Response-Time Comparison

Static Allocation
• 13 : 7 : 3 : 1 Ratio
Mechanisms
• lottery
• multi-winner (4)
• stride
• hierarchical

[Figure: Frequency (log scale) versus Response Time (quanta, 0 to 50) in a grid of panels. Rows: Lottery, Multi-Winner, Stride, Hierarchical. Columns: 13 Tickets, 7 Tickets, 3 Tickets, 1 Ticket.]

Prototype Process Schedulers

Lottery Scheduler
• modified Mach microkernel
• DECStation 5000/125
• complete framework implementation
Stride Scheduler
• modified Linux kernel
• IBM Thinkpad 350C
• no ticket transfers or currencies
Low System Overhead

Relative Rate Accuracy

Lottery Scheduler
• Dhrystone benchmark
• two tasks
• three 60-second runs for each ratio
Stride Scheduler
• arith benchmark
• two tasks
• three 30-second runs for each ratio

[Figure: Observed Iteration Ratio (0 to 15) versus Ticket Ratio (0 to 10).]

Dynamic Ticket Deflation

• stride scheduler
• Monte-Carlo simulations
• many trials for accurate results
• three tasks
• funding based on relative error

[Figure: Cumulative Trials (thousands) versus Time (sec, 0 to 1000).]

Dynamic Ticket Transfers

• lottery scheduler
• query processing
• multithreaded “database” server
• three clients
• 8 : 3 : 1 allocation

[Figure: Queries Processed (0 to 40) versus Time (sec, 0 to 800).]

Modular Load Insulation

• lottery scheduler
• currencies A, B
• 2 : 1 funding
• task A: funding 100.A
• task B1: funding 100.B
• task B2: joins with funding 100.B

[Figure: Iterations (millions) versus Time (sec, 0 to 300) for tasks A, B1, and B2.]

Managing Diverse Resources

Synchronization Resources
• locks, condition variables
• ticket inheritance, repayment
Space-Shared Resources
• inverse lotteries
• minimum-funding revocation
Disk I/O Bandwidth
Multiple Resources

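An inverse lottery picks a loser, for example the client that must give up a unit of a space-shared resource, so that clients with more tickets are less likely to be chosen. The slide does not give the formula; the sketch below assumes the selection probability $\frac{1}{n-1}\left(1 - \frac{t}{T}\right)$ for a client holding $t$ of $T$ tickets among $n$ clients, and the function names are hypothetical.

```python
import random
from fractions import Fraction

def inverse_lottery_probabilities(tickets):
    """Probability that each client loses, assuming (1/(n-1)) * (1 - t/T)."""
    n, total = len(tickets), sum(tickets)
    if n < 2:
        raise ValueError("an inverse lottery needs at least two clients")
    return [Fraction(1, n - 1) * (1 - Fraction(t, total)) for t in tickets]

def hold_inverse_lottery(tickets, rng=random):
    """Return the index of the losing client."""
    weights = [float(p) for p in inverse_lottery_probabilities(tickets)]
    return rng.choices(range(len(tickets)), weights=weights, k=1)[0]

probabilities = inverse_lottery_probabilities([3, 2, 1])
loser = hold_inverse_lottery([3, 2, 1])
```

For a 3 : 2 : 1 allocation the assumed formula gives loss probabilities of 1/4, 1/3, and 5/12, which sum to 1.
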
Conclusions

General Framework
• direct application-level control
• simple, modular, flexible
• widely applicable
Proportional-Share Algorithms
• lottery and stride scheduling
• efficient $O(\lg n_c)$ operations
• techniques for locks, memory, disk

Future Directions

Multiple Resources
• manage all critical resources
• develop tools for adaptive software
• microeconomic vs. proportional-share
Human-Computer Interaction
• improve application responsiveness
• GUI elements for resource management