Solving Operating-Systems Problems with Probabilistic Model Checking

Hendrik Tews
Institute for Theoretical Computer Science
Office at Operating systems group
Resilience talk, May 3, 2013

Sections: Introduction, Spinlock Case Study, PWCS: probabilistic write-copy-select, Romain Case Study, Conclusion
Hendrik Tews, Probabilistic Model Checking, Resilience 5/2013 (44 slides)
Outline
- Introduction
- Spinlock Case Study
- PWCS: probabilistic write-copy-select
- Romain Case Study
- Conclusion
Model checking
- functional system requirements, given as a specification, e.g., a temporal formula Φ
- abstract model M
- model checker: does M ⊨ Φ hold?
  - no + counterexample
  - yes
Probabilistic model checking
- quantitative system requirements, given as a specification, e.g., a temporal formula Φ
- probabilistic model M
- probabilistic model checker: quantitative analysis of M against Φ
  - probability for “bad behaviors” is < 10^−6
  - probability for “good behaviors” is 1
  - expected costs for ...
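The three kinds of result named on this slide can be written down in a PCTL-like notation. This is only a sketch: the state labels bad, good and goal, the cost measure, and the choice of reachability (◇) as the path property are assumptions made for illustration and are not taken from the talk.

```latex
% Assumed labels: bad, good, goal; assumed path property: reachability.
\begin{align*}
  \Pr\nolimits_{M}\bigl(\Diamond\, \mathit{bad}\bigr)  &< 10^{-6}
      && \text{(``bad behaviors'' are very unlikely)} \\
  \Pr\nolimits_{M}\bigl(\Diamond\, \mathit{good}\bigr) &= 1
      && \text{(``good behaviors'' occur almost surely)} \\
  \mathrm{E}_{M}\bigl[\text{cost accumulated until } \mathit{goal}\bigr]
      &&& \text{(an expected-cost query)}
\end{align*}
```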
Spinlocks

Problem
- n processes on n CPU cores
- cooperate to protect a shared resource (OS-kernel ready-queue)
- contention is rare, the lock is almost always free
- inter-processor interrupts (IPIs) are far too slow in this case

Solution
- synchronise over a shared lock variable
- change the lock variable with atomic operations (CAS, compare and swap)
- expensive in the contention case

Questions
- Does it scale to 100 cores?
- For which workloads?
Spinlocks

Joint work with Christel Baier, Marcus Daum, Benjamin Engel, Hermann Härtig, Joachim Klein, Sascha Klüppelholz, Steffen Märcker and Marcus Völp

- FMICS 2012: Waiting for locks: How long does it usually take?, in: M. Stoelinga, R. Pinger (Eds.), 17th International Workshop on Formal Methods for Industrial Critical Systems, Vol. 7437 of Lecture Notes in Computer Science, Springer, 2012, pp. 47–62.
- SSV 2012: Chiefly symmetric: Results on the scalability of probabilistic model checking for operating-system code, in: F. Cassez, R. Huuck, G. Klein, B. Schlich (Eds.), Proceedings Seventh Conference on Systems Software Verification, Vol. 102 of EPTCS, 2012, pp. 156–166.
Test-And-Test-And-Set Lock

    volatile bool occupied = false;

    void lock() {
      /* atomic_swap returns the old value: true means the lock was taken */
      while (atomic_swap(occupied, true)) {
        /* wait with plain reads until the lock looks free, then retry */
        while (occupied) { /* spin loop */ }
      }
    }

    void unlock() {
      occupied = false;
    }

- model n processes that compete for the lock
- model the lock as a separate process
- compare results with measurements
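The slide's lock can be tried out with standard C11 atomics. The following is a minimal sketch, not the code used in the case study: it assumes that the slide's atomic swap corresponds to `atomic_exchange` from `<stdatomic.h>`, and the names `ttas_lock` and `ttas_unlock` are chosen here for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared lock word: true while some process holds the lock. */
static atomic_bool occupied = false;

/* Test-and-test-and-set acquire (assumption: C11 atomics stand in
   for the slide's atomic swap primitive). */
static void ttas_lock(void) {
    /* Outer test-and-set: atomic_exchange returns the previous value,
       so "true" means somebody else already holds the lock. */
    while (atomic_exchange(&occupied, true)) {
        /* Inner test: spin on loads only, so waiting cores do not
           issue atomic writes while the lock is held. */
        while (atomic_load(&occupied)) {
            /* spin loop */
        }
    }
}

static void ttas_unlock(void) {
    assert(atomic_load(&occupied)); /* only a holder should unlock */
    atomic_store(&occupied, false);
}
```

The inner read-only loop is what distinguishes this lock from a plain test-and-set lock: a waiting core retries the atomic exchange only after it has seen the lock free.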
Interesting properties

In the long run...
- probability of finding the lock free
- probability of getting the lock twice in a row without waiting
- average waiting time for the lock (under the condition that the lock is busy)
- the 95% quantile of the waiting time

[Figure: illustration of a quantile; picture by Rene Schwarz from Wikimedia Commons]
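To make the last two properties concrete, here is a small sketch of how an average and a quantile are read off a discrete waiting-time distribution. It assumes the distribution is given as an array `pmf`, where `pmf[t]` is the probability of waiting exactly `t` steps, already conditioned on the lock being busy. The function names are illustrative, and this is not how a model checker computes these values internally.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest waiting time t with P(wait <= t) >= q, e.g. q = 0.95.
   Assumption: pmf[t] is the probability of waiting exactly t steps.
   Returns n if the cumulative mass in pmf[0..n-1] never reaches q. */
static size_t waiting_time_quantile(const double *pmf, size_t n, double q) {
    assert(pmf != NULL && q > 0.0 && q <= 1.0);
    double cumulative = 0.0;
    for (size_t t = 0; t < n; ++t) {
        cumulative += pmf[t];
        if (cumulative >= q) {
            return t;
        }
    }
    return n;
}

/* Average waiting time: sum over t of t * pmf[t]. */
static double mean_waiting_time(const double *pmf, size_t n) {
    assert(pmf != NULL);
    double mean = 0.0;
    for (size_t t = 0; t < n; ++t) {
        mean += (double)t * pmf[t];
    }
    return mean;
}
```

The quantile answers "how long does it usually take?" more robustly than the mean: it is the bound that 95% of all lock requests stay below.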
Process i: DTMC Model

[Figure: transition diagram with locations start_i, ncrit_i, wait_i, crit_i; the edge list below is reconstructed from the labels on the slide]

- start_i → ncrit_i: t_i := random(ν)
- ncrit_i → ncrit_i: if t_i > 0: t_i := t_i − 1
- ncrit_i → wait_i: if t_i = 0
- wait_i → wait_i: if ¬lock_i: t_i := min(t_i + 1, 2)
- wait_i → crit_i: if lock_i ∧ t_i = 1: t_i := random(γ_0)
- wait_i → crit_i: if lock_i ∧ t_i = 2: t_i := random(γ_1)
- crit_i → crit_i: if t_i > 0: t_i := t_i − 1
- crit_i → ncrit_i: if t_i = 0: t_i := random(ν)

Distributions:
- ν: non-critical region
- γ_0: critical region (without spinning)
- γ_1: critical region (with spinning)
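One way to read the process model on this slide is as a step function over the locations ncrit, wait and crit. The sketch below rests on an assumed reading of the diagram: the timer t counts down in ncrit and crit, and in wait it records whether the process had to spin (t = 1: lock obtained directly, t = 2: after spinning). The types, the function name and the fixed "samplers" are hypothetical stand-ins for random(ν), random(γ_0) and random(γ_1).

```c
#include <assert.h>

typedef enum { LOC_NCRIT, LOC_WAIT, LOC_CRIT } Location;

typedef struct {
    Location loc;
    int t; /* timer in ncrit/crit, spin marker (0, 1 or 2) in wait */
} Process;

/* Hypothetical samplers standing in for random(nu), random(gamma_0)
   and random(gamma_1); a real model would draw from distributions. */
typedef struct {
    int (*nu)(void);
    int (*gamma0)(void);
    int (*gamma1)(void);
} Samplers;

/* Fixed values so that the behaviour is reproducible in this sketch. */
static int fixed_nu(void)     { return 3; }
static int fixed_gamma0(void) { return 2; }
static int fixed_gamma1(void) { return 5; }

/* One step of process i; has_lock plays the role of lock_i. */
static void process_step(Process *p, int has_lock, const Samplers *s) {
    assert(p != 0 && s != 0);
    switch (p->loc) {
    case LOC_NCRIT:
        if (p->t > 0) {
            p->t -= 1;                 /* still in the non-critical region */
        } else {
            p->loc = LOC_WAIT;         /* t = 0: request the lock */
        }
        break;
    case LOC_WAIT:
        if (!has_lock) {
            p->t = (p->t + 1 < 2) ? p->t + 1 : 2; /* min(t + 1, 2) */
        } else if (p->t == 1) {
            p->t = s->gamma0();        /* critical region, no spinning */
            p->loc = LOC_CRIT;
        } else if (p->t == 2) {
            p->t = s->gamma1();        /* critical region, after spinning */
            p->loc = LOC_CRIT;
        }
        /* has_lock with t = 0 has no transition in this reading */
        break;
    case LOC_CRIT:
        if (p->t > 0) {
            p->t -= 1;                 /* still in the critical region */
        } else {
            p->t = s->nu();            /* leave: new non-critical period */
            p->loc = LOC_NCRIT;
        }
        break;
    }
}
```

Separating γ_0 from γ_1 lets the model give a process that had to spin a different critical-section length than one that found the lock free.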
The lock: DTMC Model

[Figure: transition diagram with states unlock, ..., lock_i, ..., lock_k, ...]

- unlock → lock_i: if wait_i
- unlock → lock_k: if wait_k
- lock_i → unlock: if release_i ∧ ¬wait_1 ∧ ... ∧ ¬wait_n
- lock_k → unlock: if release_k ∧ ¬wait_1 ∧ ... ∧ ¬wait_n
- lock_i → lock_k: if release_i ∧ wait_k
- lock_k → lock_i: if release_k ∧ wait_i

Perform a uniform probabilistic choice for selecting the next lock owner.
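The uniform choice of the next lock owner can be sketched as follows. This is an illustration, not the PRISM model from the case study: `choose_next_owner`, `NO_OWNER` and the caller-supplied random number `r` are names assumed here. Passing the random number in keeps the function deterministic for a given input.

```c
#include <assert.h>

#define NO_OWNER (-1) /* hypothetical marker for the "unlock" state */

/* Select the next lock owner uniformly among the waiting processes.
   waiting[i] != 0 means process i is in its wait location.
   r is a random number supplied by the caller (e.g. from rand());
   r % count is uniform only if r is uniform over a multiple of count. */
static int choose_next_owner(const int *waiting, int n, unsigned r) {
    assert(waiting != 0 && n >= 0);

    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (waiting[i]) {
            count += 1;
        }
    }
    if (count == 0) {
        return NO_OWNER; /* nobody waits: the lock becomes free */
    }

    /* Pick the k-th waiting process, k in 0..count-1. */
    unsigned k = r % (unsigned)count;
    for (int i = 0; i < n; ++i) {
        if (waiting[i]) {
            if (k == 0) {
                return i;
            }
            k -= 1;
        }
    }
    return NO_OWNER; /* not reached: count waiters exist */
}
```

With m waiting processes, each of them is selected with probability 1/m, which matches the uniform choice stated on the slide.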
Results: Probability to find the lock free
[Figure: probability (y-axis, 0.7 to 1) per configuration (x-axis); configuration labels have the form k[5][6]D with k = 2, 3, 4 and D one of [50,60], [40,50], [40,50,60], [40,50,60,70], [40,60]; series: cache aware model, measured]
Results: Average waiting time for spinning processes
[Figure: average waiting time (y-axis, 0 to 0.8) per configuration (x-axis); configuration labels have the form k[5][6]D with k = 2, 3, 4 and D one of [50,60], [40,50], [40,50,60], [40,50,60,70], [40,60]; series: cache aware model, measured]
Results: 95% quantile of the waiting time
[Figure: 95% quantile (y-axis, 0 to 1) per configuration (x-axis); configuration labels have the form k[5][6]D with k = 2, 3, 4 and D one of [50,60], [40,50], [40,50,60], [40,50,60,70], [40,60]; series: cache aware model, measured]
Scalability for PRISM, Distribution [40,50]
[Figure: time in hours (0 to 20) for model generation and steady state, and RAM in GB (0 to 20), against the number of processes (3, 4, 5)]

Number of states:
- 3 processes: 4,082,808
- 4 processes: 198,808,720
Symmetry reduction: Using a generic representative
[Figure: the lock (states unlock, lock_1, lock_x), one explicit process P_1 (locations ncrit, crit, wait) and five generic processes P_x (locations ncrit_1, ncrit_2, crit, wait)]

State counters:
- crit: 1
- wait: 2
- ncrit_1: 1
- ncrit_2: 1
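The state counters on this slide can be illustrated with a small counter abstraction. It replaces the vector of locations of the interchangeable processes by the number of processes in each location, so that permutations of those processes map to the same abstract state. The type and function names are assumptions made for this sketch, and the encoding used in the actual case study may differ.

```c
#include <assert.h>
#include <string.h>

/* Locations of a generic process, as named on the slide. */
typedef enum {
    LOC_NCRIT1, LOC_NCRIT2, LOC_WAIT, LOC_CRIT, NUM_LOCATIONS
} Location;

/* Abstract state: how many generic processes are in each location. */
typedef struct {
    unsigned count[NUM_LOCATIONS];
} Counters;

/* Map a concrete vector of process locations to its counters. */
static Counters count_locations(const Location *procs, int n) {
    Counters c = { { 0 } };
    assert(procs != 0 && n >= 0);
    for (int i = 0; i < n; ++i) {
        assert(procs[i] < NUM_LOCATIONS);
        c.count[procs[i]] += 1;
    }
    return c;
}

/* Two concrete states are merged if and only if their counters agree. */
static int same_abstract_state(Counters a, Counters b) {
    return memcmp(a.count, b.count, sizeof a.count) == 0;
}
```

For n interchangeable processes and a fixed number of locations, the number of counter vectors grows only polynomially in n, whereas the number of location vectors grows exponentially.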
Results for symmetry-reduced model
Non-critical Distribution [40, 50]
[Figure: lock free probability (probability axis, 0 to 1) and average waiting time (time units axis, 0 to 600) against the number of processes (10 to 100 in steps of 10, then 500, 1000, 5000)]
Scalability for symmetry-reduced model
Non-critical Distribution [40, 50]
[Figure: number of states of the bisimulation quotient (logarithmic axis, 10^0 to 10^9) and run time in minutes (0 to 240) against the number of processes (10 to 100 in steps of 10, then 500, 1000, 5000)]