Sequential consistency considered harmful
Viktor Vafeiadis
Max Planck Institute for Software Systems (MPI-SWS)
7 November 2017
The standard WMC talk . . .
Sequential consistency (SC)
[Figure: CPUs reading from and writing to a single shared memory]
◮ Interleaving semantics
◮ Intuitive, isn't it?
Weak memory consistency (WMC)
[Figure: CPUs whose writes reach memory only after a later write-back]
◮ All the SC behaviours
◮ + Some weird behaviours . . .
◮ Complicated. . .
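Interleaving semantics can be made concrete in a few lines of Python (our choice of language; the talk does not prescribe one). The hypothetical helper `sc_outcomes` below runs every interleaving of a small program, where each thread is a list of writes `("w", var, value)` and reads `("r", reg, var)`, and collects the final register values. On the store-buffering program X := 1; a := Y ∥ Y := 1; b := X it confirms that SC never produces a = b = 0.

```python
def sc_outcomes(threads, init):
    """Enumerate all SC interleavings; return the set of final register states.

    Each thread is a list of ("w", var, value) writes and ("r", reg, var) reads.
    Every step acts on one shared memory, so all threads agree on the current
    value of each variable at every point in time.
    """
    outcomes = set()

    def step(pcs, mem, regs):
        finished = True
        for t, prog in enumerate(threads):
            if pcs[t] == len(prog):
                continue                      # thread t has terminated
            finished = False
            op = prog[pcs[t]]
            new_mem, new_regs = dict(mem), dict(regs)
            if op[0] == "w":                  # var := value
                new_mem[op[1]] = op[2]
            else:                             # reg := var
                new_regs[op[1]] = mem[op[2]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            step(new_pcs, new_mem, new_regs)
        if finished:
            outcomes.add(tuple(sorted(regs.items())))

    step((0,) * len(threads), dict(init), {})
    return outcomes


# Store buffering: X := 1; a := Y  in parallel with  Y := 1; b := X
SB = [[("w", "X", 1), ("r", "a", "Y")],
      [("w", "Y", 1), ("r", "b", "X")]]

if __name__ == "__main__":
    print(sorted(sc_outcomes(SB, {"X": 0, "Y": 0})))
    # [(('a', 0), ('b', 1)), (('a', 1), ('b', 0)), (('a', 1), ('b', 1))]
```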
Release-acquire (RA) is not complicated
Initially X = Y = 0.
Message passing (MP)
  X := 1;   ∥   a := Y;  // 1
  Y := 1    ∥   b := X   // ≠ 0
Store buffering (SB)
  X := 1;          ∥   Y := 1;
  a := Y;  // 0    ∥   b := X;  // 0
◮ Messages delivered in order.
◮ But they take time to deliver!
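The slide's intuition (writes are messages that arrive in order, but only after a delay) can be turned into a small executable model. The Python sketch below is our own simplification, not the formal RA definition, and `message_outcomes` is a hypothetical name: every thread keeps its own view of memory, each write updates the writer's view at once and is queued on a FIFO channel to every other thread, and queued messages may be delivered at any later point. On MP and SB it reproduces the RA outcomes annotated on the slide. With three or more threads, RA additionally requires causally ordered delivery, and concurrent writes to the same variable would need coherence handling; the sketch models neither.

```python
def message_outcomes(threads, init):
    """Final register states when writes travel as in-order but delayed messages.

    Each thread has a private view of memory. A write updates the writer's view
    immediately and is appended to a FIFO channel towards every other thread;
    a read consults only the reader's own view.
    """
    n = len(threads)
    outcomes, seen = set(), set()

    def explore(pcs, views, chans, regs):
        key = (pcs, tuple(tuple(sorted(v.items())) for v in views),
               tuple(sorted(chans.items())), tuple(sorted(regs.items())))
        if key in seen:
            return
        seen.add(key)
        if all(pcs[t] == len(threads[t]) for t in range(n)):
            # Messages still in flight can no longer change any register.
            outcomes.add(tuple(sorted(regs.items())))
            return
        # Choice 1: some thread executes its next instruction.
        for t in range(n):
            if pcs[t] == len(threads[t]):
                continue
            op = threads[t][pcs[t]]
            new_views = [dict(v) for v in views]
            new_chans, new_regs = dict(chans), dict(regs)
            if op[0] == "w":                      # var := value
                _, var, val = op
                new_views[t][var] = val
                for d in range(n):
                    if d != t:
                        new_chans[(t, d)] = chans[(t, d)] + ((var, val),)
            else:                                 # reg := var, from own view
                _, reg, var = op
                new_regs[reg] = views[t][var]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            explore(new_pcs, new_views, new_chans, new_regs)
        # Choice 2: deliver the oldest pending message on some channel.
        for (src, dst), queue in chans.items():
            if queue:
                var, val = queue[0]
                new_views = [dict(v) for v in views]
                new_views[dst][var] = val
                new_chans = dict(chans)
                new_chans[(src, dst)] = queue[1:]
                explore(pcs, new_views, new_chans, regs)

    chans = {(s, d): () for s in range(n) for d in range(n) if s != d}
    explore((0,) * n, [dict(init) for _ in range(n)], chans, {})
    return outcomes


MP = [[("w", "X", 1), ("w", "Y", 1)],
      [("r", "a", "Y"), ("r", "b", "X")]]
SB = [[("w", "X", 1), ("r", "a", "Y")],
      [("w", "Y", 1), ("r", "b", "X")]]

if __name__ == "__main__":
    init = {"X": 0, "Y": 0}
    print((("a", 1), ("b", 0)) in message_outcomes(MP, init))  # False: in-order delivery
    print((("a", 0), ("b", 0)) in message_outcomes(SB, init))  # True: delayed delivery
```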
What about the formal definition?
But isn't the definition of RA more complex than that of SC?
◮ It largely depends on presentation. . .
But who cares about the memory model (MM) definition?
◮ Theoreticians certainly care.
◮ Programmers do not understand programs by looking at the MM definition, but at its properties.
Key question
◮ Determine whether our program is correct.
◮ We want automation: a tool to answer this query.
◮ We also want manual proof techniques.
Automated verification
So, which model is best for automated verification?
◮ It depends on the exact question.
Two cases where verification is easier under WMC than under SC:
1. Checking consistency of an execution
2. Bounded model checking
NB: There are other verification problems that are easy under SC and difficult under weak memory.
Checking consistency of an execution
Execution consistency problem
Given a concurrent program P with instructions of the form:
◮ x := v – write constant v to shared variable x
◮ r := x – read the value of x into register r
such that no two instructions write the same value v or read into the same register r, and a register assignment R = [r_1 ↦ v_1, …, r_k ↦ v_k], determine whether R is a possible outcome of P.
This problem is:
◮ NP-complete for SC;
◮ solvable in polynomial time for several weak memory models (e.g., RA).
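For SC, the straightforward decision procedure searches the interleavings, as in the Python sketch below (`sc_consistent` is a hypothetical name, and initial values of 0 are our assumption). Because every write stores a distinct value, each value in R pins down which write a read observed, and the search can discard any interleaving in which a read would return something else. In the worst case it still explores exponentially many states; NP-completeness means no polynomial-time algorithm is expected unless P = NP.

```python
def sc_consistent(threads, R, init=None):
    """Decide by search whether register assignment R is an SC outcome.

    threads: lists of ("w", x, v) for x := v and ("r", r, x) for r := x.
    R must give a value for every register that is read. Shared variables
    not listed in init start at 0 (an assumption of this sketch).
    """
    failed = set()  # (pcs, memory) states already known to be dead ends

    def search(pcs, mem):
        key = (pcs, tuple(sorted(mem.items())))
        if key in failed:
            return False
        if all(pc == len(prog) for pc, prog in zip(pcs, threads)):
            return True                    # every read returned its value in R
        for t, prog in enumerate(threads):
            if pcs[t] == len(prog):
                continue
            op = prog[pcs[t]]
            new_mem = dict(mem)
            if op[0] == "w":
                new_mem[op[1]] = op[2]
            elif mem.get(op[2], 0) != R[op[1]]:
                continue                   # this read would contradict R: prune
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if search(new_pcs, new_mem):
                return True
        failed.add(key)
        return False

    return search((0,) * len(threads), dict(init or {}))


# Store buffering with distinct written values: X := 1; a := Y  ∥  Y := 2; b := X
SB = [[("w", "X", 1), ("r", "a", "Y")],
      [("w", "Y", 2), ("r", "b", "X")]]

if __name__ == "__main__":
    print(sc_consistent(SB, {"a": 0, "b": 0}))  # False
    print(sc_consistent(SB, {"a": 2, "b": 1}))  # True
```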
Stateless model checking
◮ For SC, 10+ years of research on optimisations. State of the art: Nidhugg (“optimal” DPOR).
◮ For RC11, our first attempt (RCMC) . . . to appear at POPL'18.
Benchmark          Nidhugg/SC   RCMC/RC11
linuxrwlocks(2)      0.22 s       0.08 s
linuxrwlocks(3)     37.65 s       7.58 s
ms-queue(2)          0.45 s       0.13 s
ms-queue(3)         21.21 s       4.34 s
qspinlock(2)         0.11 s       0.06 s
qspinlock(3)        15.78 s       3.34 s
big0                18.26 s       2.65 s
Manual reasoning: program logics
Weak memory enforces local reasoning.
◮ Ownership-based reasoning.
◮ The proof of a thread mentions only the variables it accesses.
◮ Key underlying principle of separation logic.
RA allows causal reasoning.
◮ I have seen an update, so I have seen all previous updates.
◮ “Ownership transfer” in separation logic.
SC, in addition, allows global reasoning.
◮ The proof of a thread can mention local variables of other threads.
◮ Global reasoning is complicated.
Message passing with the Owicki-Gries method (1976)
Prove that the MP program cannot have its weak behaviour:
{Y = 0}
  {⊤}        ∥   {Y = 0 ∨ X = 1}
  X := 1;    ∥   a := Y;
  {X = 1}    ∥   {a = 0 ∨ X = 1}
  Y := 1     ∥   b := X
  {⊤}        ∥   {a = 0 ∨ b = 1}
{a = 0 ∨ b = 1}
◮ A straightforward local proof.
◮ Sound also under RA.
Store buffering with the Owicki-Gries method (1976)
Prove that the SB program cannot have its weak behaviour:
{a ≠ 0}
  {a ≠ 0}    ∥   {⊤}
  X := 1;    ∥   Y := 1;
  {X ≠ 0}    ∥   {Y ≠ 0}
  a := Y     ∥   b := X
  {X ≠ 0}    ∥   {Y ≠ 0 ∧ (a ≠ 0 ∨ b = X)}
{a ≠ 0 ∨ b ≠ 0}
◮ Requires a non-trivial global proof: the second thread's final assertion mentions a, a register of the first thread!
◮ This non-local reasoning is unsound under RA.
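The two Owicki-Gries proofs above can be checked mechanically. In the Python sketch below (hypothetical helpers `assign`, `check_og` and `sb_proof`, not from the talk), each assertion is a predicate on a state and the standard obligations are checked: the precondition implies each thread's first assertion, each statement is locally correct, every assertion is preserved by the other thread's statements (interference freedom), and the final assertions imply the postcondition. Variables range only over {0, 1}, so this is a bounded sanity check, not a proof. The last call shows why the SB proof is global: dropping the disjunct that mentions the first thread's register a makes interference freedom fail under X := 1.

```python
from itertools import product

VARS = ("X", "Y", "a", "b")
# Bounded check: every variable ranges over {0, 1} only.
STATES = [dict(zip(VARS, vals)) for vals in product((0, 1), repeat=len(VARS))]
TRUE = lambda s: True


def assign(var, expr):
    """The statement var := expr(state), as a state transformer."""
    def run(s):
        t = dict(s)
        t[var] = expr(s)
        return t
    return run


def check_og(pre, threads, post):
    """Check Owicki-Gries obligations over STATES.

    threads: list of (assertions, statements); assertion i is the precondition
    of statement i, and the last assertion is the thread's postcondition.
    """
    # 1. The global precondition implies every thread's first assertion.
    ok = all(asserts[0](s) for s in STATES if pre(s) for asserts, _ in threads)
    # 2. Local correctness: {P_i} stmt_i {P_i+1} within each thread.
    for asserts, stmts in threads:
        for i, stmt in enumerate(stmts):
            ok &= all(asserts[i + 1](stmt(s)) for s in STATES if asserts[i](s))
    # 3. Interference freedom: each assertion q of thread j is preserved by each
    #    statement of thread i, run from a state satisfying q and its precondition.
    for j, (asserts_j, _) in enumerate(threads):
        for i, (asserts_i, stmts_i) in enumerate(threads):
            if i == j:
                continue
            for q in asserts_j:
                for k, stmt in enumerate(stmts_i):
                    ok &= all(q(stmt(s)) for s in STATES if q(s) and asserts_i[k](s))
    # 4. The conjunction of the final assertions implies the postcondition.
    ok &= all(post(s) for s in STATES if all(a[-1](s) for a, _ in threads))
    return ok


MP_PROOF = dict(
    pre=lambda s: s["Y"] == 0,
    threads=[
        ([TRUE, lambda s: s["X"] == 1, TRUE],
         [assign("X", lambda s: 1), assign("Y", lambda s: 1)]),
        ([lambda s: s["Y"] == 0 or s["X"] == 1,
          lambda s: s["a"] == 0 or s["X"] == 1,
          lambda s: s["a"] == 0 or s["b"] == 1],
         [assign("a", lambda s: s["Y"]), assign("b", lambda s: s["X"])]),
    ],
    post=lambda s: s["a"] == 0 or s["b"] == 1,
)


def sb_proof(final2):
    """The SB proof, with the second thread's final assertion as a parameter."""
    return dict(
        pre=lambda s: s["a"] != 0,
        threads=[
            ([lambda s: s["a"] != 0, lambda s: s["X"] != 0, lambda s: s["X"] != 0],
             [assign("X", lambda s: 1), assign("a", lambda s: s["Y"])]),
            ([TRUE, lambda s: s["Y"] != 0, final2],
             [assign("Y", lambda s: 1), assign("b", lambda s: s["X"])]),
        ],
        post=lambda s: s["a"] != 0 or s["b"] != 0,
    )


GLOBAL = lambda s: s["Y"] != 0 and (s["a"] != 0 or s["b"] == s["X"])
LOCAL_ONLY = lambda s: s["Y"] != 0 and s["b"] == s["X"]

if __name__ == "__main__":
    print(check_og(**MP_PROOF))              # True
    print(check_og(**sb_proof(GLOBAL)))      # True
    print(check_og(**sb_proof(LOCAL_ONLY)))  # False
```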
Program logics summary
RA supports local and causal reasoning.
◮ Local Owicki-Gries reasoning is sound.
◮ Separation logic and its extensions (RSL, GPS) are sound.
Global reasoning is unsound under RA.
◮ It requires strong fences.
◮ Global reasoning is complicated to do anyway.
◮ Fences document when global reasoning is needed.
Scalability barrier: multi-copy atomicity?
Independent reads of independent writes (IRIW)
Initially, X = Y = 0.
  X := 1   ∥   a := X;  // 1   ∥   c := Y;  // 1   ∥   Y := 1
           ∥   b := Y   // 0   ∥   d := X   // 0   ∥
◮ Threads 2 and 3 observe the writes X := 1 and Y := 1 in different orders.
◮ SC forbids this outcome: it is multi-copy atomic, so each write becomes visible to all other threads at once.
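The IRIW argument can be spelled out as a small check (our own illustration; the helper names are hypothetical). A reader that sees one write and afterwards misses the other yields a constraint on the order in which the two writes became visible to it. Multi-copy atomicity, which SC entails, requires a single visibility order shared by all readers; the sketch finds none for the IRIW outcome, which is why SC forbids it, whereas RA allows it.

```python
from itertools import permutations


def visibility_constraints(readers):
    """Each reader reads first_var and then second_var, in program order.

    readers: list of (first_var, first_val, second_var, second_val).
    Reading 1 and then 0 means the write to first_var became visible to that
    reader before the write to second_var did.
    """
    return [(v1, v2) for v1, x1, v2, x2 in readers if x1 == 1 and x2 == 0]


def has_global_write_order(constraints, writes=("X", "Y")):
    """Is there one order of the writes that agrees with every reader?

    Under multi-copy atomicity each write becomes visible to all other threads
    at once, so such an order must exist.
    """
    for order in permutations(writes):
        pos = {w: i for i, w in enumerate(order)}
        if all(pos[first] < pos[second] for first, second in constraints):
            return True
    return False


if __name__ == "__main__":
    # IRIW weak outcome: thread 2 reads a = 1, b = 0; thread 3 reads c = 1, d = 0.
    iriw = visibility_constraints([("X", 1, "Y", 0), ("Y", 1, "X", 0)])
    print(iriw)                          # [('X', 'Y'), ('Y', 'X')]
    print(has_global_write_order(iriw))  # False
```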
Summary
Sequential consistency (SC) is bad.
◮ Thinking about interleavings is considered harmful.
◮ Multi-copy atomicity is fundamentally not scalable.
◮ And it also seems useless in practice.
Release-acquire (RA) is good.
◮ Manual reasoning under RA is clearer.
  ◮ Fully supports local and causal reasoning.
  ◮ Fences document global reasoning.
◮ Automated reasoning under RA is easier.
  ◮ Checking consistency of an execution
  ◮ Bounded model checking