Scalable Multi-Core Model Checking
Alfons Laarman (alfons@laarman.com), Theory
Joint work with Jaco van de Pol and Tom van Dijk.
Multi-Core Model Checking
Research questions
• Can model checking scale (linearly, ideally) on modern multi-cores?
• Are our parallel solutions compatible with other techniques?
  • Compression techniques
  • Symbolic exploration
  • Partial-order reduction
Speedup: S_P = T_seq / T_P
Ideal: S_P = P
Linear: S_P = P / c
[Figure: speedup versus number of threads (0 to 50) for the models dfsfifo, garp, giop2.nomig, i-protocol2 and leader5.]
Related work
• “compiler optimizations diminish the benefits of multi-core processing” [Holzmann 07]
• “no silver bullet, that would solve all the scalability issues” [Barnat et al. 08]
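The speedup definition S_P = T_seq / T_P can be written as a few lines of C. This is only an illustrative sketch: the function names `speedup` and `efficiency` are made up here, and the efficiency measure S_P / P is a common companion metric that the slide does not mention.

```c
/* Speedup S_P = T_seq / T_P, as defined on the slide. */
static double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

/* Hypothetical helper (not on the slide): parallel efficiency S_P / P.
 * A value of 1.0 corresponds to ideal speedup S_P = P. */
static double efficiency(double t_seq, double t_par, int threads) {
    return speedup(t_seq, t_par) / threads;
}
```

With the counter timings quoted on the "Challenges" slide, `speedup(27.0, 32.0)` is about 0.84 (a slowdown), while `speedup(27.0, 1.8)` is 15, close to the ideal value of 16 for 16 threads.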
Challenges
Difficulties of parallelism
• Steep memory hierarchies
• Cache coherence protocol (false sharing)
• Fine-grained operations in model checking (e.g. no subsumption)

Sequential counter (T_1 = 27 sec):

    #define B (1024 * 1024 * 1024)

    int main(void) {
        int result = 0;
        for (int i = 0; i < B; i++)
            result++;
        return result;
    }

Parallel counter with P = 16 threads (T_16 = 32 sec with a plain `int counters[P]`, T_16 = 1.8 sec with the aligned declaration shown here):

    #include <pthread.h>

    #define B (1024 * 1024 * 1024)
    #define P 16

    /* Each thread increments its own counter B / P times. */
    static void *count(void *arg) {
        int *counter = (int *) arg;
        for (int i = 0; i < B / P; i++)
            (*counter)++;
        return NULL;
    }

    int main(void) {
        pthread_t thread[P];
        int __attribute__((aligned(64))) counters[P] = {0};
        for (int i = 0; i < P; i++)
            pthread_create(&thread[i], NULL, count, &counters[i]);
        int result = 0;
        for (int i = 0; i < P; i++) {
            pthread_join(thread[i], NULL);
            result += counters[i];
        }
        return result;
    }
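The `aligned` attribute on the slide controls where the array starts. To give every counter a cache line of its own, one common approach is to pad each element to the line size. The sketch below assumes 64-byte cache lines and GCC/Clang attribute syntax; `padded_counter`, `count_job` and `parallel_count` are hypothetical names, and `volatile` is used only so that the increments go to memory even when the compiler optimizes.

```c
#include <pthread.h>

#define CACHE_LINE 64      /* assumed cache-line size in bytes */
#define NUM_THREADS 16     /* P on the slide */

/* Aligning the struct type rounds its size up to one full cache line,
 * so neighbouring array elements never share a line. */
typedef struct {
    volatile long value;
} __attribute__((aligned(CACHE_LINE))) padded_counter;

typedef struct {
    padded_counter *counter;   /* this thread's private counter */
    long iterations;           /* number of increments to perform */
} count_job;

static void *count(void *arg) {
    count_job *job = arg;
    for (long i = 0; i < job->iterations; i++)
        job->counter->value++;
    return NULL;
}

/* Splits `total` increments evenly over NUM_THREADS threads and sums the
 * per-thread counters; any remainder of the division is dropped. */
static long parallel_count(long total) {
    padded_counter counters[NUM_THREADS];
    count_job jobs[NUM_THREADS];
    pthread_t threads[NUM_THREADS];
    long result = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        counters[i].value = 0;
        jobs[i].counter = &counters[i];
        jobs[i].iterations = total / NUM_THREADS;
        pthread_create(&threads[i], NULL, count, &jobs[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        result += counters[i].value;
    }
    return result;
}
```

Calling `parallel_count(1024L * 1024 * 1024)` reproduces the workload of the slide; actual timings depend on the machine and compiler flags.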
(Explicit-State) Model Checking

    global x = 7;
    global y = 3;

    Process 1                      Process 2
    1  for (int a = 1 .. 10)       1  int b = y + x;
    2      x += y;                 2  y = 2 * b;

S: ⟨x, y, a, b, pc1, pc2⟩
s0 = ⟨7, 3, 0, 0, 1, 1⟩
next-state(⟨7, 3, 0, 0, 1, 1⟩) = {⟨7, 3, 1, 0, 2, 1⟩, ⟨7, 3, 0, 10, 1, 2⟩}

Problem: Check all reachable states from s0 ∈ S using next-state: S → 2^S with S = ℕ^k
(implicit-)graph search
Basis for checking LTL/CTL and timed/probabilistic systems!
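The next-state function for the two-process example can be sketched in C as follows. The slide only shows the successors of s0, so the rest of the semantics is an assumption made for illustration: program counter value 3 means "terminated", and the loop header at pc1 = 1 increments `a` while `a < 10` and otherwise terminates process 1. The type `state` and the function `next_state` are hypothetical names.

```c
#include <stddef.h>

/* State vector <x, y, a, b, pc1, pc2> from the slide. */
typedef struct { int x, y, a, b, pc1, pc2; } state;

/* Writes the successors of s into out (room for 2 states) and returns
 * how many there are: at most one step per process. */
static size_t next_state(state s, state out[2]) {
    size_t n = 0;

    /* Process 1: loop header at pc1 = 1, loop body at pc1 = 2. */
    if (s.pc1 == 1) {
        state t = s;
        if (s.a < 10) { t.a = s.a + 1; t.pc1 = 2; }   /* enter the body */
        else          { t.pc1 = 3; }                  /* assumed: loop done */
        out[n++] = t;
    } else if (s.pc1 == 2) {
        state t = s;
        t.x = s.x + s.y;                              /* x += y */
        t.pc1 = 1;
        out[n++] = t;
    }

    /* Process 2: two straight-line statements. */
    if (s.pc2 == 1) {
        state t = s;
        t.b = s.y + s.x;                              /* int b = y + x */
        t.pc2 = 2;
        out[n++] = t;
    } else if (s.pc2 == 2) {
        state t = s;
        t.y = 2 * s.b;                                /* y = 2 * b */
        t.pc2 = 3;
        out[n++] = t;
    }
    return n;
}
```

Applied to s0 = ⟨7, 3, 0, 0, 1, 1⟩, this yields the two successors listed on the slide, ⟨7, 3, 1, 0, 2, 1⟩ and ⟨7, 3, 0, 10, 1, 2⟩.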
Overview
1. Reachability with Shared Hash Table
2. Tree Compression
3. Symbolic Reachability with Decision Diagrams
Static partitioning or shared hash table
[Figure: left, static partitioning: Workers 1 to 4, each with its own store, connected by queues. Right, shared hash table: Workers 1 to 4, each with its own queue, one shared store and a load balancer.]
Static partitioning
✗ On-the-fly (BFS)
± Scalability (queue contention)
Shared hash table
✓ (Pseudo) DFS & BFS
? Scalability
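In the static-partitioning scheme, each state has a fixed owner, and a worker that generates a state owned by someone else sends it through that worker's queue. A common way to pick the owner is to hash the state vector; the sketch below assumes that approach, since the slide does not say which partition function is used. The FNV-1a hash and the names `hash_state` and `owner` are illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the bytes of a state vector of k ints (illustrative hash). */
static uint32_t hash_state(const int *vec, size_t k) {
    const unsigned char *bytes = (const unsigned char *) vec;
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (size_t i = 0; i < k * sizeof(int); i++) {
        h ^= bytes[i];
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Static partitioning: the worker with this index stores the state
 * and is the only one that explores its successors. */
static unsigned owner(const int *vec, size_t k, unsigned workers) {
    return hash_state(vec, k) % workers;
}
```

Every successor costs a hash plus, for states owned by another worker, a queue operation, which is where the queue contention noted on the slide comes from.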
Shared hash table

    procedure search(p)
        while balance(Q) do                     ⊲ with termination detection
            s := s ∈ Q_p;  Q_p := Q_p \ {s}
            for all s′ ∈ next-state(s) do
                if s′ ∉ V then                  ⊲ atomic V.find-or-put(s′)
                    V := V ∪ {s′}
                    Q_p := Q_p ∪ {s′}

    procedure reach(s0, P)
        V := {s0}
        Q_1 := {s0}
        search(1) ∥ ... ∥ search(P)
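The membership test and the insertion into V must happen as one atomic step, otherwise two workers could both add the same state to their queues. A minimal sketch of such a find-or-put with C11 atomics follows. It assumes states are non-zero 32-bit integers, uses a small fixed-size table with linear probing, and picks a multiplicative hash purely for illustration; none of these choices are taken from the talk.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TABLE_SIZE 1024u    /* power of two, illustrative */
#define EMPTY 0u            /* assumption: 0 is never a valid state */

enum { PUT, FOUND, FULL };

static _Atomic uint32_t table[TABLE_SIZE];   /* zero-initialized: all empty */

/* Returns PUT if this call inserted s, FOUND if s was already present,
 * and FULL if no free slot was left. When several threads call it with
 * the same s concurrently, exactly one of them gets PUT. */
static int find_or_put(uint32_t s) {
    uint32_t idx = (s * 2654435761u) & (TABLE_SIZE - 1);
    for (uint32_t probes = 0; probes < TABLE_SIZE; probes++) {
        uint32_t seen = atomic_load(&table[idx]);
        if (seen == EMPTY) {
            uint32_t expected = EMPTY;
            if (atomic_compare_exchange_strong(&table[idx], &expected, s))
                return PUT;                  /* we claimed the slot */
            seen = expected;                 /* another thread won the race */
        }
        if (seen == s)
            return FOUND;
        idx = (idx + 1) & (TABLE_SIZE - 1);  /* linear probing */
    }
    return FULL;
}
```

In the search loop, a worker would add s′ to its queue only when `find_or_put` returns `PUT`.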
Lockless Hash Table: Design
Laarman, van de Pol, Weber [fmcad10]
Main bottlenecks
• State store: concurrent access
• Graph traversal: random memory access (bandwidth / latency)
Design
• Open addressing
• Hash memoization
• Walking the Line
• In-situ locking
[Figure: bucket array and data array, with |state| and |cache line| marked.]
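The four design points can be combined in one small sketch. This is an illustration under stated assumptions, not the implementation from [fmcad10]: the table sizes, the FNV-style hash, the rehash limit of 8 and all identifiers are made up here. Buckets hold a memoized hash plus one lock bit (in-situ locking), full state vectors live in a separate data array, and probing stays inside one 64-byte line of buckets before rehashing (walking the line).

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define K 6                 /* ints per state vector (illustrative) */
#define LINE 16u            /* 16 four-byte buckets = one 64-byte line */
#define SIZE 4096u          /* bucket count: power of two, multiple of LINE */
#define LOCK 0x80000000u    /* in-situ lock bit stored inside the bucket */

enum { PUT, FOUND, FULL };

static _Alignas(64) _Atomic uint32_t bucket[SIZE];  /* 0 = empty */
static int data[SIZE][K];                           /* full state vectors */

static uint32_t hash(const int *s, uint32_t seed) {
    const unsigned char *p = (const unsigned char *) s;
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < K * sizeof(int); i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

static int find_or_put(const int *s) {
    for (uint32_t attempt = 0; attempt < 8; attempt++) {   /* rehash limit */
        uint32_t h = hash(s, attempt);
        uint32_t memo = (h & ~LOCK) | 1u;     /* memoized hash, never 0 */
        uint32_t line = (h % SIZE) & ~(LINE - 1);
        for (uint32_t i = 0; i < LINE; i++) { /* walk the cache line */
            uint32_t idx = line + ((h + i) & (LINE - 1));
            uint32_t seen = atomic_load(&bucket[idx]);
            if (seen == 0) {
                uint32_t expected = 0;
                if (atomic_compare_exchange_strong(&bucket[idx], &expected,
                                                   memo | LOCK)) {
                    memcpy(data[idx], s, sizeof data[idx]);
                    atomic_store(&bucket[idx], memo);  /* unlock and publish */
                    return PUT;
                }
                seen = expected;              /* slot was taken meanwhile */
            }
            if ((seen & ~LOCK) == memo) {     /* only now touch the data */
                while (atomic_load(&bucket[idx]) & LOCK) { /* writer busy */ }
                if (memcmp(data[idx], s, sizeof data[idx]) == 0)
                    return FOUND;
            }
        }
    }
    return FULL;
}
```

Because the memoized hash is compared first, most non-matching buckets are rejected without reading the data array, which limits the random memory accesses named as a bottleneck on the slide.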