Detecting Concurrency Errors of Erlang Programs Using Systematic Testing
Kostis Sagonas
kostis@it.uu.se
Outline
• Erlang
• Concurrency errors and their sources
• Systematic Concurrency Testing
• Concuerror & Demo
• Techniques to fight combinatorial explosion
• Some experiences
Dagstuhl 2017 2017-02-01
Erlang
• Concurrent functional language
• Implements the actor model of concurrency
  • lightweight processes (“green threads”)
  • asynchronous message passing
  • selective receive
• Conceptually no shared memory, but
  • various built-ins manipulate shared memory
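The three actor-model ingredients listed above can be illustrated with a minimal sketch. The module name actor_sketch and the message shapes are hypothetical, chosen only for illustration.

```erlang
-module(actor_sketch).
-export([demo/0]).

%% Hypothetical demo: spawn a lightweight process, exchange asynchronous
%% messages with it, and use selective receive to pick messages by pattern.
demo() ->
    Parent = self(),
    %% spawn/1 creates a new lightweight process running the given fun.
    Child = spawn(fun() ->
                      receive
                          {hello, From} -> From ! {reply, self()}
                      end
                  end),
    %% Sends are asynchronous: `!` returns without waiting for the receiver.
    Child ! {hello, Parent},
    self() ! low_priority,
    %% Selective receive: the {reply, Child} pattern is matched even if
    %% low_priority is already sitting earlier in the mailbox.
    receive
        {reply, Child} -> ok
    end,
    receive
        low_priority -> done
    end.
```

Calling actor_sketch:demo() from a shell should return done once both messages have been consumed.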
Concurrent programming is HARD
• Concurrent execution is difficult to reason about and get right (even for experts!)
• Rare process interleavings result in bugs that are
  • hard to anticipate
  • difficult to find, reproduce, and debug (“Heisenbugs”)
  • hard to be sure whether they are really fixed
• Big productivity problem: it can waste significant developer time and resources
• Can have severe consequences
Systematic Concurrency Testing aka Stateless Model Checking
• Technique to find concurrency errors or verify their absence
  • by exploring all possible ways that concurrent execution can influence a program’s outcome
• Fully automatic
• Low memory requirements
• Applicable to programs with finite executions
Sources of non-determinism
• Scheduling non-determinism
  • Interleaving non-determinism
    • Processes can race to access shared resources
    • Processes can be preempted at arbitrary points
  • Timing non-determinism
    • Sleeping processes can wake up at any point
    • Timers can fire at arbitrary points and in arbitrary orders
  • Memory model effects
• Input/data non-determinism
  • Programs can be used in a variety of ways
  • Non-deterministic calls (e.g. random())
Scheduling non-determinism
[Figure: tree of the (x, y) states reached by interleaving the assignments x := 1; y := 1 with x := 2; y := 2, starting from 0,0. The six leaves are 2,2; 1,2; 1,1; 2,2; 2,1; 1,1.]
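The state tree on this slide can be reproduced with a small enumeration. This is a sketch that assumes one thread runs x := 1; y := 1 and the other runs x := 2; y := 2, which is inferred from the figure's states. The module and function names are hypothetical.

```erlang
-module(interleave_sketch).
-export([interleavings/2, final_states/0]).

%% All ways to merge two sequences while preserving each sequence's order.
interleavings([], Ys) -> [Ys];
interleavings(Xs, []) -> [Xs];
interleavings([X | Xs] = L1, [Y | Ys] = L2) ->
    [[X | Rest] || Rest <- interleavings(Xs, L2)] ++
    [[Y | Rest] || Rest <- interleavings(L1, Ys)].

%% Run one schedule of {Var, Value} assignments from the state x = y = 0.
run(Schedule) ->
    lists:foldl(fun({Var, Val}, State) -> maps:put(Var, Val, State) end,
                #{x => 0, y => 0}, Schedule).

%% Final {X, Y} state of every interleaving of the two assumed threads.
final_states() ->
    T1 = [{x, 1}, {y, 1}],   % assumed thread 1
    T2 = [{x, 2}, {y, 2}],   % assumed thread 2
    [begin #{x := X, y := Y} = run(S), {X, Y} end
     || S <- interleavings(T1, T2)].
```

Under these assumptions there are six interleavings but only four distinct final states, which is the redundancy that the reduction techniques later in the talk exploit.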
Concuerror
• An SCT tool for Erlang programs
• Given a program and its test suite
  • systematically explores process interleavings and
  • presents detailed interleaving information about errors during the execution of these tests
• Errors detected
  • Process crashes and abnormal termination
  • Assertion violations
  • “Deadlocks”: lack of progress for processes
www.concuerror.com
Concuerror's properties
• Easy to use
• Scalable
• Applicable to “real-world” programs
• Precise
  • Any error found can actually occur
  • Does not introduce new behaviors
• Coverage
  • All concurrency errors (for a test) can be found
  • Captures all scheduling non-determinism
  • Exhaustively explores this non-determinism
Erlang program and its unit test

    -module(ping_pong).
    -export([pong/0]).

    pong() ->
        Self = self(),
        Pid = spawn(fun() -> ping(Self) end),
        register(ping_pong, Pid),
        receive
            ping -> ok
        end.

    ping(P) -> P ! ping.

    -module(ping_pong_test).
    -export([pong_test/0]).

    pong_test() ->
        ok = ping_pong:pong().
Error discovered by Concuerror

    Checked 5 interleaving(s). 1 error found.
    Error type : Exception
    Details : {badarg,[{erlang,register,[ping_pong,<...>],[]}, ...
      Process P1 spawns process P1.1
      Process P1.1 sends message `ping` to process P1
      Process P1.1 exits (normal)
      Process P1 registers process P1.1 (dead) as `ping_pong`
      Process P1 exits ("Exception")
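The trace shows the spawned process exiting before the parent calls register/2. One possible repair, sketched here as an assumption and not taken from the talk, is to keep the child alive until registration has happened. The module name ping_pong_fixed and the go message are hypothetical.

```erlang
-module(ping_pong_fixed).
-export([pong/0]).

pong() ->
    Self = self(),
    %% The child blocks until told to go, so it is still alive
    %% when register/2 runs in the parent.
    Pid = spawn(fun() -> receive go -> ping(Self) end end),
    true = register(ping_pong, Pid),
    Pid ! go,
    receive
        ping -> ok
    end.

ping(P) -> P ! ping.
```

This removes the interleaving reported above. Calling pong/0 twice in quick succession can still fail, because the name may be held by the previous child until that child has exited.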
Another Erlang program

    -module(identity_theft).
    -export([action/0]).

    action() ->
        Bank = self(),
        register(bank, Bank),
        Cust = spawn(fun() -> bank ! money end),
        God = spawn(fun() -> receive Msg -> ok end end),
        Thief = spawn(fun() ->
                          unregister(bank),
                          register(bank, self()),
                          receive
                              money -> God ! thief_got_money
                          after 0 -> God ! theft_failed
                          end
                      end),
        receive
            money -> God ! bank_got_money
        end.
Interleaving Explosion
• Combinatorial explosion in the number of interleavings
  Initially: x = y = … = z = 0
  Thread 1: x := 1
  Thread 2: y := 1
  …
  Thread N: z := 1
  - Interleavings under naïve exploration: N!
  - Interleavings needed to cover all behaviors: 1
Partial Order Reduction (POR)
• Explore just a subset of all interleavings
• Still cover all behaviors
Partial Order Reduction
Thread 1: x := 1
Thread 2: y := 1
[Figure: from the initial state x = 0, y = 0, running Thread 1 then Thread 2 passes through x = 1, y = 0; running Thread 2 then Thread 1 passes through x = 0, y = 1; both orders end in x = 1, y = 1.]
The order of independent events does not matter
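The claim that independent events commute can be checked directly on a toy state. The sketch below is illustrative and simplified: it treats two writes as independent exactly when they touch different variables. The module name and the {write, Var, Val} event shape are hypothetical.

```erlang
-module(commute_sketch).
-export([independent/2, same_result/2]).

%% Two write events are treated as independent here when they touch
%% different variables (a simplifying assumption for this sketch).
independent({write, V1, _}, {write, V2, _}) -> V1 =/= V2.

apply_event({write, Var, Val}, State) -> maps:put(Var, Val, State).

%% True if executing E1 and E2 in either order yields the same state.
same_result(E1, E2) ->
    S0 = #{x => 0, y => 0},
    apply_event(E2, apply_event(E1, S0)) =:=
        apply_event(E1, apply_event(E2, S0)).
```

For the slide's events, same_result({write, x, 1}, {write, y, 1}) holds, so exploring one of the two orders is enough. Two writes of different values to the same variable do not commute, so both orders must be explored.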
Dynamic Partial Order Reduction
1. Run the program, recording events that affect shared data as they happen
2. Determine pairs of conflicting events
3. Find suitable rescheduling points
4. Backtrack
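Step 2 can be sketched on a recorded trace. This is a deliberately simplified illustration, not Concuerror's implementation: it assumes events are {Pid, Op, Var} tuples with Op being read or write, and it ignores the happens-before reasoning a real DPOR algorithm needs. The module name is hypothetical.

```erlang
-module(dpor_conflicts_sketch).
-export([conflicts/1]).

%% A trace is a list of {Pid, Op, Var} events, Op being read or write.
%% Returns {I, J} index pairs (1-based, I < J) of conflicting events:
%% different processes, same variable, at least one write.
conflicts(Trace) ->
    Indexed = lists:zip(lists:seq(1, length(Trace)), Trace),
    [{I, J} || {I, {P1, Op1, V1}} <- Indexed,
               {J, {P2, Op2, V2}} <- Indexed,
               I < J, P1 =/= P2, V1 =:= V2,
               Op1 =:= write orelse Op2 =:= write].
```

Each returned pair marks a place where running the two events in the opposite order might change the outcome, which is what steps 3 and 4 then act on.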
New DPOR Algorithms
• Source DPOR
  • Identical to “Classic” DPOR with one test replaced
  • Significantly fewer explored traces and less time
• Optimal DPOR
  • Achieves optimality with Wakeup Trees
    • Completely prevents exploration of traces that DPOR algorithms will eventually discover to be redundant
  • Memory overhead is very reasonable in practice
Classic: POPL '05. Source and Optimal: our work.

                     Traces explored               Time
Benchmark         Classic  Source  Optimal   Classic   Source  Optimal
filesystem (14)         4       2        2     0.54s    0.36s    0.35s
filesystem (16)        64       8        8     8.13s    1.82s    1.78s
filesystem (18)      1024      32       32     2m11s    8.52s    8.86s
filesystem (19)      4096      64       64     8m33s   18.62s   19.57s
Optimal: POPL '14

                     Traces explored               Time
Benchmark         Classic  Source  Optimal   Classic   Source  Optimal
indexer (12)           78       8        8     0.74s    0.11s    0.10s
indexer (15)       341832    4096     4096    56m20s   50.24s   52.35s
Evaluation: Real programs

                     Traces explored               Time
Benchmark         Classic  Source  Optimal   Classic   Source  Optimal
dialyzer            12436    3600     3600    14m46s    5m17s    5m46s
gproc               14080    8328     8104      3m3s    1m45s    1m57s
poolboy              6018    3120     2680      3m2s    1m28s    1m20s
rushhour           793375  536118   528984      145m     102m     106m

LOC: 44596 (dialyzer), 9446 (gproc), 79732 (poolboy), 917 (rushhour)