Automated Repair of Concurrency Bugs
Ben Liblit with Guoliang Jin and Shan Lu
We need reliable software
People's daily lives now depend on reliable software. Software companies spend substantial resources on debugging: more than 50% of their effort goes to finding and fixing bugs, at a cost of around $300 billion per year.
Concurrency bugs hurt
It is an increasingly parallel world.
Concurrency bugs in history
Multi-threaded program
Concurrent programs under the shared-memory model execute multiple interacting threads in parallel. Threads communicate via shared memory, and shared-memory accesses should be well synchronized.
[Figure: thread1 to thread4 run on core1 to core4 of a multicore chip; each core has its own cache, and all cores share memory.]
An example concurrency bug
Thread 1:
    if (ptr != NULL) {
        ptr->field = 1;
    }
Thread 2:
    ptr = NULL;
The interleaving space is huge. Most interleavings are harmless: ptr = NULL may run entirely before Thread 1's check or entirely after its write. The bad interleaving is the one in which Thread 2 executes ptr = NULL between Thread 1's check and its dereference, which ends in a segmentation fault.
Previous research focuses on finding these bad interleavings.
Bug fixing
Software quality does not improve until bugs are fixed. Manual concurrency-bug fixing is time-consuming (73 days on average) and error-prone (39% of patches are buggy in their first release).
CFix: automated concurrency-bug fixing [PLDI’11*, OSDI’12]
A program behaves correctly if bad interleavings do not occur, so CFix fixes concurrency bugs by disabling bad interleavings.
*SIGPLAN: “one of the first papers to attack the problem of automated bug fixing”
Automated fixing is difficult
[Figure: a bug description (symptom, triggering condition, …) must be turned into a patch that satisfies correctness, performance, and simplicity.]
What is the correct behavior? Answering this usually requires developers' knowledge.
How to get the correct behavior? The patch must produce correct program states under bug-triggering inputs and must not change program states under other inputs.
Automated concurrency-bug fixing?
[Figure: a bug description (symptom, triggering condition, …) must be turned into a patch that satisfies correctness, performance, and simplicity.]
What is the correct behavior? The program state is already correct as long as the buggy interleaving does not occur.
How to get the correct behavior? A patch only needs to disable failure-inducing interleavings, and it can leverage well-defined synchronization operations.
How to get a general solution that generates good patches?
[Figure: the description is the set of interleavings that lead to software failure; the patch must satisfy correctness, performance, and simplicity.]
Detector families that supply such descriptions:
atomicity violation detectors (statements p, c, r): Park ASPLOS’09, Flanagan POPL’04, Lu ASPLOS’06, Chew EuroSys’10
order violation detectors (statements A, B): Zhang ASPLOS’10, Lucia MICRO’09, Yu ISCA’09, Gao ASPLOS’11
data race detectors (instructions I1, I2): Sen PLDI’08, Savage TOCS’97, Yu SOSP’05, Erickson OSDI’10, Kasikci ASPLOS’10
abnormal data flow detectors (Wb, R, Wg): Zhang ASPLOS’11, Shi OOPSLA’10
CFix
[Figure: CFix turns a description (interleavings that lead to software failure) into a patch that satisfies correctness, performance, and simplicity.]
Pipeline: bug reports and source code enter Fix-Strategy Design, which proposes mutual exclusion and order relationships. Synchronization Enforcement turns each relationship into a patched binary. Patch Testing & Selection keeps the selected binaries, Patch Merging combines them into a merged binary, and Run-time Support yields the final patched binary.
Contributions
Show the feasibility of automated fixing for non-deadlock concurrency bugs.
Techniques that enforce mutual exclusion and order relationships.
A framework that assembles a set of techniques to automate the whole bug-fixing process: CFix.
CFix: fix-strategy design
Challenges: huge variety of bugs.
Two types of synchronization relationships
Mutual exclusion and order relationship.
Why these two? They are basic relationships that can be achieved by typical synchronizations, and the choice is based on a study of real-world concurrency bug characteristics.
Fix-strategy for atomicity-violation detectors: example 1
Thread 1:
    if (ptr != NULL) {
        ptr->field = 1;
    }
Thread 2:
    ptr = NULL;
Fix-strategy for atomicity-violation detectors: example 2
Thread 1:
    ptr->field = 1;
    ptr->field = 1;
Thread 2:
    ptr = NULL;
CFix: fix-strategy design
Challenges: inaccurate root cause; huge variety of bugs.
Solution: a combination of mutual exclusion and order relationship enforcement.
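Fix-strategies
[Figure: fix strategies per detector type. AV detector: statements p, c, r. OV detector: statements A, B. Race detector: instructions I1, I2. DU detector: writes Wb and Wg and read R.]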
CFix: synchronization enforcement
Challenges: correctness, performance, and simplicity.
Solution: mutual exclusion enforcement with AFix [PLDI’11]; order relationship enforcement with OFix [OSDI’12].
Mutual exclusion relationship
Input: three statements (p, c, r) with contexts.
Thread 1:
    if (ptr != NULL) {      // p
        ptr->field = 1;     // c
    }
Thread 2:
    ptr = NULL;             // r
Goal: make the code region from p to c mutually exclusive with r.
Mutual exclusion enforcement: AFix
Approach: protect the region from p to c, and r, with a lock.
Principles: correctly paired lock acquisition and release operations; small critical section.
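To make the goal concrete, here is a minimal sketch of what a mutual-exclusion patch for the running ptr example could look like with POSIX threads. It is an illustration, not output produced by AFix: the lock name fix_lock, the struct node type, and the run_patched_example helper are assumptions made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node { int field; };

static struct node the_node;
static struct node *ptr = &the_node;   /* shared pointer from the slide */

/* Hypothetical lock introduced by the patch (name is an assumption). */
static pthread_mutex_t fix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1: p (the check) through c (the dereference) is one critical section. */
static void *thread1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&fix_lock);
    if (ptr != NULL) {      /* p */
        ptr->field = 1;     /* c */
    }
    /* Released on both branches, so lock and unlock stay paired. */
    pthread_mutex_unlock(&fix_lock);
    return NULL;
}

/* Thread 2: r runs under the same lock, so it cannot split p from c. */
static void *thread2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&fix_lock);
    ptr = NULL;             /* r */
    pthread_mutex_unlock(&fix_lock);
    return NULL;
}

/* Runs both threads once. Returns 0 on success, -1 if a thread could not start. */
static int run_patched_example(void) {
    pthread_t t1, t2;
    the_node.field = 0;
    ptr = &the_node;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    assert(ptr == NULL);    /* r always executes, in either order */
    return 0;
}
```

Either order of the two critical sections is acceptable: field ends up 0 or 1, but the dereference can never observe a NULL ptr.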
Put p and c into a critical section: naïve
A naïve solution: add lock on edges reaching p and add unlock on edges leaving c.
Potential new bugs: the patched code could lock without unlock, could unlock without lock, etc.
[Figure: four control-flow sketches containing p and c that illustrate these cases.]
Put p and c into a critical section: AFix
Assume p and c are in the same function f.
Step 1: find the protected nodes in the critical section.
Step 2: add lock operations.
[Figure: control-flow graph around p and c, with nodes marked as protected or unprotected.]
This avoids the potential bugs of the naïve solution.
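The two steps can be sketched on a small control-flow graph. This sketch assumes that a node is protected when it is reachable from p and can also reach c, and that locks go on edges entering the protected set while unlocks go on edges leaving it; the slide does not spell out these definitions, so treat them as one plausible reading. The struct cfg type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical CFG: edge[u][v] is true when control can flow from u to v. */
struct cfg {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
};

/* Marks every node reachable from start, following edges forward or backward. */
static void reach(const struct cfg *g, int start, bool backward,
                  bool seen[MAX_NODES]) {
    int stack[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;
    assert(start >= 0 && start < g->n);
    memset(seen, 0, sizeof(bool) * MAX_NODES);
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < g->n; v++) {
            bool connected = backward ? g->edge[v][u] : g->edge[u][v];
            if (connected && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
}

/* Step 1 (assumed definition): protected = reachable from p and able to reach c. */
static void find_protected(const struct cfg *g, int p, int c,
                           bool prot[MAX_NODES]) {
    bool from_p[MAX_NODES], to_c[MAX_NODES];
    reach(g, p, false, from_p);
    reach(g, c, true, to_c);
    for (int v = 0; v < MAX_NODES; v++)
        prot[v] = v < g->n && from_p[v] && to_c[v];
}

/* Step 2: lock on edges entering the protected set, unlock on edges leaving it. */
static bool needs_lock(const struct cfg *g, const bool prot[MAX_NODES],
                       int u, int v) {
    return g->edge[u][v] && !prot[u] && prot[v];
}

static bool needs_unlock(const struct cfg *g, const bool prot[MAX_NODES],
                         int u, int v) {
    return g->edge[u][v] && prot[u] && !prot[v];
}

/* CFG of the running example: 0 = entry, 1 = p (check), 2 = c (write), 3 = exit. */
static struct cfg example_cfg(void) {
    struct cfg g;
    memset(&g, 0, sizeof g);
    g.n = 4;
    g.edge[0][1] = true;   /* entry -> p                    */
    g.edge[1][2] = true;   /* p -> c when ptr != NULL       */
    g.edge[1][3] = true;   /* p -> exit when ptr == NULL    */
    g.edge[2][3] = true;   /* c -> exit                     */
    return g;
}
```

On the example graph the protected nodes are p and c, the lock lands on the edge into p, and an unlock lands on both edges into the exit node. The unlock on the p to exit edge is what the naïve placement misses.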
Subtle details
Adjusting p and c when they are in different functions. Observation: people put lock and unlock in one function. Find the longest common prefix of p's and c's stack traces, then adjust p and c accordingly.
Putting r into a critical section. Do nothing if r can be reached from the p–c critical section.
Lock type. Lock with timeout: if the critical section has blocking operations. Reentrant lock: if recursion is possible within the critical section.
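The stack-trace adjustment can be illustrated with a short helper. This is a sketch under stated assumptions: traces are modeled as arrays of function names listed outermost frame first, and the function names in the usage note are invented. Real traces would also need call-site information to tell apart two calls to the same function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading frames shared by two traces (outermost frame first). */
static size_t common_prefix_len(const char *const *a, size_t na,
                                const char *const *b, size_t nb) {
    size_t i = 0;
    assert(a != NULL && b != NULL);
    while (i < na && i < nb && strcmp(a[i], b[i]) == 0)
        i++;
    return i;
}

/* Innermost function that appears on both traces, or NULL if they share none.
   In this sketch, lock and unlock would both be placed in this function. */
static const char *lock_function(const char *const *a, size_t na,
                                 const char *const *b, size_t nb) {
    size_t len = common_prefix_len(a, na, b, nb);
    return len == 0 ? NULL : a[len - 1];
}
```

For example, if p is reached through main, worker, check_ptr and c through main, worker, update, write_field, the shared prefix is main, worker. p and c would then be adjusted to the corresponding call sites inside worker, so that the lock and unlock end up in one function.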
Order relationship
Input: two statements (A, B) with contexts. There could be multiple instances of A in one thread, multiple threads that could execute A, or no instance of A during the whole execution.
Goal: make A execute before B.
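Order relationship: two sub-types
[Figure: with instances A1 … An and a single B, which instances Ai … Aj must precede B? firstA-B: B waits for the first A (example: initialization before read). allA-B: B waits for every A (example: use before destroy).]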
OFix allA-B enforcement
Approach: condition variable and flag. Insert signal operations in A-threads and insert a wait operation before B.
Principles: an A-thread signals exactly once, when it will not execute any more A; an A-thread signals as soon as possible; B proceeds when each A-thread has signaled.
OFix allA-B enforcement: A side
How to identify the last A instance in one thread?
    . . .;
    for (. . .)
        . . .;   // A
    . . .;
Each thread that executes A signals exactly once, as soon as it can execute no more A.
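OFix allA-B enforcement: A side
How to identify the last thread that executes A? Keep a counter for signal threads. The slide draws the counter as a symbol; it is written as counter here. It starts at 1, every thread_create increments it, and every ofix_signal decrements it.
    void main() {
        for (. . .)
            thread_create(thr_main);   // counter++
        . . .;
    }
    void thr_main() {
        for (. . .)
            . . .;   // A
        . . .;
    }
    void ofix_signal() {
        mutex_lock(L);
        counter--;
        if (counter == 0)
            cond_broadcast(con);
        mutex_unlock(L);
    }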
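OFix allA-B enforcement: B side
B is safe to execute only when the counter is 0, so ofix_wait runs before B. The slide draws the counter as a symbol; it is written as counter here.
    void ofix_wait() {
        mutex_lock(L);
        if (counter != 0)
            cond_timedwait(con, L, t);
        mutex_unlock(L);
    }
The wait operation is timed in order to mask potential deadlocks. OFix gives up if it knows that the patch introduces a new deadlock.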
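The signal and wait sides can be combined into one runnable sketch with POSIX threads. It follows the counter scheme from the slides, with a few assumptions of its own: the variable is called counter, the increment is wrapped in a hypothetical helper ofix_thread_created, the timeout is passed in seconds, and the wait uses a while loop rather than the slide's if so that spurious wakeups are handled. The demo functions thr_main and run_alla_b_demo are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static int counter = 1;   /* starts at 1 to account for the creating thread */
static int a_count = 0;   /* demo state: how many A instances have executed */

/* Called right before each thread_create that starts an A-thread. */
static void ofix_thread_created(void) {
    pthread_mutex_lock(&L);
    counter++;
    pthread_mutex_unlock(&L);
}

/* Called once by each thread when it can execute no more A. */
static void ofix_signal(void) {
    pthread_mutex_lock(&L);
    assert(counter > 0);
    counter--;
    if (counter == 0)
        pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Called before B. Returns 0 once every A-thread has signaled,
   or ETIMEDOUT if the timed wait gave up first. */
static int ofix_wait(long timeout_sec) {
    struct timespec deadline;
    int rc = 0;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (counter != 0 && rc == 0)
        rc = pthread_cond_timedwait(&con, &L, &deadline);
    rc = (counter == 0) ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&L);
    return rc;
}

static void *thr_main(void *arg) {
    (void)arg;
    for (int i = 0; i < 3; i++) {
        pthread_mutex_lock(&L);
        a_count++;                      /* A */
        pthread_mutex_unlock(&L);
    }
    ofix_signal();                      /* this thread executes no more A */
    return NULL;
}

/* Starts two A-threads, then runs B after ofix_wait.
   Returns the number of A instances that B observed, or -1 on timeout. */
static int run_alla_b_demo(void) {
    pthread_t t[2];
    int started = 0;
    for (int i = 0; i < 2; i++) {
        ofix_thread_created();
        if (pthread_create(&t[started], NULL, thr_main, NULL) == 0)
            started++;
        else
            ofix_signal();              /* undo the increment for a failed start */
    }
    ofix_signal();                      /* main creates no more A-threads */
    int seen = -1;
    if (ofix_wait(5) == 0)
        seen = a_count;                 /* B */
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return seen;
}
```

Because the counter starts at 1 and main signals only after its creation loop, the counter cannot reach 0 while more A-threads may still be created. The return value of ofix_wait lets a caller tell a normal wake-up from a timeout, which the slide version does not expose.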
OFix firstA-B
Basic enforcement: make B wait until A has executed.
When A may not execute: add a safety net of signals using the allA-B algorithm.
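One way to picture firstA-B with its safety net is a flag combined with the allA-B counter: B proceeds as soon as any A has executed, or once no thread can execute A any more. The sketch below is an interpretation of the slide, not OFix's actual generated code; struct first_ab and every function name are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct first_ab {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool a_happened;   /* set by the first A */
    int pending;       /* threads that may still execute A (safety net) */
};

static void first_ab_init(struct first_ab *s, int a_threads) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->a_happened = false;
    s->pending = a_threads;
}

/* Basic enforcement: called right after A executes. */
static void first_ab_after_a(struct first_ab *s) {
    pthread_mutex_lock(&s->lock);
    s->a_happened = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Safety net: called once by a thread when it can execute no more A. */
static void first_ab_no_more_a(struct first_ab *s) {
    pthread_mutex_lock(&s->lock);
    assert(s->pending > 0);
    if (--s->pending == 0)
        pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Called before B. Returns 0 if B may proceed, ETIMEDOUT if the wait gave up. */
static int first_ab_wait(struct first_ab *s, long timeout_sec) {
    struct timespec deadline;
    int rc = 0;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&s->lock);
    while (!s->a_happened && s->pending != 0 && rc == 0)
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    rc = (s->a_happened || s->pending == 0) ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```

Without the safety net, B would block until the timeout in every execution where A never runs. With it, the last thread that could have executed A releases B.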
CFix: patch testing & selection
Challenge: multi-threaded software testing.
Solution: CFix-patch oriented testing.
Patch testing principles
Prune incorrect patches (patches causing failures due to wrong fix strategies, etc.), slow patches, and complicated patches.
Not exhaustive testing, but patch-oriented testing: leverage existing testing techniques, with extra heuristics.
Run once without external perturbation
Reject the patch if there is a time-out or failure. Patches that fix the wrong root cause make the software fail deterministically.
Thread 1:
    ptr->field = 1;
    ptr->field = 1;
Thread 2:
    ptr = NULL;
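The single-run check can be approximated with a small POSIX harness. This is a sketch, not CFix's test driver: the function name patch_passes_single_run is hypothetical, and it assumes that the patched program reports failure through a non-zero exit status or a fatal signal and does not handle or ignore SIGALRM itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the command in argv once, with no external perturbation.
   Returns true only if it exits with status 0 within timeout_sec seconds. */
static bool patch_passes_single_run(char *const argv[], unsigned timeout_sec) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return false;               /* could not start the run */
    if (pid == 0) {
        /* A pending alarm survives exec; by default SIGALRM kills the run. */
        alarm(timeout_sec);
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    /* Time-outs and crashes show up as signals, failures as non-zero exits. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A patch that deadlocks or blocks on a wait that is never signaled is cut off by the alarm and rejected, and a patch that still crashes is rejected through its termination signal.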