
Automated Repair of Concurrency Bugs
Ben Liblit, with Guoliang Jin and Shan Lu


  1. Automated Repair of Concurrency Bugs
     Ben Liblit, with Guoliang Jin and Shan Lu

  2. We need reliable software
     - People’s daily life now depends on reliable software
     - Software companies spend lots of resources on debugging
     - More than 50% of effort goes to finding and fixing bugs
     - Around $300 billion per year

  3. Concurrency bugs hurt
     - It is an increasingly parallel world
     - Concurrency bugs in history

  4. Multi-threaded program
     - Concurrent programs under the shared-memory model
     - Programs execute multiple interacting threads in parallel
     - Threads communicate via shared memory
     - Shared-memory accesses should be well-synchronized
     [Figure: four threads, one per core of a multicore chip; each core has its own cache, and all cores share one memory]

  5. An example concurrency bug
       Thread 1                      Thread 2
       if (ptr != NULL) {            ptr = NULL;
         ptr->field = 1;
       }
     The interleaving space is huge, and only some interleavings are bad. This example has three:
     - Thread 1 runs its check and its write, then Thread 2 runs ptr = NULL
     - Thread 2 runs ptr = NULL, then Thread 1 runs its check and skips the write
     - Bad: Thread 1 runs its check, Thread 2 runs ptr = NULL, then Thread 1 runs ptr->field = 1 (segmentation fault)
     Previous research focuses on finding bad interleavings.
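The three interleavings of this example can be enumerated without real threads. The following is a minimal sketch in C, not code from the slides: `struct obj`, `run_interleaving`, and `count_bad_interleavings` are hypothetical names, and the NULL dereference is reported as a return value instead of actually crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj { int field; };   /* hypothetical shared object */

/* Simulate one schedule. pos says where Thread 2's "ptr = NULL" runs:
 * 0 = before Thread 1's check, 1 = between check and write, 2 = after both.
 * Returns true if Thread 1 would dereference a NULL pointer. */
bool run_interleaving(int pos)
{
    struct obj o = { 0 };
    struct obj *ptr = &o;          /* shared pointer, initially valid */
    bool entered;

    if (pos == 0) ptr = NULL;      /* Thread 2 runs first */
    entered = (ptr != NULL);       /* Thread 1: if (ptr != NULL) */
    if (pos == 1) ptr = NULL;      /* Thread 2 runs inside the window */
    if (entered) {
        if (ptr == NULL)
            return true;           /* Thread 1: ptr->field = 1 would crash */
        ptr->field = 1;
    }
    if (pos == 2) ptr = NULL;      /* Thread 2 runs last */
    return false;
}

/* Count how many of the three schedules are bad. */
int count_bad_interleavings(void)
{
    int bad = 0;
    for (int pos = 0; pos < 3; pos++)
        if (run_interleaving(pos))
            bad++;
    return bad;
}
```

Only the schedule with `pos == 1` is flagged, which matches the single bad interleaving on the slide. Real programs have far more statements per thread, so the number of schedules grows much faster than in this three-case toy.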

  6. Bug fixing
     - Software quality does not improve until bugs are fixed
     - Manual concurrency bug fixing is
       - time-consuming: 73 days on average
       - error-prone: 39% of patches are buggy in the first release
     - CFix: automated concurrency-bug fixing [PLDI’11*, OSDI’12]
       - Program behaves correctly if bad interleavings do not occur
       - Fix concurrency bugs by disabling bad interleavings
     * SIGPLAN: “one of the first papers to attack the problem of automated bug fixing”

  7. Automated fixing is difficult
     [Figure: a bug description (symptom, triggering condition, …) has to be turned, somehow, into a patch that is judged on correctness, performance, and simplicity]
     - What is the correct behavior?
       - Usually requires developers’ knowledge
     - How to get the correct behavior?
       - Correct program states under bug-triggering inputs
       - No change to program states under other inputs

  8. Automated concurrency-bug fixing?
     [Figure: the same description-to-patch question as on the previous slide]
     - What is the correct behavior?
       - The program state is already correct as long as the buggy interleaving does not occur
     - How to get the correct behavior?
       - Only need to disable failure-inducing interleavings
       - Can leverage well-defined synchronization operations

  9. How to get a general solution that generates good patches?
     [Figure: the bug description is a set of interleavings that lead to software failure, reported by one of four kinds of detectors; the patch is judged on correctness, performance, and simplicity]
     - Atomicity violation detectors (statements p, c, r): ParkASPLOS’09, FlanaganPOPL’04, LuASPLOS’06, ChewEuroSys’10
     - Order violation detectors (statements A, B): ZhangASPLOS’10, LuciaMICRO’09, YuISCA’09, GaoASPLOS’11
     - Data race detectors (instructions I1, I2): SenPLDI’08, SavageTOCS’97, YuSOSP’05, EricksonOSDI’10, KasikciASPLOS’10
     - Abnormal data flow detectors (accesses Wb, R, Wg): ZhangASPLOS’11, ShiOOPSLA’10

  10. CFix
      [Figure: the CFix pipeline. Inputs are bug reports (interleavings that lead to software failure) and source code; the output is a final patched binary, judged on correctness, performance, and simplicity.]
      - Fix-Strategy Design: selects mutual exclusion and order relationships to enforce
      - Synchronization Enforcement: produces patched binaries
      - Patch Testing & Selection: produces selected binaries
      - Patch Merging: produces a merged binary
      - Run-time Support: produces the final patched binary

  11. Contributions
      - Show the feasibility of automated fixing for non-deadlock concurrency bugs
      - Techniques that enforce mutual exclusion and order relationship
      - A framework that assembles a set of techniques to automate the whole bug-fixing process: CFix

  12. CFix: fix-strategy design
      Challenges:
      - Huge variety of bugs

  13. Two types of synchronization relationships
      - Mutual Exclusion
      - Order Relationship
      Why these two?
      - Basic relationships can be achieved by typical synchronizations
      - Based on real-world concurrency bug characteristics study

  14. Fix-strategy for atomicity-violation detectors: example 1
        Thread 1                      Thread 2
        if (ptr != NULL) {            ptr = NULL;
          ptr->field = 1;
        }

  15. Fix-strategy for atomicity-violation detectors: example 2
        Thread 1                      Thread 2
        ptr->field = 1;
                                      ptr = NULL;
        ptr->field = 1;

  16. CFix: fix-strategy design
      Challenges:
      - Inaccurate root cause
      - Huge variety of bugs
      Solution:
      - A combination of mutual exclusion & order relationship enforcement

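  17. Fix-strategies
      [Figure: one fix-strategy diagram per detector type: AV detector (statements p, c, r), OV detector (statements A, B), race detector (instructions I1, I2), DU detector (accesses Wb, R, Wg)]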

  18. CFix: synchronization enforcement
      Challenges:
      - Correctness, performance, and simplicity
      Solution:
      - Mutual exclusion enforcement: AFix [PLDI’11]
      - Order relationship enforcement: OFix [OSDI’12]

  19. Mutual exclusion relationship
      - Input: three statements (p, c, r) with contexts
          Thread 1                         Thread 2
          p: if (ptr != NULL) {            r: ptr = NULL;
          c:   ptr->field = 1;
             }
      - Goal: making the code region from p to c be mutually exclusive with r

  20. Mutual exclusion enforcement: AFix
      - Approach: lock (one lock protects both the p-to-c region and r)
      - Principles:
        - Correctly paired lock acquisition and release operations
        - Small critical section
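The lock-based approach can be illustrated on the running example. This is a hand-written sketch of what a patched program could look like, assuming POSIX threads; it is not AFix output, and `afix_lock`, `shared_obj`, and `run_patched` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct obj { int field; };

static struct obj shared_obj;
static struct obj *ptr = &shared_obj;
/* Hypothetical lock added by the patch. */
static pthread_mutex_t afix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1: the region from p to c is one critical section. */
static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&afix_lock);
    if (ptr != NULL) {        /* p */
        ptr->field = 1;       /* c */
    }
    pthread_mutex_unlock(&afix_lock);
    return NULL;
}

/* Thread 2: r takes the same lock, so it cannot run between p and c. */
static void *thread2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&afix_lock);
    ptr = NULL;               /* r */
    pthread_mutex_unlock(&afix_lock);
    return NULL;
}

/* Run both threads once. Returns 0 if both ran to completion. */
int run_patched(void)
{
    pthread_t t1, t2;

    ptr = &shared_obj;
    shared_obj.field = 0;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return (ptr == NULL) ? 0 : -1;
}
```

Either thread may win the lock, so `shared_obj.field` can end up as 0 or 1; both outcomes are correct. What the lock rules out is the schedule where r runs between p and c.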

  21. Put p and c into a critical section: naïve
      - A naïve solution
        - Add lock on edges reaching p
        - Add unlock on edges leaving c
      - Potential new bugs
        - Could lock without unlock
        - Could unlock without lock
        - etc.
      [Figure: four control-flow graphs containing p and c that illustrate these cases]

  22. Put p and c into a critical section: AFix
      - Assume p and c are in the same function f
      - Step 1: find protected nodes in critical section
      - Step 2: add lock operations
        - lock on each edge from an unprotected node to a protected node
        - unlock on each edge from a protected node to an unprotected node
      - Avoids the potential bugs of the naïve solution
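The two steps can be sketched on a small control-flow graph. The slides do not define "protected" precisely, so this sketch assumes one plausible reading: a node is protected if it is reachable from p without continuing past c, and can reach c without going back through p. The graph representation and all names (`struct cfg`, `find_protected`, `edge_action`, `example_cfg`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* A tiny control-flow graph: edge[u][v] is true if control can go u -> v. */
struct cfg {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
};

/* Mark nodes reachable from start, following edges forward or backward.
 * The search does not continue past the node `stop`. */
static void reach(const struct cfg *g, int start, int stop, bool backward,
                  bool seen[MAX_NODES])
{
    int stack[MAX_NODES], top = 0;

    memset(seen, 0, MAX_NODES * sizeof(bool));
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        if (u == stop)
            continue;                         /* do not expand past stop */
        for (int v = 0; v < g->n; v++) {
            bool e = backward ? g->edge[v][u] : g->edge[u][v];
            if (e && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
}

/* Step 1 (assumed definition): forward set from p intersected with the
 * backward set from c. */
void find_protected(const struct cfg *g, int p, int c, bool prot[MAX_NODES])
{
    bool fwd[MAX_NODES], bwd[MAX_NODES];

    reach(g, p, c, false, fwd);
    reach(g, c, p, true, bwd);
    for (int v = 0; v < MAX_NODES; v++)
        prot[v] = fwd[v] && bwd[v];
}

/* Step 2: +1 means "insert lock" (unprotected -> protected), -1 means
 * "insert unlock" (protected -> unprotected), 0 means no change. */
int edge_action(const struct cfg *g, const bool prot[MAX_NODES], int u, int v)
{
    if (!g->edge[u][v]) return 0;
    if (!prot[u] && prot[v]) return +1;
    if (prot[u] && !prot[v]) return -1;
    return 0;
}

/* Example: 0 = entry, 1 = loop head, 2 = p (a branch), 3 = c, 4 = exit. */
struct cfg example_cfg(void)
{
    struct cfg g;

    memset(&g, 0, sizeof g);
    g.n = 5;
    g.edge[0][1] = true;   /* entry -> loop head */
    g.edge[1][2] = true;   /* loop head -> p */
    g.edge[2][3] = true;   /* p taken -> c */
    g.edge[2][1] = true;   /* p not taken -> back to loop head */
    g.edge[3][1] = true;   /* c -> back to loop head */
    g.edge[1][4] = true;   /* loop head -> exit */
    return g;
}
```

On `example_cfg()` with p = 2 and c = 3, only nodes 2 and 3 are protected. The edge 1 -> 2 gets a lock, and both 3 -> 1 and 2 -> 1 get an unlock. The naïve scheme would miss the unlock on 2 -> 1, which is the "lock without unlock" case. Because operations sit on boundary edges, the lock is held exactly while control is in a protected node, provided the function entry is unprotected.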

  23. Subtle details
      - p and c adjustment when they are in different functions
        - Observation: people put lock and unlock in one function
        - Find the longest common prefix of p’s and c’s stack traces
        - Adjust p and c accordingly
      - Put r into a critical section
        - Do nothing if we can reach r from the p-c critical section
      - Lock type:
        - Lock with timeout: if critical section has blocking operations
        - Reentrant lock: if recursion is possible within critical section
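The stack-trace adjustment can be sketched as a longest-common-prefix computation. This sketch assumes each stack trace is an array of function names ordered from the outermost frame to the innermost; a real tool would more likely compare call sites. The name `common_prefix_len` and the example function names in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest common prefix of two call stacks, each listed
 * from the outermost frame (for example "main") to the innermost one. */
size_t common_prefix_len(const char *const *p_stack, size_t p_len,
                         const char *const *c_stack, size_t c_len)
{
    size_t i = 0;

    while (i < p_len && i < c_len && strcmp(p_stack[i], c_stack[i]) == 0)
        i++;
    return i;
}
```

For example, if p is reached through `main`, `handle`, `check` and c through `main`, `handle`, `update`, the common prefix has length 2. The innermost shared frame is `handle`, so p and c would be moved to the calls to `check` and `update` inside `handle`, and the lock and unlock end up in one function.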

  24. Order relationship
      - Input: two statements (A, B) with contexts
        - There could be multiple instances of A in one thread
        - There could be multiple threads that could execute A
        - There could be no instance of A during the whole execution
      - Goal: making A execute before B

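  25. Order relationship: two sub-types
      [Figure: with several instances A1 … An, which of them must precede B?]
      - allA-B: every instance of A executes before B (e.g. every use before a destroy)
      - firstA-B: the first instance of A executes before B (e.g. an initialization before a read)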

  26. OFix allA-B enforcement
      - Approach: condition variable and flag
        - Insert signal operations in A-threads
        - Insert wait operation before B
      - Principles
        - A-thread signals exactly once when it will not execute more A
        - A-thread signals as soon as possible
        - B proceeds when each A-thread has signaled

  27. OFix allA-B enforcement: A side
      How to identify the last A instance in one thread:
          . . .;
          for (. . .)
              . . . ; // A
          . . .;
      - Each thread that executes A signals exactly once, as soon as it can execute no more A

  28. OFix allA-B enforcement: A side
      How to identify the last thread that executes A:
          void main() {                      void thr_main() {
            for (. . .)                        for (. . .)
              thread_create(thr_main);           . . . ; // A
            . . .;                             . . .;
          }                                  }

          void ofix_signal() {
            mutex_lock(L);
            counter--;
            if (counter == 0)
              cond_broadcast(con);
            mutex_unlock(L);
          }
      [Figure annotations: a counter for signal threads starts at 1 and is incremented at each thread_create; ofix_signal decrements it]

  29. OFix allA-B enforcement: B side
      - Safe to execute only when the counter is 0
          void ofix_wait() {
            mutex_lock(L);
            if (counter != 0)
              cond_timedwait(con, L, t);
            mutex_unlock(L);
          }
          B
      - Give up if OFix knows that it introduces new deadlock
      - Timed wait-operation to mask potential deadlocks
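The A side and B side can be put together in a runnable sketch with POSIX threads. `ofix_signal`, `ofix_wait`, `thr_main`, `L`, and `con` follow the slides; the variable name `counter`, the constants, `a_done`, and `run_all_a_before_b` are assumptions. Starting the counter at 1 for the creating thread is one reading of the slide annotations. Unlike the slide, the wait loops until the counter is 0 or the deadline passes, since `pthread_cond_timedwait` may wake up spuriously.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define NUM_A_THREADS 4
#define A_PER_THREAD  3

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static int counter;        /* threads that have not signaled yet */
static int a_done;         /* number of A instances executed so far */

/* A side: called once by a thread when it can execute no more A. */
static void ofix_signal(void)
{
    pthread_mutex_lock(&L);
    if (--counter == 0)
        pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* B side: wait until every thread has signaled or timeout_sec elapses.
 * Returns 0 if all signals arrived, ETIMEDOUT otherwise. */
static int ofix_wait(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (counter != 0 && rc == 0)
        rc = pthread_cond_timedwait(&con, &L, &deadline);
    rc = (counter == 0) ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&L);
    return rc;
}

static void *thr_main(void *arg)
{
    (void)arg;
    for (int i = 0; i < A_PER_THREAD; i++) {
        pthread_mutex_lock(&L);
        a_done++;                        /* A */
        pthread_mutex_unlock(&L);
    }
    ofix_signal();                       /* no more A in this thread */
    return NULL;
}

/* Returns the number of A instances that B observed, or -1 on failure. */
int run_all_a_before_b(void)
{
    pthread_t tid[NUM_A_THREADS];
    int created = 0, seen_by_b = -1;

    counter = 1;                         /* assumed: the creating thread */
    a_done = 0;
    while (created < NUM_A_THREADS) {
        pthread_mutex_lock(&L);
        counter++;                       /* one more thread may execute A */
        pthread_mutex_unlock(&L);
        if (pthread_create(&tid[created], NULL, thr_main, NULL) != 0)
            break;
        created++;
    }
    ofix_signal();                       /* no more A-threads will be created */

    if (created == NUM_A_THREADS && ofix_wait(5) == 0) {
        pthread_mutex_lock(&L);
        seen_by_b = a_done;              /* B */
        pthread_mutex_unlock(&L);
    }
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return seen_by_b;
}
```

When the wait succeeds, B observes all `NUM_A_THREADS * A_PER_THREAD` instances of A. The extra count held by the creating thread keeps the counter from reaching 0 while threads are still being created.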

  30. OFix firstA-B
      - Basic enforcement: signal after A, wait before B
      - When A may not execute
        - Add a safety-net of signal with allA-B algorithm
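A firstA-B patch with a safety net might look like the sketch below, assuming POSIX threads. The slides give no code for this case, so every name here (`may_proceed`, `value`, `a_thread`, `run_first_a_before_b`) is hypothetical, and placing the safety-net signal at the end of the thread is a simplification of "with allA-B algorithm".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t L = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t con = PTHREAD_COND_INITIALIZER;
static bool may_proceed;   /* set once B no longer needs to wait */
static int value;          /* written by A, read by B */

static void ofix_signal(void)
{
    pthread_mutex_lock(&L);
    may_proceed = true;
    pthread_cond_broadcast(&con);
    pthread_mutex_unlock(&L);
}

/* Timed wait, so that B is not blocked forever if no signal arrives. */
static void ofix_wait(int timeout_sec)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    pthread_mutex_lock(&L);
    while (!may_proceed && rc == 0)
        rc = pthread_cond_timedwait(&con, &L, &deadline);
    pthread_mutex_unlock(&L);
}

static void *a_thread(void *arg)
{
    bool run_a = *(bool *)arg;

    if (run_a) {
        pthread_mutex_lock(&L);
        value = 42;                /* A */
        pthread_mutex_unlock(&L);
        ofix_signal();             /* basic enforcement: right after A */
    }
    ofix_signal();                 /* safety net: no A can run after here */
    return NULL;
}

/* Returns the value B read: 42 if A executed, 0 if A never executed,
 * -1 if the thread could not be created. */
int run_first_a_before_b(bool run_a)
{
    pthread_t tid;
    int seen;

    may_proceed = false;
    value = 0;
    if (pthread_create(&tid, NULL, a_thread, &run_a) != 0)
        return -1;
    ofix_wait(5);
    pthread_mutex_lock(&L);
    seen = value;                  /* B */
    pthread_mutex_unlock(&L);
    pthread_join(tid, NULL);
    return seen;
}
```

When A executes, B reads 42. When A is skipped, the safety-net signal releases B promptly and it reads 0, instead of waiting for the full timeout.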

  31. CFix: patch testing & selection
      Challenge:
      - Multi-thread software testing
      Solution:
      - CFix-patch oriented testing

  32. Patch testing principles
      - Prune incorrect patches
        - Patches causing failures due to wrong fix strategies, etc.
      - Prune slow patches
      - Prune complicated patches
      - Not exhaustive testing, but patch oriented testing
        - Leverage existing testing techniques, with extra heuristics

  33. Run once without external perturbation
      - Reject if there is a time-out or failure
        - Patches fixing the wrong root cause
        - Make software fail deterministically
            Thread 1                      Thread 2
            ptr->field = 1;
                                          ptr = NULL;
            ptr->field = 1;
