DataCollider: Effective Data-Race Detection for the Kernel


  1. DataCollider: Effective Data-Race Detection for the Kernel. John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk. Microsoft Windows and Microsoft Research. {jerick, madanm, sburckha, kirko}@microsoft.com. "Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism." From “The Problem with Threads,” by Edward A. Lee, IEEE Computer, vol. 39, no. 5, May 2006

  2. Windows case study #1. Thread A, in RestartCtxtCallback(...): { pctxt->dwfCtxt |= CTXTF_NEED_CALLBACK; } Thread B, in RunContext(...): { pctxt->dwfCtxt &= ~CTXTF_RUNNING; } • The OR’ing in of the CTXTF_NEED_CALLBACK flag can be swallowed by the AND’ing out of the CTXTF_RUNNING flag! • Results in a system hang.

  3. Case study #1, assembled. Thread A: mov eax, [pctxt->dwfCtxt]; and eax, NOT 10h; mov [pctxt->dwfCtxt], eax. Thread B: mov eax, [pctxt->dwfCtxt]; or eax, 20h; mov [pctxt->dwfCtxt], eax. State: Thread A EAX = ??, Thread B EAX = ??, pctxt->dwfCtxt = 11h

  4. Case study #1, assembled. Step 1: Thread A executes mov eax, [pctxt->dwfCtxt]. State: Thread A EAX = 11h, Thread B EAX = ??, pctxt->dwfCtxt = 11h

  5. Case study #1, assembled. Step 2: Thread A executes and eax, NOT 10h. State: Thread A EAX = 01h, Thread B EAX = ??, pctxt->dwfCtxt = 11h

  6. Case study #1, assembled. /* CONTEXT SWITCH */: Thread A is preempted after step 2, before its store. State: Thread A EAX = 01h, Thread B EAX = ??, pctxt->dwfCtxt = 11h

  7. Case study #1, assembled. Step 3: Thread B executes mov eax, [pctxt->dwfCtxt]. State: Thread A EAX = 01h, Thread B EAX = 11h, pctxt->dwfCtxt = 11h

  8. Case study #1, assembled. Step 4: Thread B executes or eax, 20h. State: Thread A EAX = 01h, Thread B EAX = 31h, pctxt->dwfCtxt = 11h

  9. Case study #1, assembled. Step 5: Thread B executes mov [pctxt->dwfCtxt], eax. State: Thread A EAX = 01h, Thread B EAX = 31h, pctxt->dwfCtxt = 31h

  10. Case study #1, assembled. Step 6: Thread A resumes and executes mov [pctxt->dwfCtxt], eax. State: Thread A EAX = 01h, Thread B EAX = 31h, pctxt->dwfCtxt = 01h

  11. Case study #1, assembled. Result: CTXTF_NEED_CALLBACK disappeared! ((pctxt->dwfCtxt & 0x20) == 0). State: Thread A EAX = 01h, Thread B EAX = 31h, pctxt->dwfCtxt = 01h
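
The six-step interleaving from case study #1 can be replayed deterministically on a single thread. The sketch below is only an illustration: the flag values 10h and 20h are taken from the masks in the assembly, while `replay_lost_update`, `eax_a`, and `eax_b` are hypothetical names standing in for the two threads' private EAX registers.

```c
#include <stdint.h>

#define CTXTF_RUNNING       0x10u  /* mask used by "and eax, NOT 10h" */
#define CTXTF_NEED_CALLBACK 0x20u  /* mask used by "or eax, 20h" */

/* Replays steps 1-6 in the order shown on the slides.
 * eax_a and eax_b are hypothetical stand-ins for each thread's EAX. */
uint32_t replay_lost_update(uint32_t dwfCtxt)
{
    uint32_t eax_a = dwfCtxt;       /* 1: Thread A loads the flags        */
    eax_a &= ~CTXTF_RUNNING;        /* 2: Thread A clears RUNNING         */
                                    /*    context switch to Thread B      */
    uint32_t eax_b = dwfCtxt;       /* 3: Thread B loads the same flags   */
    eax_b |= CTXTF_NEED_CALLBACK;   /* 4: Thread B sets NEED_CALLBACK     */
    dwfCtxt = eax_b;                /* 5: Thread B stores its result      */
    dwfCtxt = eax_a;                /* 6: Thread A stores, overwriting 5  */
    return dwfCtxt;
}
```

Starting from 11h, the function returns 01h: the store in step 6 discards the bit that step 5 wrote.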

  12. Windows case study #1. Thread A, in RestartCtxtCallback(...): { pctxt->dwfCtxt |= CTXTF_NEED_CALLBACK; } as a single instruction: or [ecx+40], 20h. Thread B, in RunContext(...): { pctxt->dwfCtxt &= ~CTXTF_RUNNING; } as a single instruction: and [ecx+40], ~10h. • Instructions appear atomic, but they are not!
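
The slides do not say how this bug was fixed. One common remedy for this kind of lost update is to make each read-modify-write atomic. The sketch below uses C11 `<stdatomic.h>` as an assumption (kernel code on Windows would more likely use the `InterlockedOr` / `InterlockedAnd` family); `ctxt_t` and the lower-case function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CTXTF_RUNNING       0x10u
#define CTXTF_NEED_CALLBACK 0x20u

/* Hypothetical context type; only the flags field matters here. */
typedef struct {
    _Atomic uint32_t dwfCtxt;
} ctxt_t;

/* Atomic version of: pctxt->dwfCtxt |= CTXTF_NEED_CALLBACK; */
void restart_ctxt_callback(ctxt_t *pctxt)
{
    atomic_fetch_or(&pctxt->dwfCtxt, CTXTF_NEED_CALLBACK);
}

/* Atomic version of: pctxt->dwfCtxt &= ~CTXTF_RUNNING; */
void run_context(ctxt_t *pctxt)
{
    atomic_fetch_and(&pctxt->dwfCtxt, ~CTXTF_RUNNING);
}
```

With both updates atomic, neither thread can overwrite the other's bit, whatever the interleaving.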

  13. Data race definition • By our definition, a data race is a pair of memory accesses that satisfy all of the following: • The accesses can happen concurrently • There is a non-zero overlap in the physical address ranges specified by the two accesses • At least one access modifies the contents of the memory location
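
The three conditions in this definition can be written as a small predicate. This is a sketch under stated assumptions: `access_t` and both function names are hypothetical, and whether two accesses "can happen concurrently" is passed in as a flag because it cannot be derived from the addresses alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memory access. */
typedef struct {
    uintptr_t addr;     /* first byte touched           */
    size_t    size;     /* number of bytes touched      */
    bool      is_write; /* true if the access modifies  */
} access_t;

/* Non-zero overlap of the half-open ranges [addr, addr + size). */
bool ranges_overlap(const access_t *a, const access_t *b)
{
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* All three conditions of the definition must hold. */
bool is_data_race(const access_t *a, const access_t *b, bool concurrent)
{
    return concurrent
        && ranges_overlap(a, b)
        && (a->is_write || b->is_write);
}
```

Two overlapping reads, or a write and a read that merely sit next to each other in memory, do not count as a race under this predicate.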

  14. Importance • Very hard to reproduce • Timings can be very tight • Hard to debug • Very easy to mistake for a hardware error (“bit flip”) • To support scalability, code is moving away from monolithic locks • Fine-grained locks • Lock-free approaches

  15. Previous Techniques • Happens-before and lockset algorithms have significant overhead • Intel Thread Checker has 200x overhead • Log all synchronizations • Instrument all memory accesses • High overhead can prevent usage in the field • Causes false failures due to timeouts

  16. Challenges • Prior schemes require complete knowledge and logging of all locking semantics • Locking semantics in kernel mode can be homegrown, complicated, and convoluted • e.g. DPCs, interrupts, affinities

  17. DataCollider: Goals

  18. DataCollider: Goals 1. No false data races • Tradeoff between having false positives and reporting fewer data races

  19. False vs. Benign • False data race • A data race that cannot actually occur • Benign data race • A data race that can and does occur, but is intended to happen as part of normal program execution

  20. False vs. benign example. Thread A: MyLockAcquire(); gReferenceCount++; MyLockRelease(); gStatisticsCount++; Thread B: MyLockAcquire(); gReferenceCount++; MyLockRelease(); gStatisticsCount++;

  21. False vs. Benign • False data race • A data race that cannot actually occur • Benign data race • A data race that can and does occur, but is intended to happen as part of normal program execution

  22. False vs. benign example. Thread A: MyLockAcquire(); gReferenceCount++; MyLockRelease(); gStatisticsCount++; Thread B: MyLockAcquire(); gReferenceCount++; MyLockRelease(); gStatisticsCount++;

  23. DataCollider: Goals 2. User-controlled overhead • Give user full control of overhead – from 0.0x up • Fast vs. more races found

  24. DataCollider: Goals 3. Actionable data • Contextual information is key to analysis and debugging

  25. Insights

  26. Insights 1. Instead of inferring if a data race could have occurred, let’s cause it to actually happen! • No locksets, no happens-before

  27. Insights 2. Sample memory accesses • No binary instrumentation • No synchronization logging • No memory access logging • Use code and data breakpoints • Random selection for uniform coverage

  28. Intersection Metaphor

  29. Intersection Metaphor Memory Address = 0x1000

  30. Intersection Metaphor Memory Address = 0x1000 Hi, I’m Thread A!

  31. Intersection Metaphor Instruction stream Memory Address = 0x1000

  32. Intersection Metaphor Instruction stream I have the lock, so I get a green light. Memory Address = 0x1000

  33. Intersection Metaphor Instruction stream Memory Address = 0x1000

  34. Intersection Metaphor Memory Address = 0x1000 DataCollider

  35. Intersection Metaphor Memory Address = 0x1000 DataCollider

  36. Intersection Metaphor Please wait a moment, Thread A – we’re doing a routine check for data races. Memory Address = 0x1000 DataCollider

  37. Intersection Metaphor Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  38. Intersection Metaphor Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  39. Intersection Metaphor: Normal Case

  40. Intersection Metaphor: Normal Case Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  41. Intersection Metaphor: Normal Case Thread B Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  42. Intersection Metaphor: Normal Case I don’t have the lock, so I’ll have to wait. Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  43. Intersection Metaphor: Normal Case Nothing to see here. Let me remove this trap. Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  44. Intersection Metaphor: Normal Case Looks safe now. Sorry for the inconvenience. DataCollider

  45. Intersection Metaphor: Normal Case

  46. Intersection Metaphor: Data Race

  47. Intersection Metaphor: Data Race Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  48. Intersection Metaphor: Data Race Thread B Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  49. Intersection Metaphor: Data Race Locks are for wimps! Memory Address = 0x1000 Value = 3 Data Breakpoint DataCollider

  50. Intersection Metaphor: Data Race DataCollider

  51. Intersection Metaphor: Data Race

  52. Intersection Metaphor: Data Race DataCollider

  53. Intersection Metaphor: Data Race Looks safe now. Sorry for the inconvenience. DataCollider

  54. Intersection Metaphor: Data Race
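
The intersection metaphor can be sketched as a single routine: stop the sampled thread at its access, note the current value, watch the address for a short while, then let the thread go. The sketch below is a user-space simulation, not the tool's implementation. All names are hypothetical, the hardware data breakpoint is represented only by a `trap_fired` flag that a real breakpoint handler would set, and the re-read of the value after the pause is an added assumption suggested by the "Value = 3" note on the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*pause_fn)(void *arg);

/* Hypothetical per-sample state. */
typedef struct {
    bool trap_fired; /* would be set by a data-breakpoint handler */
} sample_state_t;

/* Returns true if a conflicting access was observed while the
 * sampled thread was held at its memory access. */
bool sample_access(volatile uint32_t *addr, sample_state_t *st,
                   pause_fn pause, void *arg)
{
    uint32_t before = *addr;   /* remember the value, e.g. 3          */
    st->trap_fired = false;
    /* A real tool would arm a data breakpoint on addr here. */
    pause(arg);                /* hold the sampled thread briefly     */
    /* A real tool would remove the data breakpoint here. */
    return st->trap_fired || *addr != before;
}

/* Simulation helper: no other thread touches the address. */
void no_conflict(void *arg) { (void)arg; }

/* Simulation helper: stands in for Thread B writing without the lock. */
void conflicting_write(void *arg)
{
    volatile uint32_t *p = arg;
    *p = *p + 1;
}
```

In the normal case the pause passes quietly and the routine returns false. In the data-race case the simulated write changes the value during the pause and the routine returns true.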

  55. Implementation

  56. Sampling memory accesses with code breakpoints; part 1. Process: 1. Analyze target binary for memory access instructions. 2. Hook the breakpoint handler. 3. Set code breakpoints at a sampling of the memory access instructions. 4. Begin execution. Advantages: • Zero base-overhead – no code breakpoints means only the original code is running. • No annotations required – only symbols.
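
Step 3 of the process, choosing a random sample of the memory access instructions, could look like the sketch below. It assumes the analysis in step 1 produced an array of instruction addresses. The function name and the use of a partial Fisher-Yates shuffle with `rand()` are illustrative choices, not details from the slides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Moves a random sample of k distinct sites to the front of
 * sites[0..n) and returns the sample size (k, capped at n).
 * Note: rand() % m has a slight modulo bias; acceptable in a sketch. */
size_t sample_sites(uintptr_t *sites, size_t n, size_t k)
{
    if (k > n) {
        k = n;
    }
    for (size_t i = 0; i < k; i++) {
        /* Pick one of the not-yet-chosen sites in [i, n). */
        size_t j = i + (size_t)rand() % (n - i);
        uintptr_t tmp = sites[i];
        sites[i] = sites[j];
        sites[j] = tmp;
    }
    return k;
}
```

After the call, code breakpoints would be set on the first k entries. Choosing k is one way to trade overhead against the number of races found.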
