Towards Production-Run Heisenbugs Reproduction on Commercial Hardware
Shiyou Huang, Bowen Cai, and Jeff Huang
What’s a coder’s worst nightmare?
The bug only occurs in production but cannot be replicated locally.
https://www.quora.com/What-is-a-coders-worst-nightmare
Heisenbug
When you trace them, they disappear!
• Localization is hard
• Reproduction is hard
• You never know if it is fixed…
A motivating example
Init: x=1, y=2
T1:
  1: T2.start()
  2: z=0
  3: x++
  4: y++
  5: z=1
  6: T2.join()
T2:
  7: if (z==1)
  8:   assert(x+1==y)   ✗ fails
Under sequential consistency the failure looks like a contradiction: if T2 reads z==1 at line 7, lines 3 and 4 have already executed, so x=2, y=3 and x+1==y holds.
Under PSO (partial store order), stores to different addresses may become visible out of program order, so T2 can see z==1 before the updates to x and y, and the assertion fails.
$12 million loss of equipment!
http://stackoverflow.com/questions/16159203/
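The contradiction argument can be checked mechanically. The sketch below is a simplified model, not the tool from the talk: it assumes z starts at 0, treats each of T1's lines 2-5 as a single write, and lets T2 run lines 7-8 atomically at some point in the order in which T1's writes become visible. The names `T1`, `t1_orders`, and `assertion_can_fail` are made up for illustration.

```python
from itertools import permutations

# T1's writes, lines 2..5 of the example ("inc" stands for x++ / y++).
T1 = [("z", 0), ("x", "inc"), ("y", "inc"), ("z", 1)]

def t1_orders(model):
    """Yield the orders in which T1's writes may become visible to T2."""
    for perm in permutations(range(len(T1))):
        if model == "SC" and list(perm) != sorted(perm):
            continue  # SC: writes become visible in program order only
        # Both models: writes to the same variable keep their program order.
        zs = [i for i in perm if T1[i][0] == "z"]
        if zs != sorted(zs):
            continue
        yield [T1[i] for i in perm]

def assertion_can_fail(model):
    """True if some visibility order makes assert(x+1==y) fail when z==1."""
    for order in t1_orders(model):
        # T2 observes memory after the first k visible writes.
        for k in range(len(order) + 1):
            mem = {"x": 1, "y": 2, "z": 0}  # z=0 initially is an assumption
            for var, val in order[:k]:
                mem[var] = mem[var] + 1 if val == "inc" else val
            if mem["z"] == 1 and mem["x"] + 1 != mem["y"]:
                return True
    return False

print(assertion_can_fail("SC"), assertion_can_fail("PSO"))
```

In this model the failure needs z=1 to become visible after x++ but before y++, which only the relaxed order permits.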
Record & Replay (RnR)
Goal: record the non-determinism at runtime and reproduce the failure
Failure execution: Record → Replay
• runtime overhead
• the ability to reproduce failures
Related Work
• Software-based approaches
  • order-based: fully record shared-memory dependencies at runtime
    • LEAP [FSE’10], Order [USENIX ATC’11], Chimera [PLDI’12], Light [PLDI’15], RR [USENIX ATC’17]…
    • Chimera: > 2.4x
  • search-based: partially record the dependencies at runtime and use offline analysis (e.g. SMT solvers) to reason about the dependencies
    • ODR [SOSP’09], Lee et al. [MICRO’09], Weeratunge et al. [ASPLOS’10], CLAP [PLDI’13]…
    • CLAP: 0.9x – 3x
• Hardware-based approaches
  • Rerun [ISCA’08], Delorean [ISCA’08], Coreracer [MICRO’11], PBI [ASPLOS’13]…
  • rely on special hardware that is not deployed
Reality of RnR in production
• high overheads
• failing to reproduce failures
• lack of commodity hardware support
Contributions
Goal: record the execution at runtime with low overhead and faithfully reproduce it offline
• RnR based on control-flow tracing on commercial hardware (Intel PT)
• Core-based constraint reduction to cut the offline computation
• H3, evaluated on popular benchmarks and real-world applications, with 1.4%-23.4% overhead
Intel Processor Trace (PT)
PT: program control-flow tracing, supported on 5th and 6th generation Intel Core processors
• Low overhead, as low as 5% [1]
• Highly compact packets, <1 bit per retired instruction
  • One bit (1/0) for branch taken indication
  • Compressed branch target address
[1] https://sites.google.com/site/intelptmicrotutorial
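To see why one bit per conditional branch is enough to recover control flow, here is a minimal sketch over a hypothetical control-flow graph. The block names, the `COND` and `JUMP` tables, and `reconstruct_path` are invented for illustration. A real decoder such as libipt works on binary packets and the program image, and also handles the target-address packets used for indirect branches.

```python
# Hypothetical CFG: blocks ending in a conditional branch map to
# (target if taken, fall-through target); unconditional jumps need no bit.
COND = {"A": ("B", "C"), "D": ("E", "F")}
JUMP = {"B": "D", "C": "D"}

def reconstruct_path(entry, tnt_bits):
    """Replay taken/not-taken bits (1/0) over the CFG, starting at entry."""
    bits = iter(tnt_bits)
    path, block = [entry], entry
    while block in COND or block in JUMP:
        if block in COND:
            taken, fallthrough = COND[block]
            # Each conditional branch consumes exactly one trace bit.
            block = taken if next(bits) == 1 else fallthrough
        else:
            block = JUMP[block]  # statically known target, no bit consumed
        path.append(block)
    return path

print(reconstruct_path("A", [1, 0]))
```

Only the two conditional branches consume trace bits here. The unconditional jumps are recovered from the static graph, which is what keeps the trace so compact.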
PT Tracing Overhead
[Figure: PT tracing pipeline. Software configures and enables Intel PT through the driver; each Intel CPU core 0...n emits a packet stream (per logical CPU); the Intel PT software decoder combines the packets with runtime data and binary image files to produce the reconstructed execution.]
4.9% overhead on executions of PARSEC 3.0 on average

Program        Native time (s)   PT time (s)   OH (%)   Trace
bodytrack      0.557             0.573         2.9%     94M
x264           1.086             1.145         5.4%     88M
vips           1.431             1.642         14.7%    98M
blackscholes   1.51              1.56          9.9%     289M
ferret         1.699             1.769         4.1%     145M
swaptions      2.81              2.98          6.0%     897M
raytrace       3.818             4.036         5.7%     102M
facesim        5.048             5.145         1.9%     110M
fluidanimate   14.8              15.1          1.4%     1240M
freqmine       15.9              17.1          7.5%     2468M
Avg.           4.866             5.105         4.9%     553M
Challenges
• PT trace: low-level representation (assembly instructions)
• Absence of thread information
• No data values of memory accesses
Solutions
• PT trace: low-level representation & no data values
  • Idea: extract the path profiles from the PT trace and re-execute the program with KLEE to generate symbolic values
• Absence of thread information
  • Idea: use the thread context switch information recorded by Perf
H3 Overview
[Figure: H3 pipeline. At the user end, threads T0 ... Tn run on cores 0-3 under PT tracing, and the execution recorded by each core goes into a packet log. The log is decoded against the binary image into path profiles; symbolic execution turns them into a symbolic trace of each thread; constraint formula generation feeds an SMT solver, which outputs a global schedule.]
Phase 1: Control-flow tracing (recording & decoding)
• Reconstruct the execution on each core by decoding the packets generated by PT and the thread information from Perf
Phase 2: Offline analysis (offline constraints construction & solving)
• Path profiles of each thread
• Symbolic trace of each thread
• SMT constraints over the trace
  • Path constraints
  • Core-based read-write constraints
  • Synchronization constraints
  • Memory order constraints
Example
Step 1: collecting path profiles of each thread
• PT traces the control flow of the program’s execution into a packet log
• perf context switch events (TID, CPUID, TIME…) split the trace into the packets of T1 and T2
• libipt decodes each thread’s packets against the binary image to reconstruct the execution, i.e. the program’s control flow (basic blocks A-F in the figure)
• The reconstructed execution is matched to line numbers (line 1, line 2, ..., line n) and then to the basic blocks in *.ll
Path profiles:
• T1: bb1
• T2: bb1, bb2
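One way to picture how context switch events restore the missing thread information is the sketch below. It is an assumption-laden toy, not the H3 implementation: the tuple layouts `(time, cpu, tid)` and `(time, cpu, block)`, the timestamps, and the function `split_by_thread` are all hypothetical, and real perf records carry more fields.

```python
from bisect import bisect_right
from collections import defaultdict

def split_by_thread(switches, blocks):
    """Attribute decoded blocks to threads.

    switches: (time, cpu, tid) records, "tid was scheduled onto cpu at time".
    blocks:   (time, cpu, block) records decoded from each core's PT trace.
    """
    per_cpu = defaultdict(list)
    for time, cpu, tid in sorted(switches):
        per_cpu[cpu].append((time, tid))

    profiles = defaultdict(list)
    for time, cpu, block in sorted(blocks):
        events = per_cpu[cpu]
        # Latest switch-in on this core at or before the block's timestamp.
        idx = bisect_right(events, (time, float("inf"))) - 1
        if idx >= 0:
            profiles[events[idx][1]].append(block)
    return dict(profiles)

switches = [(0, 0, 1), (10, 1, 2)]           # T1 on core 0, T2 on core 1
blocks = [(5, 0, "bb1"), (12, 1, "bb1"), (15, 1, "bb2")]
print(split_by_thread(switches, blocks))
```

With these made-up timestamps the result matches the path profiles on the slide: T1 executes bb1, T2 executes bb1 then bb2.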
Example
Step 2: symbolic trace generation
KLEE [OSDI’08]: execute each thread along its path profile
Use symbolic values to represent concrete values, e.g.,
• $W_z^2$: value written to z at line 2
• $R_x^3$: value read from x at line 3
Init: x=1, y=2
T1:
  1: T2.start()
  2: z=0        $W_z^2 = 0$
  3: x++        $W_x^3 = R_x^3 + 1$
  4: y++        $W_y^4 = R_y^4 + 1$
  5: z=1        $W_z^5 = 1$
  6: T2.join()
T2:
  7: if (z==1)            $\mathit{True} \equiv (R_z^7 = 1)$
  8: assert(x+1==y) ✗     $R_x^8 + 1 \neq R_y^8$
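The idea of replacing concrete values with one symbol per memory access can be sketched with a few lines of string manipulation. This is only a toy stand-in for what KLEE does on LLVM bitcode. The tuple encoding of statements and the operation names (`store`, `inc`, `branch_eq`, `assert_fails`) are hypothetical, chosen to cover just the statements in the example.

```python
def symbolic_trace(thread):
    """Emit one symbolic constraint per statement on the thread's path."""
    constraints = []
    for line, op, *args in thread:
        if op == "store":            # var = constant
            var, value = args
            constraints.append(f"W_{var}^{line} = {value}")
        elif op == "inc":            # var++ : fresh read symbol, then write
            (var,) = args
            constraints.append(f"W_{var}^{line} = R_{var}^{line} + 1")
        elif op == "branch_eq":      # path condition: branch taken on var == c
            var, value = args
            constraints.append(f"R_{var}^{line} = {value}")
        elif op == "assert_fails":   # failure condition: assert(a+1==b) violated
            a, b = args
            constraints.append(f"R_{a}^{line} + 1 != R_{b}^{line}")
    return constraints

T1 = [(2, "store", "z", 0), (3, "inc", "x"), (4, "inc", "y"), (5, "store", "z", 1)]
T2 = [(7, "branch_eq", "z", 1), (8, "assert_fails", "x", "y")]
for c in symbolic_trace(T1) + symbolic_trace(T2):
    print(c)
```

Every read gets a fresh symbol because the value it returns depends on the interleaving, which is unknown at this stage and is resolved only when the constraints are solved.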
Example
Step 3: computing the global failure schedule
CLAP [PLDI’13]: reason about the dependencies of memory accesses
Order variable $O$ represents the order of a statement, e.g., $O_2 < O_3$ means 2: z=0 happens before 3: x++
Read-write constraints (match a read to a write):
  $(R_z^7 = 0 \wedge O_7 < O_2) \vee (R_z^7 = W_z^5 \wedge O_5 < O_7 \wedge (O_2 < O_5 \vee O_7 < O_2))$
Memory order constraints:
  SC:  $O_1 < O_2 < O_3^{R_x} < O_3^{W_x} < O_4^{R_y} < O_4^{W_y} < O_5 < O_6$
       $O_7 < O_8^{R_x} < O_8^{R_y}$
  PSO: $O_1 < O_2$, $O_3^{R_x} < O_3^{W_x}$, $O_4^{R_y} < O_4^{W_y}$, $O_5 < O_6$
       $O_7 < O_8^{R_x} < O_8^{R_y}$
Path constraints:
  $R_z^7 = 1$
Failure constraints:
  $R_x^8 + 1 \neq R_y^8$
[Figure: the global schedule interleaving T1 and T2, annotated with an HB (happens-before) edge and an rf (reads-from) edge.]
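As a rough illustration of what the solver is asked to find, the sketch below searches for a total order of the memory events that satisfies the memory order, path, and failure constraints. Exhaustive enumeration stands in for the SMT solver. The event names, the initial value z=0, the omission of the start/join events, and the extra same-address rule that keeps the two writes to z ordered under PSO are all assumptions of this sketch. Reads are matched to writes by simulating the schedule rather than by explicit read-write constraints.

```python
from itertools import permutations

# Memory events: kind, line, variable (e.g. "R3x" = read of x at line 3).
EVENTS = ["W2z", "R3x", "W3x", "R4y", "W4y", "W5z", "R7z", "R8x", "R8y"]

T2_ORDER = [("R7z", "R8x"), ("R8x", "R8y")]
ORDER = {
    "SC": [("W2z", "R3x"), ("R3x", "W3x"), ("W3x", "R4y"),
           ("R4y", "W4y"), ("W4y", "W5z")] + T2_ORDER,
    # PSO: only per-variable order is kept inside T1 (assumption: the two
    # writes to z stay ordered because they hit the same address).
    "PSO": [("R3x", "W3x"), ("R4y", "W4y"), ("W2z", "W5z")] + T2_ORDER,
}

def run(schedule):
    """Execute events in schedule order; return the value each read sees."""
    mem, seen = {"x": 1, "y": 2, "z": 0}, {}
    for ev in schedule:
        var = ev[-1]
        if ev[0] == "R":
            seen[ev] = mem[var]              # read matches the latest write
        elif ev == "W2z":
            mem["z"] = 0
        elif ev == "W5z":
            mem["z"] = 1
        else:                                # W3x / W4y: write back read + 1
            mem[var] = seen["R" + ev[1:]] + 1
    return seen

def find_failure(model):
    """Return a schedule satisfying order, path and failure constraints."""
    for schedule in permutations(EVENTS):
        pos = {ev: i for i, ev in enumerate(schedule)}
        if any(pos[a] > pos[b] for a, b in ORDER[model]):
            continue                         # violates memory order
        seen = run(schedule)
        if seen["R7z"] == 1 and seen["R8x"] + 1 != seen["R8y"]:
            return schedule                  # path and failure both hold
    return None

print(find_failure("SC"))
print(find_failure("PSO"))
```

Under the SC constraints the search comes back empty, while under the PSO constraints it returns a schedule in which z=1 becomes visible before the write to y. Enumeration only works because the example has nine events; the point of encoding the problem as constraints is that an SMT solver can handle traces far too long to enumerate.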