
Taming Undefined Behavior in LLVM



  1. Taming Undefined Behavior in LLVM (PLDI 2017, Barcelona)
     Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-Kil Hur (Seoul National Univ.)
     Sanjoy Das (Azul Systems)
     David Majnemer (Google)
     John Regehr (University of Utah)
     Nuno P. Lopes (Microsoft Research)

  2. What this talk is about
     • A compiler IR (Intermediate Representation) can be designed to allow more optimizations by supporting “undefined behaviors (UBs)”
     • LLVM IR’s UB model
       - Complicated
       - Invalidates some textbook optimizations
     • Our new UB model
       - Simpler
       - Can validate textbook optimizations (and more)

  3. Undefined Behavior (UB) & Problems

  4. Motivation for UB: Peephole Optimization
     Given int* p, int a, int b:
       output(p + a > p + b)  =>  output(a > b)
     Counterexample: p = 0xFFFFFF00, a = 0x100, b = 0.
     • Source: p + a = 0x0 (overflow!), so the source outputs false
     • Target: a > b outputs true
     Simple UB model: pointer arithmetic overflow is undefined behavior. The source then has UB on this input, which justifies the transformation.
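
The counterexample can be replayed with a small model of wrapping 32-bit address arithmetic. This is only a sketch: the 32-bit width, the byte-granular offsets, and the helper names `ptr_add`, `source`, and `target` are assumptions made for illustration and do not come from the slides.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address space, matching the slide's values


def ptr_add(p: int, offset: int) -> int:
    """Add a byte offset to an address, wrapping around modulo 2**32."""
    return (p + offset) & MASK32


def source(p: int, a: int, b: int) -> bool:
    """Models output(p + a > p + b) with an unsigned address comparison."""
    return ptr_add(p, a) > ptr_add(p, b)


def target(a: int, b: int) -> bool:
    """Models output(a > b)."""
    return a > b


p, a, b = 0xFFFFFF00, 0x100, 0
print(hex(ptr_add(p, a)))  # 0x0: the addition wrapped around
print(source(p, a, b))     # False
print(target(a, b))        # True
```

When no addition wraps, both functions agree, so the rewrite only changes behavior on inputs that overflow.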

  10. Problems with UB: Loop Invariant Code Motion
      Simple UB model: pointer arithmetic overflow is undefined behavior.
      Source:
        ...
        for(i=0; i<n; ++i) { a[i] = p + 0x100 }
      Target:
        q = p + 0x100
        for(i=0; i<n; ++i) { a[i] = q }
      Counterexample: p = 0xFFFFFF00, n = 0. The source loop body never runs, so nothing overflows. The target computes p + 0x100 before the loop, which overflows and raises UB.
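
A rough way to see the problem is to model immediate UB as an exception and run both versions of the loop. The exception class, the 32-bit width, and the function names are assumptions of this sketch, not part of the talk.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address space


class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' so the sketch can observe UB."""


def ptr_add(p: int, offset: int) -> int:
    """Simple UB model: overflowing pointer arithmetic is immediate UB."""
    result = p + offset
    if result > MASK32:
        raise UndefinedBehavior("pointer arithmetic overflow")
    return result


def source(p: int, n: int) -> list:
    """Loop before the optimization: the addition sits inside the body."""
    a = [0] * n
    for i in range(n):
        a[i] = ptr_add(p, 0x100)
    return a


def target(p: int, n: int) -> list:
    """Loop after hoisting: the addition runs even when n == 0."""
    q = ptr_add(p, 0x100)
    a = [0] * n
    for i in range(n):
        a[i] = q
    return a


print(source(0xFFFFFF00, 0))  # []
print(target(0x1000, 2))      # [4352, 4352]
```

Calling `target(0xFFFFFF00, 0)` raises `UndefinedBehavior` although `source(0xFFFFFF00, 0)` returns normally, which is exactly the behavior the hoisting introduced.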

  14. Existing Approaches

  15. Poison Value: A Deferred UB
      Simple UB model: pointer arithmetic overflow is undefined behavior.
      LLVM’s UB model: pointer arithmetic overflow is a poison “value”.
      Source:
        ...
        for(i=0; i<n; ++i) { a[i] = p + 0x100 }
      Target:
        q = p + 0x100
        for(i=0; i<n; ++i) { a[i] = q }
      With p = 0xFFFFFF00 and n = 0, the hoisted q = p + 0x100 raises UB in the simple model. In LLVM’s model it only yields poison, and since the loop body never runs, the poison is never used.
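
The deferred behavior can be sketched by returning a sentinel instead of raising. The `Poison` class and `POISON` object below are hypothetical stand-ins for LLVM's poison value, and the 32-bit width is an assumption.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address space


class Poison:
    """Hypothetical sentinel standing in for LLVM's poison value."""

    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def ptr_add(p, offset):
    """Poison model: overflow yields a poison value instead of immediate UB."""
    result = p + offset
    return POISON if result > MASK32 else result


def source(p, n):
    """Loop before hoisting."""
    a = [0] * n
    for i in range(n):
        a[i] = ptr_add(p, 0x100)
    return a


def target(p, n):
    """Loop after hoisting: q may be poison, but computing it is harmless."""
    q = ptr_add(p, 0x100)
    a = [0] * n
    for i in range(n):
        a[i] = q
    return a


print(target(0xFFFFFF00, 0))  # []
```

Under this model the two versions agree for every `n`, including `n == 0`.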

  18. Poison Value: A Deferred UB
      LLVM’s UB model: pointer arithmetic overflow is a poison “value”.
        output(p + a > p + b)  =>  output(a > b)
      With p = 0xFFFFFF00, a = 0x100, b = 0: p + a overflows (0x0) and becomes poison, and the source raises UB once that poison reaches output. The peephole optimization therefore remains valid.

  22. Summary of Poison
      Example: output(p + a > p + b) with p = 0xFFFFFF00, a = 0x100.
      • p + a overflows and yields poison
      • > propagates poison: its result is poison
      • output raises UB when it receives poison
      “Poison is Sometimes Too Poisonous”
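
The three rules (create, propagate, raise) fit in a few lines. This is a simplified model under stated assumptions: `Poison`, `icmp_ugt`, and `output` are illustrative names, and only the two operations from the slide are modeled.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address space


class UndefinedBehavior(Exception):
    """Stands in for undefined behavior so the sketch can observe it."""


class Poison:
    """Hypothetical sentinel standing in for LLVM's poison value."""

    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def ptr_add(p, offset):
    """Creates poison on overflow and propagates poison operands."""
    if p is POISON or offset is POISON:
        return POISON
    result = p + offset
    return POISON if result > MASK32 else result


def icmp_ugt(x, y):
    """Unsigned greater-than: poison in, poison out."""
    if x is POISON or y is POISON:
        return POISON
    return x > y


def output(v):
    """A side-effecting use: receiving poison raises UB."""
    if v is POISON:
        raise UndefinedBehavior("output of poison")
    return v


def source(p, a, b):
    """Models output(p + a > p + b)."""
    return output(icmp_ugt(ptr_add(p, a), ptr_add(p, b)))


print(ptr_add(0xFFFFFF00, 0x100))    # poison
print(icmp_ugt(POISON, 0xFFFFFF00))  # poison
print(source(0x1000, 0x100, 0))      # True
```

`source(0xFFFFFF00, 0x100, 0)` raises `UndefinedBehavior` only at `output`, not at the addition.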

  27. Problems with LLVM’s UB: Global Value Numbering (GVN)
      Source:
        if (x == y) { .. use x .. }
      Target:
        if (x == y) { .. use y .. }
      Counterexample: x = 0, y = poison. The condition x == y is poison. If the branch can be taken, the source uses 0 while the target uses poison, which can raise UB.
      LLVM’s UB model: for GVN to be valid, branching on poison must be undefined behavior, so that the source already has UB.
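
To compare the candidate semantics, the sketch below makes the meaning of "branch on poison" a parameter. Everything here is an illustrative assumption: `poison_is_ub` selects the semantics, and `pick` stands in for the nondeterministic choice so that the run is reproducible.

```python
class UndefinedBehavior(Exception):
    """Stands in for undefined behavior so the sketch can observe it."""


class Poison:
    """Hypothetical sentinel standing in for LLVM's poison value."""

    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def icmp_eq(x, y):
    """Equality comparison that propagates poison."""
    if x is POISON or y is POISON:
        return POISON
    return x == y


def branch(cond, poison_is_ub, pick=True):
    """Decide a conditional branch under the selected poison semantics."""
    if cond is POISON:
        if poison_is_ub:
            raise UndefinedBehavior("branch on poison")
        return pick  # stands in for a nondeterministic choice
    return cond


def use(v):
    """A use that raises UB on poison, such as output."""
    if v is POISON:
        raise UndefinedBehavior("use of poison")
    return v


def source(x, y, poison_is_ub):
    """if (x == y) { use x }"""
    if branch(icmp_eq(x, y), poison_is_ub):
        return use(x)
    return None


def target(x, y, poison_is_ub):
    """if (x == y) { use y }"""
    if branch(icmp_eq(x, y), poison_is_ub):
        return use(y)
    return None


print(source(0, POISON, poison_is_ub=False))  # 0
```

With `poison_is_ub=False`, `target(0, POISON, ...)` raises UB where the source returned 0, so GVN would be invalid. With `poison_is_ub=True`, both versions raise at the branch and the rewrite is sound.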

  35. Problems with LLVM’s UB: Loop Unswitching (LU)
      LLVM’s UB model: branching on poison is undefined behavior.
      Source:
        while (n > 0) { if (cond) A else B }
      Target:
        if (cond) while (n > 0) { A }
        else while (n > 0) { B }
      Counterexample: n = 0, cond = poison. The source never enters the loop, so it never branches on cond. The target branches on poison before the loop and raises UB.
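
The same parameterized model shows loop unswitching pulling in the opposite direction. As before, `poison_is_ub` and `pick` are assumptions of the sketch, and the `n -= 1` in each loop is added only so the loops terminate; the slide leaves the bodies A and B abstract.

```python
class UndefinedBehavior(Exception):
    """Stands in for undefined behavior so the sketch can observe it."""


class Poison:
    """Hypothetical sentinel standing in for LLVM's poison value."""

    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def branch(cond, poison_is_ub, pick=True):
    """Decide a conditional branch under the selected poison semantics."""
    if cond is POISON:
        if poison_is_ub:
            raise UndefinedBehavior("branch on poison")
        return pick  # stands in for a nondeterministic choice
    return cond


def source(n, cond, poison_is_ub):
    """while (n > 0) { if (cond) A else B }"""
    trace = []
    while n > 0:
        if branch(cond, poison_is_ub):
            trace.append("A")
        else:
            trace.append("B")
        n -= 1  # assumption: the body makes progress
    return trace


def target(n, cond, poison_is_ub):
    """if (cond) while (n > 0) { A } else while (n > 0) { B }"""
    trace = []
    if branch(cond, poison_is_ub):
        while n > 0:
            trace.append("A")
            n -= 1
    else:
        while n > 0:
            trace.append("B")
            n -= 1
    return trace


print(source(0, POISON, poison_is_ub=True))   # []
print(target(0, POISON, poison_is_ub=False))  # []
```

With `poison_is_ub=True`, `target(0, POISON, ...)` raises UB although the source returned normally, so unswitching is only sound when branching on poison is not UB.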

  38. Inconsistency in LLVM
      • GVN + LU is inconsistent: GVN needs branching on poison to be UB, while LU needs it not to be UB.
      • We found a miscompilation bug in LLVM due to the inconsistency (LLVM Bugzilla 31652).
        - It is being discussed in the community
        - No solution has been found yet

  39. Our Approach

  40. Overview
      Existing approaches:
      • Complex
      • Inconsistent (GVN + LU)
      • Can’t control poison
      • Values, ordered toward more defined: UB, poison values, undef. values, defined values
      Our approach:
      • Simpler
      • Consistent
      • Can control poison (freeze)
      • Values, ordered toward more defined: UB, poison values, defined values

  44. Key Idea: “Freeze”
      • Introduce a new instruction: y = freeze x
      • Semantics:
        - When x is a defined value: freeze x = x
        - When x is a poison value: freeze x = a nondeterministic choice of a defined value (0, 1, 2, ...)
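
The two cases of the semantics can be sketched as a single function. This is an illustrative model, not LLVM's implementation: the `Poison` sentinel, the `bits` parameter that fixes the value's width, and the use of a random number generator to play the role of nondeterminism are all assumptions.

```python
import random


class Poison:
    """Hypothetical sentinel standing in for LLVM's poison value."""

    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def freeze(x, bits=32, rng=random):
    """y = freeze x: identity on defined values, some defined value on poison."""
    if x is POISON:
        # Any value of the right width is allowed; the rng models that choice.
        return rng.randrange(2 ** bits)
    return x


print(freeze(7))        # 7
y = freeze(POISON, bits=1)
print(y in (0, 1))      # True
```

Because `y` is an ordinary defined value once it has been computed, every later use of `y` observes the same choice, and branching on `y` cannot be a branch on poison.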
