Taming Undefined Behavior in LLVM
PLDI 2017, Barcelona
Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-Kil Hur (Seoul National Univ.)
Sanjoy Das (Azul Systems)
David Majnemer (Google)
John Regehr (University of Utah)
Nuno P. Lopes (Microsoft Research)
What this talk is about
• A compiler IR (Intermediate Representation) can be designed to allow more optimizations by supporting “undefined behaviors (UBs)”
• LLVM IR’s UB model
  - Complicated
  - Invalidates some textbook optimizations
• Our new UB model
  - Simpler
  - Can validate textbook optimizations (and more)
Undefined Behavior (UB) & Problems
Motivation for UB: Peephole Optimization
Inputs: int* p, int a, int b
Source IR: output(p + a > p + b)
Target IR: output(a > b)
Counterexample: p = 0xFFFFFF00, a = 0x100, b = 0
• Source: p + a = 0x0 (Overflow!), so it outputs false
• Target: outputs true
Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior
• The source execution is then UB, so the optimization is allowed
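The counterexample on this slide can be replayed with a small model. This is a sketch only: it assumes 32-bit pointers with wraparound arithmetic (which the values 0xFFFFFF00 and 0x0 suggest), and the names MASK, wrapping_add, source, and target are made up for illustration.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers


def wrapping_add(p, off):
    """Pointer plus offset with wraparound, i.e. with no UB model at all."""
    return (p + off) & MASK


def source(p, a, b):
    # output(p + a > p + b), compared as unsigned addresses
    return wrapping_add(p, a) > wrapping_add(p, b)


def target(p, a, b):
    # output(a > b)
    return a > b


p, a, b = 0xFFFFFF00, 0x100, 0
print(hex(wrapping_add(p, a)))  # 0x0
print(source(p, a, b), target(p, a, b))  # False True
```

Without some rule that makes the overflowing execution illegal, the two programs disagree on this input, so the rewrite would be a miscompilation.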
Problems with UB: Loop Invariant Code Motion
Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior
Source IR:
    ...
    for(i=0; i<n; ++i) {
      a[i] = p + 0x100
    }
Target IR:
    q = p + 0x100
    for(i=0; i<n; ++i) {
      a[i] = q
    }
Counterexample: p = 0xFFFFFF00, n = 0
• Source: the loop body never runs, so p + 0x100 is never computed
• Target: q = p + 0x100 overflows (Overflow!), which is UB
• The hoisting introduces UB, so the simple model invalidates this optimization
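A minimal sketch of why hoisting fails under the simple model, assuming 32-bit pointers and modeling immediate UB as a Python exception. UndefinedBehavior, ptr_add, and runs_without_ub are hypothetical names used only for this illustration.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers


class UndefinedBehavior(Exception):
    """Stands in for immediate UB in the simple model."""


def ptr_add(p, off):
    # Simple UB model: overflow is UB at the point of the addition.
    if p + off > MASK:
        raise UndefinedBehavior("pointer arithmetic overflow")
    return p + off


def source(p, n):
    a = [None] * n
    for i in range(n):
        a[i] = ptr_add(p, 0x100)  # computed only if the loop runs
    return a


def target(p, n):
    q = ptr_add(p, 0x100)  # hoisted: computed even when n == 0
    a = [None] * n
    for i in range(n):
        a[i] = q
    return a


def runs_without_ub(f, *args):
    try:
        f(*args)
        return True
    except UndefinedBehavior:
        return False


print(runs_without_ub(source, 0xFFFFFF00, 0))  # True
print(runs_without_ub(target, 0xFFFFFF00, 0))  # False
```

The source is well defined on this input while the target is not, which is exactly what a correct optimization must never do.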
Existing Approaches
Poison Value: A Deferred UB
Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior
LLVM’s UB Model: Pointer Arithmetic Overflow is A Poison “Value”
Loop invariant code motion, with p = 0xFFFFFF00 and n = 0:
• Simple model: the hoisted q = p + 0x100 overflows (Overflow!) and raises UB
• LLVM’s model: q is poison, and since the loop never runs, poison is never used and no UB is raised
Poison Value: A Deferred UB
LLVM’s UB Model: Pointer Arithmetic Overflow is A Poison “Value”
Peephole optimization, with p = 0xFFFFFF00, a = 0x100, b = 0:
    Source IR: output(p + a > p + b)
    Target IR: output(a > b)
• p + a overflows (Overflow!), so it is poison
• The comparison on poison is poison
• output(poison) raises UB, so the source is UB and the optimization is still allowed
Summary of Poison
Dataflow for output(p + a > p + b), with p = 0xFFFFFF00 and a = 0x100:
• + : p + a overflows and yields poison
• > : poison propagates (Propagate)
• output : using poison raises UB (Raise UB)
“Poison is Sometimes Too Poisonous”
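The propagate-then-raise behavior summarized above can be sketched with a sentinel value. This is an illustrative model, not LLVM's implementation: POISON, UndefinedBehavior, ptr_add, gt, and output are hypothetical names, and 32-bit pointers are assumed.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers
POISON = object()  # sentinel standing in for the poison value


class UndefinedBehavior(Exception):
    """Raised only when poison reaches an observable operation."""


def ptr_add(p, off):
    # Overflow yields poison instead of immediate UB.
    if p is POISON or off is POISON:
        return POISON
    total = p + off
    return POISON if total > MASK else total


def gt(x, y):
    # Comparisons propagate poison.
    if x is POISON or y is POISON:
        return POISON
    return x > y


def output(v):
    # Observing poison is where UB is finally raised.
    if v is POISON:
        raise UndefinedBehavior("output of poison")
    return v


p, a, b = 0xFFFFFF00, 0x100, 0

# Hoisted computation: poison is created but never used, so no UB.
q = ptr_add(p, 0x100)
print(q is POISON)  # True

# Peephole source: poison flows through > into output, which raises UB.
try:
    output(gt(ptr_add(p, a), ptr_add(p, b)))
except UndefinedBehavior as e:
    print("UB:", e)
```

Deferring UB to the point of use is what lets the hoisted addition be harmless while still making the peephole source undefined.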
Problems with LLVM’s UB: Global Value Numbering (GVN)
Source:
    if (x == y) {
      .. use x ..
    }
Target:
    if (x == y) {
      .. use y ..
    }
LLVM’s UB Model: Branching on poison is ???
Counterexample: x = 0, y = poison
• x == y is poison on both sides
• If the branch can be taken, the source uses 0 but the target uses poison, which can raise UB
• GVN is therefore valid only if: Branching on poison is Undefined Behavior (both sides are then UB at the branch)
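The GVN argument can be checked against both candidate semantics for branching on poison. In this sketch the flag poison_is_ub selects the semantics and pick stands in for the nondeterministic branch outcome; both, along with POISON, eq, and branch, are hypothetical names introduced for illustration.

```python
POISON = object()  # sentinel standing in for the poison value


class UndefinedBehavior(Exception):
    pass


def eq(x, y):
    # Comparison propagates poison.
    return POISON if (x is POISON or y is POISON) else x == y


def branch(cond, poison_is_ub, pick=True):
    """Resolve a branch condition under the chosen semantics."""
    if cond is POISON:
        if poison_is_ub:
            raise UndefinedBehavior("branch on poison")
        return pick  # otherwise: some arbitrary direction is taken
    return cond


def source(x, y, poison_is_ub):
    if branch(eq(x, y), poison_is_ub):
        return x  # ".. use x .."
    return None


def target(x, y, poison_is_ub):
    if branch(eq(x, y), poison_is_ub):
        return y  # ".. use y .." after GVN
    return None


# If branching on poison is NOT UB, GVN replaces a defined 0 with poison.
print(source(0, POISON, False) == 0)       # True
print(target(0, POISON, False) is POISON)  # True
```

When poison_is_ub is True, both functions raise at the branch, so the target is no worse than the source and the rewrite is justified.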
Problems with LLVM’s UB: Loop Unswitching (LU)
LLVM’s UB Model: Branching on poison is Undefined Behavior
Source:
    while (n > 0) {
      if (cond)
        A
      else
        B
    }
Target:
    if (cond)
      while (n > 0) { A }
    else
      while (n > 0) { B }
Counterexample: n = 0, cond = poison
• Source: the loop never runs, so cond is never branched on
• Target: branches on cond before the loop, which is UB
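Loop unswitching can be run through the same kind of model. As before, this is a sketch under assumptions: poison_is_ub selects the branch semantics, the loop bodies A and B are reduced to appending a label and decrementing n so the loop terminates, and all helper names are hypothetical.

```python
POISON = object()  # sentinel standing in for the poison value


class UndefinedBehavior(Exception):
    pass


def branch(cond, poison_is_ub, pick=True):
    """Resolve a branch condition under the chosen semantics."""
    if cond is POISON:
        if poison_is_ub:
            raise UndefinedBehavior("branch on poison")
        return pick
    return cond


def source(n, cond, poison_is_ub):
    trace = []
    while n > 0:
        # cond is inspected only if the loop body runs
        trace.append("A" if branch(cond, poison_is_ub) else "B")
        n -= 1  # assumption: A and B make progress on n
    return trace


def target(n, cond, poison_is_ub):
    trace = []
    if branch(cond, poison_is_ub):  # unswitched: inspected unconditionally
        while n > 0:
            trace.append("A")
            n -= 1
    else:
        while n > 0:
            trace.append("B")
            n -= 1
    return trace


print(source(0, POISON, True))  # []
try:
    target(0, POISON, True)
except UndefinedBehavior as e:
    print("UB:", e)
```

With poison_is_ub set to False both versions return an empty trace for n = 0, so this optimization wants the opposite semantics from the one GVN needs.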
Inconsistency in LLVM
• GVN + LU is inconsistent: GVN requires branching on poison to be UB, while LU requires that it is not.
• We found a miscompilation bug in LLVM due to the inconsistency (LLVM Bugzilla 31652).
  - It is being discussed in the community
  - No solution has been found yet
Our Approach
Overview
Existing Approaches
• Complex
• Inconsistent (GVN + LU)
• Can’t Control Poison
• Values, from least to most defined: UB, Poison values, Undef. values, Defined values
Our Approach
• Simpler
• Consistent
• Can Control Poison: freeze takes poison values to defined values
• Values, from least to most defined: UB, Poison values, Defined values
Key Idea: “Freeze”
• Introduce a new instruction: y = freeze x
• Semantics:
  - When x is a defined value: freeze x = x
  - When x is a poison value: freeze x = Nondet. Choice of A Defined Value (0, 1, 2, ...)
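The two cases of the freeze semantics can be written down directly. This is a sketch, not LLVM's implementation: POISON is a hypothetical sentinel, the bits parameter is an assumed value width, and a pseudo-random draw stands in for the nondeterministic choice.

```python
import random

POISON = object()  # sentinel standing in for the poison value


def freeze(x, bits=32, rng=random):
    """Model of y = freeze x."""
    if x is POISON:
        # Poison input: pick some defined value of the given width.
        return rng.randrange(1 << bits)
    # Defined input: freeze is the identity.
    return x


y = freeze(POISON)
print(freeze(7))        # 7
print(y is not POISON)  # True
print(y == y)           # True: y is an ordinary defined value once frozen
```

Because the choice is made once, when freeze executes, later code can branch on or compare y without ever touching poison.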