Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata
Tian Tan, Yue Li and Jingling Xue
PLDI 2017, June 2017

A New Points-to Analysis Technique for Object-Oriented Programs

Points-to Analysis
• Determines
  ◦ “Which objects can a variable point to?”

Uses of Points-to Analysis
Clients:
• Security analysis
• Bug detection
• Compiler optimization
• Program verification
• Program understanding
• …
Tools:
• Chord
• …
[Diagram label: Call Graph]

Existing Call Graph Construction
• On-the-fly construction (run with points-to analysis)
  ◦ Precise
  ◦ Inefficient
• 3-object-sensitive points-to analysis
  ◦ Very precise
  ◦ Adopted by, e.g., Chord

3-Object-Sensitive Points-to Analysis
• Analyze Java programs
  ◦ Intel Xeon E5 3.70GHz, 128GB of memory
  ◦ Time budget: 5 hours (18000 secs)

Analysis time (sec.)
  Program    Time
  pmd        14469 (4 hours)
  findbugs   Unscalable (> 5 hours)

Two Mainstreams of Points-to Analysis Techniques
• Model control-flow
  ◦ Context-sensitivity
    ▪ Call-site-sensitivity (PLDI’04, PLDI’06)
    ▪ Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
    ▪ Type-sensitivity (POPL’11)
    ▪ …
• Model data-flow
  ◦ Heap abstraction
    ▪ Allocation-site abstraction
    ▪ Type-based abstraction
    ▪ …

Heap Abstraction
[Diagram: the infinite-size heap of a dynamic execution is abstracted, or partitioned, into finite (abstract) objects for static analysis]

Allocation-Site Abstraction
• One object per allocation site
  ◦ Adopted by all mainstream points-to analyses

    A a1 = new A();   // allocation site 1: abstract object o1 (type A)
    A a2 = new A();   // allocation site 2: abstract object o2 (type A)
    B b  = new B();   // allocation site 3: abstract object o3 (type B)

Allocation-Site Abstraction
• Over-partition for type-dependent clients
  ◦ Call graph construction
  ◦ Devirtualization
  ◦ May-fail casting
  ◦ …

    A a1 = new A();   // abstract object o1 (type A)
    A a2 = new A();   // abstract object o2 (type A)
    foo(a1);
    foo(a2);

    void foo(Object o) {
        o.toString();   // o is o1 or o2, both of type A: A::toString()
        A a = (A) o;
    }

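The over-partition point can be sketched in a few lines of Java. This is only an illustration, not the analysis from the talk: the class name `OverPartitionSketch`, the merged object name `o12`, and the convention of naming a call target `Type::toString()` (which assumes class A overrides `toString()`) are all assumptions made here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: virtual-call resolution depends only on receiver types. */
class OverPartitionSketch {
    /**
     * heap maps an abstract-object name to its type. Returns the targets of
     * o.toString() when o may point to any object in the heap, assuming each
     * type declares its own toString().
     */
    static Set<String> toStringTargets(Map<String, String> heap) {
        Set<String> targets = new TreeSet<>();
        for (String type : heap.values()) {
            targets.add(type + "::toString()");
        }
        return targets;
    }

    public static void main(String[] args) {
        // Allocation-site abstraction: one abstract object per `new A()`.
        Map<String, String> perSite = Map.of("o1", "A", "o2", "A");
        // Hypothetical merged heap: o1 and o2 collapsed into a single object.
        Map<String, String> merged = Map.of("o12", "A");

        System.out.println(toStringTargets(perSite)); // [A::toString()]
        System.out.println(toStringTargets(merged));  // [A::toString()]
    }
}
```

Both heaps give the same call target, so for this client keeping o1 and o2 apart adds cost without adding precision.
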
Type-Based Abstraction
• One object per type

    A a1 = new A();   // abstract object for type A
    A a2 = new A();   // same abstract object for type A
    B b  = new B();   // abstract object for type B

Type-Based Abstraction
• Precision loss for type-dependent clients

    A a1 = new A();     // abstract object for type A
    A a2 = new A();     // same abstract object for type A
    B b  = new B();     // abstract object for type B
    C c  = new C();     // abstract object for type C
    a1.f = b;           // field f of the A object points to the B object
    a2.f = c;           // field f of the A object now also points to the C object
    Object o = a1.f;    // o points to the B object and the C object
    o.toString();       // B::toString() and C::toString() (false positive)

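The false positive can be reproduced with a hand-written points-to computation over the slide's statements. This is a minimal sketch under stated assumptions: the class `HeapAbstractionSketch`, the allocation-site numbers 1 to 4, and the `name:Type` encoding of abstract objects are invented for illustration, and the statements are evaluated by hand rather than by a real analysis.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;

/** Hypothetical sketch comparing two heap abstractions on the slide's program. */
class HeapAbstractionSketch {
    /**
     * `name` maps (allocation site, type) to an abstract object name of the
     * form "<id>:<Type>". Returns the resolved targets of o.toString().
     */
    static Set<String> toStringTargets(BinaryOperator<String> name) {
        String oA1 = name.apply("1", "A");   // A a1 = new A();
        String oA2 = name.apply("2", "A");   // A a2 = new A();
        String oB = name.apply("3", "B");    // B b = new B();
        String oC = name.apply("4", "C");    // C c = new C();

        // Points-to map for field f: abstract object -> objects f may point to.
        Map<String, Set<String>> f = new HashMap<>();
        f.computeIfAbsent(oA1, k -> new TreeSet<>()).add(oB);   // a1.f = b;
        f.computeIfAbsent(oA2, k -> new TreeSet<>()).add(oC);   // a2.f = c;

        Set<String> targets = new TreeSet<>();
        for (String o : f.get(oA1)) {                           // Object o = a1.f;
            String type = o.substring(o.indexOf(':') + 1);
            targets.add(type + "::toString()");                 // o.toString();
        }
        return targets;
    }

    public static void main(String[] args) {
        // Allocation-site abstraction: one abstract object per site.
        System.out.println(toStringTargets((site, type) -> "o" + site + ":" + type));
        // prints [B::toString()]

        // Type-based abstraction: one abstract object per type.
        System.out.println(toStringTargets((site, type) -> "o:" + type));
        // prints [B::toString(), C::toString()]
    }
}
```

Under the type-based naming, a1 and a2 share one abstract object, so both stores land in the same field and C::toString() appears as a spurious target.
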
Our Goal:
• Improve efficiency
• Preserve precision

MAHJONG: A New Heap Abstraction

Improve Efficiency: analysis time (sec.)
  Program    MAHJONG   Allocation-site abstraction
  pmd        128       14469 (4 hours)
  findbugs   524       Unscalable (> 5 hours)

Preserve Precision: #call graph edges
  Program    MAHJONG   Allocation-site abstraction
  pmd        44016     44004

Note: allocation-site abstraction is adopted by all mainstream points-to analyses.

How?

• Merging objects alleviates over-partition
• Blindly merging objects causes precision loss

[Figure: o1 (type A) has a field f pointing to o3 (type B), and o2 (type A) has a field f pointing to o4 (type C). The two f fields point to objects of inconsistent types, so merging o1 and o2 makes f of the merged A object point to both a B object and a C object.]

Type-Consistent Objects
• Definition
  $O_i^T$ and $O_j^T$ are type-consistent objects if, for every sequence of field names $\bar{f} = f_1.f_2.\cdots.f_n$, $O_i^T.\bar{f}$ and $O_j^T.\bar{f}$ point to objects of the same types.
• MAHJONG only merges type-consistent objects

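The definition can also be written symbolically. The notation below is an assumption introduced here, not taken from the slides: $\mathit{pts}(o.\bar{f})$ stands for the set of objects reached from $o$ by following the field sequence $\bar{f}$ in the field points-to graph, $\mathit{type}(o)$ for the type of object $o$, and $\equiv$ for "type-consistent". The empty sequence ($n = 0$) is included, which covers the requirement that the two objects themselves have the same type.

```latex
O_i^T \equiv O_j^T
\;\iff\;
\forall\, \bar{f} = f_1.f_2.\cdots.f_n \;(n \ge 0):\quad
\{\, \mathit{type}(o) \mid o \in \mathit{pts}(O_i^T.\bar{f}) \,\}
=
\{\, \mathit{type}(o) \mid o \in \mathit{pts}(O_j^T.\bar{f}) \,\}
```
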
Type-Consistent Objects
• Example

[Figure: field points-to graphs rooted at o1 and o2 (both of type T), over objects o3 and o4 (type U), o5 and o6 (type X), and o7, o8, o9 and o11 (type Y), with edges labeled f, g, h and k]

  Field sequence   From $O_1^T$   From $O_2^T$
  (empty)          T              T
  .f               U              U
  .f.h             Y              Y
  .g               X              X
  .g.k             Y              Y

∴ $O_1^T$ and $O_2^T$ are type-consistent objects

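The table in this example can be checked mechanically. The sketch below explores pairs of object sets reached by the same field sequence and compares their type sets. It is one straightforward way to test the definition, not necessarily the algorithm MAHJONG uses. The class name `TypeConsistencySketch` is hypothetical, and the edge list in `FIELDS` is an assumed reading of the slide's figure (the exact edges are not fully recoverable from the extracted text); it is chosen to agree with the table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: check type-consistency by exploring pairs of object sets. */
class TypeConsistencySketch {
    // Object-to-type map, as read from the slide.
    static final Map<String, String> TYPE = Map.of(
            "o1", "T", "o2", "T", "o3", "U", "o4", "U", "o5", "X",
            "o6", "X", "o7", "Y", "o8", "Y", "o9", "Y", "o11", "Y");

    // Field points-to map (object -> field -> targets). ASSUMED edge set.
    static final Map<String, Map<String, Set<String>>> FIELDS = Map.of(
            "o1", Map.of("f", Set.of("o3"), "g", Set.of("o5")),
            "o3", Map.of("h", Set.of("o7", "o9")),
            "o5", Map.of("k", Set.of("o11")),
            "o2", Map.of("f", Set.of("o4"), "g", Set.of("o6")),
            "o4", Map.of("h", Set.of("o8")),
            "o6", Map.of("k", Set.of("o8")));

    /** Types of all objects in the set. */
    static Set<String> types(Set<String> objs) {
        Set<String> ts = new TreeSet<>();
        for (String o : objs) ts.add(TYPE.get(o));
        return ts;
    }

    /** Objects reached from any object in the set via one field. */
    static Set<String> step(Set<String> objs, String field) {
        Set<String> next = new TreeSet<>();
        for (String o : objs) {
            next.addAll(FIELDS.getOrDefault(o, Map.of()).getOrDefault(field, Set.of()));
        }
        return next;
    }

    /** True if a and b reach the same set of types along every field sequence. */
    static boolean typeConsistent(String a, String b) {
        Set<String> fieldNames = new TreeSet<>();
        for (Map<String, Set<String>> m : FIELDS.values()) fieldNames.addAll(m.keySet());

        Deque<List<Set<String>>> work = new ArrayDeque<>();
        Set<List<Set<String>>> seen = new HashSet<>();
        work.push(List.of(Set.of(a), Set.of(b)));
        while (!work.isEmpty()) {
            List<Set<String>> pair = work.pop();
            if (!seen.add(pair)) continue;   // this pair was already compared
            if (!types(pair.get(0)).equals(types(pair.get(1)))) return false;
            for (String f : fieldNames) {
                work.push(List.of(step(pair.get(0), f), step(pair.get(1), f)));
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(typeConsistent("o1", "o2")); // true
        System.out.println(typeConsistent("o1", "o3")); // false
    }
}
```

The search terminates because there are only finitely many pairs of object sets, and each pair is expanded at most once.
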
How to Check Type-Consistency?

Our Solution: Sequential Automata
Check type-consistency of objects → Test equivalence of automata

Sequential Automata
• 6-tuple (Q, Σ, δ, q0, Γ, γ), where:
  ◦ Q is a set of states
  ◦ Σ is a set of input symbols
  ◦ δ is the next-state map: Q × Σ → P(Q)
  ◦ q0 is the initial state
  ◦ Γ is a set of output symbols
  ◦ γ is the output map: Q → Γ

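As a rough illustration, the 6-tuple maps onto a small Java class with one field per component. Everything here is an assumption made for the sketch: the class name `SequentialAutomaton`, the use of strings for states and symbols, the `outputsAfter` helper, and the two-state `demo()` automaton, which is made up and does not come from the slides.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical container for a sequential automaton (Q, Σ, δ, q0, Γ, γ). */
class SequentialAutomaton {
    final Set<String> states;                           // Q
    final Set<String> inputs;                           // Σ
    final Map<String, Map<String, Set<String>>> delta;  // δ: Q × Σ → P(Q)
    final String initial;                               // q0
    final Set<String> outputs;                          // Γ
    final Map<String, String> gamma;                    // γ: Q → Γ

    SequentialAutomaton(Set<String> states, Set<String> inputs,
                        Map<String, Map<String, Set<String>>> delta,
                        String initial, Set<String> outputs,
                        Map<String, String> gamma) {
        this.states = states;
        this.inputs = inputs;
        this.delta = delta;
        this.initial = initial;
        this.outputs = outputs;
        this.gamma = gamma;
    }

    /** Output symbols of all states reachable from q0 by reading `word`. */
    Set<String> outputsAfter(List<String> word) {
        Set<String> current = Set.of(initial);
        for (String symbol : word) {
            Set<String> next = new TreeSet<>();
            for (String q : current) {
                next.addAll(delta.getOrDefault(q, Map.of()).getOrDefault(symbol, Set.of()));
            }
            current = next;
        }
        Set<String> result = new TreeSet<>();
        for (String q : current) result.add(gamma.get(q));
        return result;
    }

    /** A made-up two-state automaton: q0 --a--> q1, with γ(q0) = x and γ(q1) = y. */
    static SequentialAutomaton demo() {
        return new SequentialAutomaton(
                Set.of("q0", "q1"), Set.of("a"),
                Map.of("q0", Map.of("a", Set.of("q1"))),
                "q0", Set.of("x", "y"), Map.of("q0", "x", "q1", "y"));
    }

    public static void main(String[] args) {
        SequentialAutomaton m = demo();
        System.out.println(m.outputsAfter(List.of()));          // [x]
        System.out.println(m.outputsAfter(List.of("a")));       // [y]
        System.out.println(m.outputsAfter(List.of("a", "a")));  // []
    }
}
```

Because δ maps into P(Q), reading a word can lead to several states or to none, which is why `outputsAfter` returns a set.
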
Check type-consistency of objects → Test equivalence of automata
How?

  Objects                      Automata
  A set of objects             Q: a set of states
  A set of field names         Σ: a set of input symbols
  The field points-to map      δ: the next-state map
  The object to be checked     q0: the initial state
  A set of types               Γ: a set of output symbols
  The object-to-type map       γ: the output map

Example (object to be checked: $O_2^T$)
• objects ↔ states: $O_2^T$, $O_4^U$, $O_6^X$, $O_8^Y$
• field names ↔ input symbols: f, g, h, k

[Figure: field points-to graph rooted at o2 (type T), over o4 (type U), o6 (type X) and o8 (type Y), with edges labeled f, g, h and k]

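The correspondence in the table can be sketched as a constructor that walks the field points-to graph from the object to be checked. This is an illustrative sketch, not the authors' implementation: the class `ObjectAutomatonSketch`, its field names, and the `demo()` heap are assumptions, and the edge set used for o2 is an assumed reading of the slide's figure.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: build the automaton of one object from the heap graph. */
class ObjectAutomatonSketch {
    final Set<String> states = new TreeSet<>();    // Q: objects reachable from the root
    final Set<String> inputs = new TreeSet<>();    // Σ: field names seen on the way
    final Map<String, Map<String, Set<String>>> delta = new TreeMap<>(); // δ: field points-to map
    final String initial;                          // q0: the object to be checked
    final Set<String> outputs = new TreeSet<>();   // Γ: types of reachable objects
    final Map<String, String> gamma = new TreeMap<>(); // γ: object-to-type map

    ObjectAutomatonSketch(String root,
                          Map<String, Map<String, Set<String>>> fieldPointsTo,
                          Map<String, String> typeOf) {
        initial = root;
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!states.add(obj)) continue;   // object already visited
            gamma.put(obj, typeOf.get(obj));
            outputs.add(typeOf.get(obj));
            Map<String, Set<String>> fields = fieldPointsTo.getOrDefault(obj, Map.of());
            if (!fields.isEmpty()) delta.put(obj, fields);
            for (Map.Entry<String, Set<String>> e : fields.entrySet()) {
                inputs.add(e.getKey());
                for (String target : e.getValue()) work.push(target);
            }
        }
    }

    /** Automaton for o2, using an ASSUMED reading of the slide's figure. */
    static ObjectAutomatonSketch demo() {
        Map<String, Map<String, Set<String>>> heap = Map.of(
                "o2", Map.of("f", Set.of("o4"), "g", Set.of("o6")),
                "o4", Map.of("h", Set.of("o8")),
                "o6", Map.of("k", Set.of("o8")));
        Map<String, String> types = Map.of("o2", "T", "o4", "U", "o6", "X", "o8", "Y");
        return new ObjectAutomatonSketch("o2", heap, types);
    }

    public static void main(String[] args) {
        ObjectAutomatonSketch a = demo();
        System.out.println(a.states);   // [o2, o4, o6, o8]
        System.out.println(a.inputs);   // [f, g, h, k]
        System.out.println(a.outputs);  // [T, U, X, Y]
    }
}
```

Only objects reachable from the root become states, so each object gets its own automaton and two objects can then be compared by testing their automata for equivalence.
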