Efficient and Precise Points-to Analysis: Modeling the Heap by Merging Equivalent Automata
Tian Tan, Yue Li and Jingling Xue
PLDI 2017, June 2017

A New Points-to Analysis Technique for Object-Oriented Programs

Points-to Analysis
Determines
◦ “Which objects can a variable point to?”

Uses of Points-to Analysis
Clients: security analysis, bug detection, compiler optimization, program verification, program understanding, …
Tools: Chord, …
Call Graph

Existing Call Graph Construction
On-the-fly construction (run with points-to analysis)
◦ Precise
◦ Inefficient
3-object-sensitive points-to analysis
◦ Very precise
◦ Adopted by, e.g., Chord

3-Object-Sensitive Points-to Analysis
Analyze Java programs
◦ Intel Xeon E5 3.70GHz, 128GB of memory
◦ Time budget: 5 hours (18000 secs)
Analysis time (sec.)
◦ pmd: 14469 (4 hours)
◦ findbugs: unscalable (> 5 hours)

Two Mainstreams of Points-to Analysis Techniques
Model control-flow
◦ Context-sensitivity
  Call-site-sensitivity (PLDI’04, PLDI’06)
  Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
  Type-sensitivity (POPL’11)
  …
Model data-flow
◦ Heap abstraction
  Allocation-site abstraction
  Type-based abstraction
  …

Heap Abstraction
Dynamic execution: infinite-size heap
↓ abstracted or partitioned
Static analysis: finite (abstract) objects

Allocation-Site Abstraction
One object per allocation site
◦ Adopted by all mainstream points-to analyses
A a1 = new A();   // allocation site 1: $o_1^A$
A a2 = new A();   // allocation site 2: $o_2^A$
B b = new B();    // allocation site 3: $o_3^B$

Allocation-Site Abstraction
Over-partition for call graph construction
A a1 = new A();   // $o_1^A$
A a2 = new A();   // $o_2^A$
foo(a1);
foo(a2);
void foo(Object o) {
    o.toString();   // o points to $o_1^A$ and $o_2^A$; both resolve to A::toString()
}

Allocation-Site Abstraction
Over-partition for type-dependent clients
◦ Call graph construction
◦ Devirtualization
◦ May-fail casting
◦ …
A a1 = new A();   // $o_1^A$
A a2 = new A();   // $o_2^A$
foo(a1);
foo(a2);
void foo(Object o) {
    o.toString();   // o points to $o_1^A$ and $o_2^A$
    A a = (A) o;
}

Type-Based Abstraction
One object per type
A a1 = new A();   // $o^A$
A a2 = new A();   // $o^A$
B b = new B();    // $o^B$

Type-Based Abstraction
Precision loss for type-dependent clients
A a1 = new A();   // $o^A$
A a2 = new A();   // $o^A$
B b = new B();    // $o^B$
C c = new C();    // $o^C$
a1.f = b;
a2.f = c;         // $o^A$.f now points to $o^B$ and $o^C$
Object o = a1.f;  // o points to $o^B$ and $o^C$
o.toString();     // resolves to B::toString() and C::toString()
C::toString() is a false positive

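The false positive above can be reproduced with a small sketch. This is a hypothetical, flow-insensitive toy model and not the analysis from the talk: the class `HeapAbstractionDemo` and the method `toStringTargets` are illustrative names, and the slide's statements are hard-coded rather than parsed.

```java
import java.util.*;

public class HeapAbstractionDemo {
    // Hypothetical toy model: resolves o.toString() for the slide's program.
    // perSite = true models the allocation-site abstraction,
    // perSite = false models the type-based abstraction.
    static Set<String> toStringTargets(boolean perSite) {
        // Allocation sites in program order: {variable, type}.
        String[][] allocs = {{"a1", "A"}, {"a2", "A"}, {"b", "B"}, {"c", "C"}};
        Map<String, String> varToObj = new HashMap<>();  // variable -> abstract object
        Map<String, String> objType = new HashMap<>();   // abstract object -> type
        for (int i = 0; i < allocs.length; i++) {
            String type = allocs[i][1];
            // One abstract object per allocation site, or one per type.
            String obj = perSite ? "o" + (i + 1) + "^" + type : "o^" + type;
            varToObj.put(allocs[i][0], obj);
            objType.put(obj, type);
        }
        // Field stores: a1.f = b; a2.f = c;
        Map<String, Set<String>> fieldF = new HashMap<>();  // object -> pts(object.f)
        String[][] stores = {{"a1", "b"}, {"a2", "c"}};
        for (String[] s : stores) {
            fieldF.computeIfAbsent(varToObj.get(s[0]), k -> new TreeSet<>())
                  .add(varToObj.get(s[1]));
        }
        // Load: Object o = a1.f; then resolve o.toString() by receiver type.
        Set<String> targets = new TreeSet<>();
        for (String obj : fieldF.getOrDefault(varToObj.get("a1"), Set.of())) {
            targets.add(objType.get(obj) + "::toString()");
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(toStringTargets(true));   // allocation-site abstraction
        System.out.println(toStringTargets(false));  // type-based abstraction
    }
}
```

With one object per site, `a1.f` and `a2.f` stay separate and only `B::toString()` is reported. With one object per type, both stores land on the same abstract object, so `C::toString()` is reported as well.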
Our Goal:
◦ Improve efficiency
◦ Preserve precision

MAHJONG: A New Heap Abstraction
Improve efficiency: analysis time (sec.)
◦ pmd: 128 (MAHJONG) vs. 14469, 4 hours (allocation-site abstraction)
◦ findbugs: 524 (MAHJONG) vs. unscalable, > 5 hours (allocation-site abstraction)
Allocation-site abstraction: adopted by all mainstream points-to analyses
Preserve precision: #call graph edges
◦ pmd: 44016 (MAHJONG) vs. 44004 (allocation-site abstraction)
How?

Merging objects → alleviates over-partition
Blindly merging objects → causes precision loss
Before merging: $o_1^A$.f → $o_3^B$ and $o_2^A$.f → $o_4^C$ (inconsistent types)
After blindly merging $o_1^A$ and $o_2^A$ into $o^A$: $o^A$.f → $o^B$ and $o^A$.f → $o^C$

Type-Consistent Objects
Definition
$O_i^T$ and $O_j^T$ are type-consistent objects if, for every sequence of field names $\bar{f} = f_1.f_2.\cdots.f_n$, $O_i^T.\bar{f}$ and $O_j^T.\bar{f}$ point to objects of the same types.
MAHJONG only merges type-consistent objects

Type-Consistent Objects
Example
Object graph of $o_1^T$: $o_1^T$.f → $o_3^U$; $o_3^U$.h → $o_7^Y$, $o_9^Y$; $o_1^T$.g → $o_5^X$; $o_5^X$.k → $o_{11}^Y$
Object graph of $o_2^T$: $o_2^T$.f → $o_4^U$; $o_4^U$.h → $o_8^Y$; $o_2^T$.g → $o_6^X$; $o_6^X$.k → $o_8^Y$
Field sequence    $O_1$    $O_2$
(none)            T        T
.f                U        U
.f.h              Y        Y
.g                X        X
.g.k              Y        Y
∴ $O_1^T$ and $O_2^T$ are type-consistent objects

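The table can be checked mechanically. The sketch below is one simple way to do it, assuming the edges as read from the slide's diagram: it explores pairs of object sets reached by the same field sequence and compares their type sets. The class `TypeConsistencyDemo` and its helpers are hypothetical names, and this pairwise exploration is an illustration, not necessarily the procedure MAHJONG uses.

```java
import java.util.*;

public class TypeConsistencyDemo {
    // Field points-to graph: object -> field name -> pointed-to objects.
    static final Map<String, Map<String, Set<String>>> FIELDS = new TreeMap<>();
    // Object-to-type map.
    static final Map<String, String> TYPE = new TreeMap<>();

    static void edge(String from, String field, String to) {
        FIELDS.computeIfAbsent(from, k -> new TreeMap<>())
              .computeIfAbsent(field, k -> new TreeSet<>())
              .add(to);
    }

    static {
        // Types and edges as read from the slide's two object graphs (assumed).
        String[][] types = {{"o1", "T"}, {"o2", "T"}, {"o3", "U"}, {"o4", "U"},
                            {"o5", "X"}, {"o6", "X"}, {"o7", "Y"}, {"o8", "Y"},
                            {"o9", "Y"}, {"o11", "Y"}};
        for (String[] t : types) TYPE.put(t[0], t[1]);
        edge("o1", "f", "o3"); edge("o1", "g", "o5");
        edge("o3", "h", "o7"); edge("o3", "h", "o9"); edge("o5", "k", "o11");
        edge("o2", "f", "o4"); edge("o2", "g", "o6");
        edge("o4", "h", "o8"); edge("o6", "k", "o8");
    }

    // Types of all objects in the set.
    static Set<String> typesOf(Set<String> objs) {
        Set<String> ts = new TreeSet<>();
        for (String o : objs) ts.add(TYPE.get(o));
        return ts;
    }

    // Objects reached from any object in the set via one field.
    static Set<String> step(Set<String> objs, String field) {
        Set<String> next = new TreeSet<>();
        for (String o : objs)
            next.addAll(FIELDS.getOrDefault(o, Map.of()).getOrDefault(field, Set.of()));
        return next;
    }

    // True if every field sequence leads both objects to the same set of types.
    static boolean typeConsistent(String a, String b) {
        Deque<List<Set<String>>> work = new ArrayDeque<>();
        Set<List<Set<String>>> seen = new HashSet<>();
        work.add(List.of(Set.of(a), Set.of(b)));
        while (!work.isEmpty()) {
            List<Set<String>> pair = work.poll();
            if (!seen.add(pair)) continue;  // pair already explored
            Set<String> xs = pair.get(0), ys = pair.get(1);
            if (!typesOf(xs).equals(typesOf(ys))) return false;
            Set<String> fields = new TreeSet<>();
            for (String o : xs) fields.addAll(FIELDS.getOrDefault(o, Map.of()).keySet());
            for (String o : ys) fields.addAll(FIELDS.getOrDefault(o, Map.of()).keySet());
            for (String f : fields) work.add(List.of(step(xs, f), step(ys, f)));
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(typeConsistent("o1", "o2"));  // the slide's pair
        System.out.println(typeConsistent("o3", "o5"));  // types U and X differ
    }
}
```

The search terminates because there are only finitely many pairs of object sets, and each pair is explored at most once.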
How to Check Type-Consistency?

Our Solution: Sequential Automata
Check type-consistency of objects → Test equivalence of automata

Sequential Automata
6-tuple $(Q, \Sigma, \delta, q_0, \Gamma, \gamma)$, where:
◦ $Q$ is a set of states
◦ $\Sigma$ is a set of input symbols
◦ $\delta$ is the next-state map: $Q \times \Sigma \to \mathcal{P}(Q)$
◦ $q_0$ is the initial state
◦ $\Gamma$ is a set of output symbols
◦ $\gamma$ is the output map: $Q \to \Gamma$

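As a minimal sketch, the 6-tuple maps directly onto a small Java class. The class name `SequentialAutomaton`, the methods `statesAfter` and `outputsAfter`, and the three-state example are all hypothetical and chosen only to illustrate the definition. States, input symbols and output symbols are plain strings here.

```java
import java.util.*;

public class SequentialAutomaton {
    final Set<String> states;                           // Q
    final Set<String> inputs;                           // Sigma
    final Map<String, Map<String, Set<String>>> delta;  // delta: Q x Sigma -> P(Q)
    final String initial;                               // q0
    final Set<String> outputs;                          // Gamma
    final Map<String, String> gamma;                    // gamma: Q -> Gamma

    SequentialAutomaton(Set<String> states, Set<String> inputs,
                        Map<String, Map<String, Set<String>>> delta,
                        String initial, Set<String> outputs,
                        Map<String, String> gamma) {
        this.states = states;
        this.inputs = inputs;
        this.delta = delta;
        this.initial = initial;
        this.outputs = outputs;
        this.gamma = gamma;
    }

    // States reachable from q0 after reading the whole input word.
    Set<String> statesAfter(List<String> word) {
        Set<String> current = new TreeSet<>(Set.of(initial));
        for (String symbol : word) {
            Set<String> next = new TreeSet<>();
            for (String q : current)
                next.addAll(delta.getOrDefault(q, Map.of()).getOrDefault(symbol, Set.of()));
            current = next;
        }
        return current;
    }

    // Output symbols attached to the states reached by the word.
    Set<String> outputsAfter(List<String> word) {
        Set<String> result = new TreeSet<>();
        for (String q : statesAfter(word)) result.add(gamma.get(q));
        return result;
    }

    // Hypothetical three-state example: q0 -a-> {q1, q2}, q1 -b-> {q0}.
    static SequentialAutomaton example() {
        return new SequentialAutomaton(
            Set.of("q0", "q1", "q2"), Set.of("a", "b"),
            Map.of("q0", Map.of("a", Set.of("q1", "q2")),
                   "q1", Map.of("b", Set.of("q0"))),
            "q0", Set.of("X", "Y"),
            Map.of("q0", "X", "q1", "Y", "q2", "Y"));
    }

    public static void main(String[] args) {
        SequentialAutomaton m = example();
        System.out.println(m.outputsAfter(List.of("a")));       // outputs of q1 and q2
        System.out.println(m.outputsAfter(List.of("a", "b")));  // output of q0
    }
}
```

Because the next-state map returns a set of states, one input word can reach several states at once, which is why `outputsAfter` returns a set of output symbols rather than a single one.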
Check type-consistency of objects → Test equivalence of automata
How?

Objects → Automata
A set of objects ↔ $Q$: a set of states
A set of field names ↔ $\Sigma$: a set of input symbols
The field points-to map ↔ $\delta$: the next-state map
The object to be checked ↔ $q_0$: the initial state
A set of types ↔ $\Gamma$: a set of output symbols
The object-to-type map ↔ $\gamma$: the output map
Example ($o_2^T$, with $o_2^T$.f → $o_4^U$, $o_4^U$.h → $o_8^Y$, $o_2^T$.g → $o_6^X$, $o_6^X$.k → $o_8^Y$)
◦ objects ↔ states: $O_2^T$, $O_4^U$, $O_6^X$, $O_8^Y$
◦ field names ↔ input symbols: f, g, h, k

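The correspondence above can be sketched as a traversal of the field points-to graph starting from the object to be checked. The code below is a hypothetical illustration, not the authors' implementation: `ObjectAutomatonDemo`, `Automaton` and `build` are assumed names, and the edges for `o2` are as read from the slide's diagram.

```java
import java.util.*;

public class ObjectAutomatonDemo {
    // Hypothetical container for the six components of the automaton.
    static class Automaton {
        final Set<String> states = new TreeSet<>();                           // Q
        final Set<String> inputs = new TreeSet<>();                           // Sigma
        final Map<String, Map<String, Set<String>>> delta = new TreeMap<>();  // delta
        String initial;                                                       // q0
        final Set<String> outputs = new TreeSet<>();                          // Gamma
        final Map<String, String> gamma = new TreeMap<>();                    // gamma
    }

    // Builds the automaton of 'root' from the objects reachable through fields.
    static Automaton build(String root,
                           Map<String, Map<String, Set<String>>> fieldPointsTo,
                           Map<String, String> typeOf) {
        Automaton m = new Automaton();
        m.initial = root;                          // object to be checked -> q0
        Deque<String> work = new ArrayDeque<>(List.of(root));
        while (!work.isEmpty()) {
            String o = work.poll();
            if (!m.states.add(o)) continue;        // object -> state; skip if visited
            String type = typeOf.get(o);
            m.gamma.put(o, type);                  // object-to-type map -> gamma
            m.outputs.add(type);                   // type -> output symbol
            Map<String, Set<String>> out = fieldPointsTo.getOrDefault(o, Map.of());
            for (Map.Entry<String, Set<String>> e : out.entrySet()) {
                m.inputs.add(e.getKey());          // field name -> input symbol
                m.delta.computeIfAbsent(o, k -> new TreeMap<>())
                       .put(e.getKey(), e.getValue());  // field points-to -> delta
                work.addAll(e.getValue());
            }
        }
        return m;
    }

    // The slide's example object o2 of type T.
    static Automaton example() {
        Map<String, Map<String, Set<String>>> fieldPointsTo = Map.of(
            "o2", Map.of("f", Set.of("o4"), "g", Set.of("o6")),
            "o4", Map.of("h", Set.of("o8")),
            "o6", Map.of("k", Set.of("o8")));
        Map<String, String> typeOf =
            Map.of("o2", "T", "o4", "U", "o6", "X", "o8", "Y");
        return build("o2", fieldPointsTo, typeOf);
    }

    public static void main(String[] args) {
        Automaton m = example();
        System.out.println("Q = " + m.states);
        System.out.println("Sigma = " + m.inputs);
        System.out.println("Gamma = " + m.outputs);
    }
}
```

For the example, the traversal collects the four objects as states and the four field names as input symbols, matching the two correspondences listed on the slide.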