Scalability-First Pointer Analysis with Self-Tuning Context Sensitivity
Yue Li, Tian Tan, Anders Møller and Yannis Smaragdakis
Pointer Analysis
• Concept: Which objects may a variable point to?
• Importance
  - Fundamental for virtually all static analyses, e.g., call graphs, alias analysis, etc.
  - Useful for many software engineering tasks, e.g., bug detection, security analysis, program understanding, etc.
Problem: Unpredictable Scalability
• Precise pointer analysis is hard to scale
  - Context Sensitivity (CS): precise but slow
  - Context Insensitivity (CI): imprecise but fast
• Variants of Context Sensitivity
  - Object Sensitivity (obj)
  - Type Sensitivity (type)
• From most precise to fastest: 2obj, 2type, 1type, CI (each step is less precise but faster)
[Figure: analysis time in seconds for 2obj, 2type, and CI; the y-axis runs from 0 to 10000, and a timeout is any run over 10800 seconds.]
Problem: Unpredictable Scalability
• Scenario: pointer analysis as a part of a large-scale security analysis
  - Precise 2obj: unscalable for many programs
  - Scalable CI: imprecise for all programs
• Iterate until reaching the most precise analysis that scales: 2obj → 2type → 1type → CI
• Sleepless nights and still not great precision!
Scaler: Good Scalability & High Precision, regardless of the program being analyzed
• Scalability: as good as CI
• Precision: comparable to or better than the best scalable CS
[Figure: analysis time in seconds for 2obj, 2type, and Scaler; a timeout is any run over 10800 seconds.]
Scaler: Idea
Key Concept
Number of worst-case CS points-to facts for method m: $\#ctx_m^c \cdot \#pts_m$
• $\#ctx_m^c$: number of contexts for method m under CS c
• $\#pts_m$: number of points-to facts for method m
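The key concept is a single product, which can be sketched as follows. The function name and the numbers are hypothetical and only illustrate the arithmetic; they are not taken from the talk.

```python
def worst_case_cs_facts(num_contexts: int, num_pts_facts: int) -> int:
    """Worst-case number of context-sensitive points-to facts for one method.

    num_contexts:  #ctx_m^c, contexts of method m under CS variant c
    num_pts_facts: #pts_m, points-to facts of method m
    """
    return num_contexts * num_pts_facts


# Hypothetical method: 500 contexts under some variant, 40 points-to facts.
estimate = worst_case_cs_facts(num_contexts=500, num_pts_facts=40)
print(estimate)
```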
Insight
• Too many CS points-to facts are generated for certain methods
• m is a scalability-critical method under CS c if $\#ctx_m^c \cdot \#pts_m > ST$ (Scalability Threshold), i.e., c is expensive
• For a scalability-critical method m, choose a cheap c' such that $\#ctx_m^{c'} \cdot \#pts_m \le ST$
• How to identify scalability-critical methods? That is, how to estimate $\#ctx_m^c \cdot \#pts_m$?
How to estimate $\#ctx_m^c \cdot \#pts_m$?
Pre-analysis: points-to results of CI
• $\#pts_m$: obtained directly
• $\#ctx_m^c$: obtained by leveraging the Object Allocation Graph* (based on CI)
Context estimation problem → Graph traversal problem
*Making k-object-sensitive pointer analysis more precise with still k-limiting. Tan et al. SAS 2016
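A minimal sketch of how context estimation might reduce to graph traversal. It assumes the Object Allocation Graph is given as a map from each object to the objects that allocated it, that a k-object-sensitive context is a path of at most k objects ending at a receiver of the method, and that a type-sensitive context replaces each object by the class containing its allocation site. The names `oag_preds`, `receivers`, and `alloc_class` and the toy graph are hypothetical; this is a simplification for illustration, not the authors' exact algorithm.

```python
from typing import Dict, List, Set, Tuple

Context = Tuple[str, ...]


def object_contexts(oag_preds: Dict[str, List[str]],
                    receivers: List[str], k: int) -> Set[Context]:
    """Enumerate OAG paths of at most k objects that end at a receiver."""

    def paths_ending_at(obj: str, depth: int) -> Set[Context]:
        preds = oag_preds.get(obj, [])
        # Stop at the k-limit or at an object with no allocator in the graph.
        if depth == 1 or not preds:
            return {(obj,)}
        return {prefix + (obj,)
                for p in preds
                for prefix in paths_ending_at(p, depth - 1)}

    contexts: Set[Context] = set()
    for r in receivers:
        contexts |= paths_ending_at(r, k)
    return contexts


def type_contexts(obj_ctxs: Set[Context],
                  alloc_class: Dict[str, str]) -> Set[Context]:
    """Map each object in a context to the class containing its allocation site."""
    return {tuple(alloc_class[o] for o in ctx) for ctx in obj_ctxs}


# Toy graph: o1 and o2 allocate o3, and o1 allocates o4.
oag_preds = {"o3": ["o1", "o2"], "o4": ["o1"]}
alloc_class = {"o1": "A", "o2": "A", "o3": "B", "o4": "B"}
receivers = ["o3", "o4"]  # receiver objects of a hypothetical method m

ctx_2obj = object_contexts(oag_preds, receivers, k=2)
ctx_2type = type_contexts(ctx_2obj, alloc_class)
print(len(ctx_2obj), len(ctx_2type))
```

In this toy graph the type abstraction merges several object contexts into one, which is why type sensitivity is cheaper but less precise.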
Scaler: Example
[Figure: methods of a program (x-axis, 1 to 10 000) against $\#ctx_m^c \cdot \#pts_m$ (y-axis, 0 to 1 000 000), with one curve each for c = 2obj, 2type, and 1type and a horizontal line at $ST_p$ (ST: Scalability Threshold). The x-axis is marked at 2000 and 4000, with regions labeled 1type, 2type, and 2obj.]
For any scalability-critical method, use the most precise CS variant that can turn it into a non-scalability-critical method.
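The selection rule can be sketched as a first-fit search over the variants in order of precision. The function and parameter names are hypothetical, and falling back to CI when no variant fits under the threshold is an assumption made for this sketch.

```python
from typing import Dict

# CS variants from most precise to least precise.
PRECISION_ORDER = ["2obj", "2type", "1type"]


def select_variant(ctx_counts: Dict[str, int], pts: int, st: int) -> str:
    """Return the most precise variant whose estimate #ctx * #pts stays within st."""
    for variant in PRECISION_ORDER:
        if ctx_counts[variant] * pts <= st:
            return variant
    return "CI"  # assumed fallback when every CS variant exceeds st


# Hypothetical method: 2obj is too expensive for st=250, 2type is not.
chosen = select_variant({"2obj": 3, "2type": 1, "1type": 1}, pts=100, st=250)
print(chosen)
```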
Total Scalability Threshold (TST)
To automatically choose an appropriate $ST_p$ for each program p
• TST is related to memory size
• TST indicates analysis capacity: how many points-to facts can the memory hold?
[Figure: Program A and Program B, each with its total $\sum_m \#ctx_m^c \cdot \#pts_m$, fitting within the memory capacity TST.]
[Figure: methods of program P (x-axis, 1 to 10 000, marked at 2000 and 4000) against $\#ctx_m^c \cdot \#pts_m$ (y-axis) for c = 2obj, 2type, and 1type, with a horizontal line at $ST_p$ and the areas A1, A2, and A3 marked under the curves.]
For program P: $E(ST_p) = \sum_m \#ctx_m^c \cdot \#pts_m = A_1 + A_2 + A_3 \le TST$
• $ST_p$ is automatically computed based on the above inequality
• $ST_p$ is the max value satisfying this inequality
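One way to find the largest $ST_p$ satisfying the inequality is a binary search over integer thresholds, sketched below. It assumes every method has at least one context under each variant, so that the total estimate never decreases as the threshold grows, and it assumes a CI fallback costing `pts` (a single context). All names and numbers are hypothetical.

```python
from typing import Dict, List

PRECISION_ORDER = ["2obj", "2type", "1type"]  # most precise first


def selected_cost(ctx: Dict[str, int], pts: int, st: int) -> int:
    """Estimate #ctx * #pts under the most precise variant that fits within st."""
    for variant in PRECISION_ORDER:
        cost = ctx[variant] * pts
        if cost <= st:
            return cost
    return pts  # assumed CI fallback: one context per method


def total_estimate(methods: List[dict], st: int) -> int:
    """E(ST): sum of the per-method estimates under the selected variants."""
    return sum(selected_cost(m["ctx"], m["pts"], st) for m in methods)


def compute_st(methods: List[dict], tst: int) -> int:
    """Largest integer ST with E(ST) <= TST (0 if no threshold qualifies)."""
    lo = 0
    hi = max(c * m["pts"] for m in methods for c in m["ctx"].values())
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if total_estimate(methods, mid) <= tst:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best


# Two hypothetical methods.
methods = [
    {"ctx": {"2obj": 100, "2type": 10, "1type": 2}, "pts": 10},
    {"ctx": {"2obj": 5, "2type": 2, "1type": 1}, "pts": 20},
]
st_p = compute_st(methods, tst=250)
print(st_p, total_estimate(methods, st_p))
```

With these numbers, the first method would need a threshold of 1000 to use 2obj, which would push the total past the budget of 250, so the search stops just below that value.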
Scaler: Scalability-First Pointer Analysis with Self-Tuning Context Sensitivity
• CS variants (c = 2obj, 2type, 1type) are self-tuned by $ST_p$
• $ST_p$ is automatically computed based on the inequality $E(ST_p) = A_1 + A_2 + A_3 \le TST$
• $ST_p$ depends on TST
[Figure: methods (x-axis, 1 to 10 000, marked at 2000 and 4000) against $\#ctx_m^c \cdot \#pts_m$ (y-axis), with the line $ST_p$ and the areas A1, A2, and A3.]
Scaler: Results
10 Popular Java Programs, including Chart and Luindex
Settings
• TST = 30M (48G memory)
  - 20M, 40M, 60M, etc. are all OK
  - Larger TST means better precision but worse efficiency
• Time budget = 3 hours (per program)
Results
• Scalability: as good as CI
• Precision: comparable to or better than the best scalable CS
[Figure: analysis time in seconds for 2obj, 2type, and Scaler; a timeout is any run over 10800 seconds. Highlighted programs: a complex program, a medium-complexity program, and a simple program (Luindex).]
Complex program (3h = 10800s; in all cases, lower is better)
Precision metrics: #may-fail casts, #poly calls, #reachable methods, #call graph edges

Analysis              Time (seconds)     #may-fail casts  #poly calls  #reachable methods  #call graph edges
CI                    112                2234             2778         12718               114856
2obj → 2type → 1type  >3h + >3h + 1997   2117             2577         12430               111834
Scaler                452                1852             2500         12167               107410
Medium-Complexity program (3h = 10800s; in all cases, lower is better)
Precision metrics: #may-fail casts, #poly calls, #reachable methods, #call graph edges

Analysis              Time (seconds)  #may-fail casts  #poly calls  #reachable methods  #call graph edges
CI                    49              2508             2925         13036               77370
2obj → 2type → 1type  2458            1409             2182         12657               65836
Scaler                272             1452             2195         12676               66177