Learning a Static Analyzer from Data

  1. Learning a Static Analyzer from Data, by Pavol Bielik, Veselin Raychev, and Martin Vechev
     Presenter: Daniel Perez, The University of Tokyo, January 29, 2018

  2. Static analyzers
     Writing a static analyzer is hard. JavaScript points-to sample:

       global.length = 4;
       var dat = [5, 3, 9, 1];
       function isBig(value) { return value >= this.length; }
       dat.filter(isBig);
       dat.filter(isBig, 42);
       dat.filter(isBig, dat);

     Many corner cases → many handcrafted rules: the FlowJS type-checking core is ~12,000 lines of ML.

  3. Sample learned analyzer

       Array.prototype.filter ::=
         if caller has one argument then
           points-to global object
         else if 2nd argument is Identifier then
           if 2nd argument points-to undefined then
             points-to global object
           else
             points-to 2nd argument value
         else  // 2nd arg is a primitive
           points-to new allocation site

       dat.filter(isBig);       // points to global
       dat.filter(isBig, 42);   // points to boxed 42
       dat.filter(isBig, dat);  // points to dat object

     We would like to learn such rules automatically while avoiding overfitting the training data.
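The learned rule can be read as an ordinary function from a call site to the object `this` points to. The sketch below is only an illustration, not the paper's representation: the call-site encoding (arguments as `{ type, name }` / `{ type, value }` records), the `pointsTo` map, and the name `filterThisPointsTo` are all hypothetical.

```javascript
// Hypothetical call-site encoding: the argument list as simple records,
// e.g. { type: "Identifier", name: "dat" } or { type: "Literal", value: 42 }.
// `pointsTo` is a hypothetical map from variable names to abstract objects.
function filterThisPointsTo(args, pointsTo) {
  // Only the callback is passed: `this` points to the global object.
  if (args.length === 1) return "global object";
  const second = args[1];
  if (second.type === "Identifier") {
    const target = pointsTo[second.name];
    // A variable holding undefined also makes `this` the global object.
    return target === "undefined" ? "global object" : target;
  }
  // Otherwise the 2nd argument is a primitive: it is boxed into a fresh object.
  return "new allocation site";
}

const isBig = { type: "Identifier", name: "isBig" };
const pointsTo = { dat: "dat object", u: "undefined" };
console.log(filterThisPointsTo([isBig], pointsTo)); // global object
console.log(filterThisPointsTo([isBig, { type: "Literal", value: 42 }], pointsTo)); // new allocation site
console.log(filterThisPointsTo([isBig, { type: "Identifier", name: "dat" }], pointsTo)); // dat object
```

Writing the rule by hand like this is exactly what the approach tries to avoid; the point is only to make the branch structure concrete.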

  4. Overview
     The system takes a dataset and rules as input and outputs an analysis.

  5. Model input
     The system takes a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where each $x$ is an input program and $y$ is the analysis result.
     Example sample:
       input x:    var b = {};  // object s0
                   a = b;
       result y:   $y = \{(a \to \{s_0\})\}$
     Rules description language, given as a language template:
       ⟨Prog⟩   ::= ⟨Action⟩ | 'if' ⟨Guard⟩ 'then' ⟨Prog⟩ 'else' ⟨Prog⟩
       ⟨Guard⟩  ::= condition
       ⟨Action⟩ ::= action on AST
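The ⟨Prog⟩ grammar describes a decision tree whose leaves are actions. A minimal interpreter might look as follows, assuming (as a simplification not taken from the slides) that guards and actions are plain JavaScript functions rather than DSL programs walking the AST; the field names `onTrue`/`onFalse` are hypothetical.

```javascript
// A program is either a leaf { action } or a branch { guard, onTrue, onFalse },
// mirroring <Prog> ::= <Action> | 'if' <Guard> 'then' <Prog> 'else' <Prog>.
// Guards and actions are plain functions here (a simplifying assumption).
function runProg(prog, x) {
  if ("action" in prog) return prog.action(x);
  return prog.guard(x) ? runProg(prog.onTrue, x) : runProg(prog.onFalse, x);
}

// Hypothetical program: choose a result based on the call's argument count.
const prog = {
  guard: x => x.argc === 1,
  onTrue: { action: () => "global object" },
  onFalse: { action: x => x.secondArg },
};

console.log(runProg(prog, { argc: 1 }));                   // global object
console.log(runProg(prog, { argc: 2, secondArg: "dat" })); // dat
```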

  6. Analyzer properties
     We want the analyzer pa to be sound and precise.
       • Sound: pa is sound if $\forall p \in T_L,\ \alpha([\![p]\!]) \sqsubseteq pa(p)$. This is too hard to prove, so instead we require soundness on the dataset: $\forall i \in 1 \ldots N,\ y_i \sqsubseteq pa(x_i)$.
       • Precise: the goal is to minimize $\mathrm{cost}(D, pa) = \sum_{(x, y) \in D} r(x, y, pa)$, where $r(x, y, pa) = \begin{cases} 1 & \text{if } y \neq pa(x) \\ 0 & \text{otherwise} \end{cases}$
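Both properties can be checked mechanically on a dataset. This sketch assumes points-to results are encoded as maps from variables to sets of abstract objects, ordered pointwise by set inclusion (so ⊑ becomes a subset check); the encoding and helper names are illustrative assumptions. It uses the sample from slide 5.

```javascript
// y ⊑ z: every variable's points-to set in y is contained in z's set.
function leq(y, z) {
  for (const [v, objs] of y) {
    const other = z.get(v);
    if (!other) return false;
    for (const h of objs) if (!other.has(h)) return false;
  }
  return true;
}

// Equality as mutual inclusion under ⊑.
const same = (y, z) => leq(y, z) && leq(z, y);

// r(x, y, pa) = 1 if y ≠ pa(x), 0 otherwise.
const r = (x, y, pa) => (same(y, pa(x)) ? 0 : 1);

// cost(D, pa): number of samples where pa is imprecise or wrong.
const cost = (D, pa) => D.reduce((sum, [x, y]) => sum + r(x, y, pa), 0);

// Soundness on the dataset: y_i ⊑ pa(x_i) for every sample.
const soundOnDataset = (D, pa) => D.every(([x, y]) => leq(y, pa(x)));

// Slide 5 sample: `var b = {}; a = b;` with result a → {s0}.
const D = [["var b = {}; a = b;", new Map([["a", new Set(["s0"])]])]];
const exact = () => new Map([["a", new Set(["s0"])]]);
const tooBig = () => new Map([["a", new Set(["s0", "s1"])]]); // sound, imprecise
const tooSmall = () => new Map([["a", new Set()]]);          // unsound: misses s0

console.log(cost(D, exact), soundOnDataset(D, exact));       // 0 true
console.log(cost(D, tooBig), soundOnDataset(D, tooBig));     // 1 true
console.log(cost(D, tooSmall), soundOnDataset(D, tooSmall)); // 1 false
```

The `tooBig` analyzer shows why both criteria are needed: it is sound on the dataset but still pays a cost for imprecision.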

  7. Learning algorithm
     Learning procedure based on ID3:

       procedure Synthesize(D)
         Input:  dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
         Output: program $pa \in L$
         a_best ← argmin_{a ∈ Actions} cost(D, a)
         if cost(D, a_best) = 0 then return a_best
         g_best ← argmax_{g ∈ Guards} IG_{a_best}(D, g)
         if g_best = ⊥ then return Approximate(D)
         p1 ← Synthesize({(x, y) ∈ D | g_best(x)})
         p2 ← Synthesize({(x, y) ∈ D | ¬g_best(x)})
         return (if g_best then p1 else p2)

     IG is the information gain, i.e. the difference of entropy:
       $IG_{a_{best}}(D, g) = H_{a_{best}}(D) - \frac{|D_g|}{|D|} H_{a_{best}}(D_g) - \frac{|D_{\neg g}|}{|D|} H_{a_{best}}(D_{\neg g})$
     where $D_g = \{(x, y) \in D \mid g(x)\}$, $D_{\neg g} = \{(x, y) \in D \mid \neg g(x)\}$, and $H_{a_{best}}(d) = H(w)$ is the entropy of $w = \langle r(x_i, y_i, a_{best}) \mid i \in 1 \ldots |d| \rangle$.
     Algorithm properties:
       • Greedy, locally optimal
       • Sound on D iff Approximate is sound
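The procedure can be sketched in JavaScript. Several assumptions here are not from the slides: actions and guards come from small hand-written lists of functions instead of being enumerated from the DSL, results are compared with `===`, "g_best = ⊥" is taken to mean that no guard splits D into two non-empty parts, and Approximate simply returns a constant ⊤ result.

```javascript
// ID3-style Synthesize sketch: actions are x => y, guards are x => boolean.
const TOP = "⊤"; // hypothetical over-approximate result used by approximate()

const r = (x, y, a) => (a(x) === y ? 0 : 1);
const cost = (D, a) => D.reduce((sum, [x, y]) => sum + r(x, y, a), 0);

// H_a(d): Shannon entropy of the 0/1 vector <r(x_i, y_i, a)> over d.
function entropy(d, a) {
  if (d.length === 0) return 0;
  const p = cost(d, a) / d.length;
  return [p, 1 - p].filter(q => q > 0).reduce((h, q) => h - q * Math.log2(q), 0);
}

// IG_a(D, g) = H_a(D) - |D_g|/|D| H_a(D_g) - |D_not_g|/|D| H_a(D_not_g)
function infoGain(D, g, a) {
  const Dg = D.filter(([x]) => g(x));
  const Dn = D.filter(([x]) => !g(x));
  return entropy(D, a)
    - (Dg.length / D.length) * entropy(Dg, a)
    - (Dn.length / D.length) * entropy(Dn, a);
}

// Approximate(D): answer TOP everywhere, which over-approximates any result.
const approximate = () => ({ action: () => TOP });

function synthesize(D, actions, guards) {
  // a_best <- argmin over actions of cost(D, a)
  let aBest = actions[0];
  for (const a of actions) if (cost(D, a) < cost(D, aBest)) aBest = a;
  if (cost(D, aBest) === 0) return { action: aBest };

  // g_best <- argmax over guards of IG; null plays the role of ⊥.
  let gBest = null;
  let bestGain = -Infinity;
  for (const g of guards) {
    const n = D.filter(([x]) => g(x)).length;
    if (n === 0 || n === D.length) continue; // trivial split, skip
    const gain = infoGain(D, g, aBest);
    if (gain > bestGain) { bestGain = gain; gBest = g; }
  }
  if (gBest === null) return approximate(D);

  // Recurse on both halves of the split.
  return {
    guard: gBest,
    onTrue: synthesize(D.filter(([x]) => gBest(x)), actions, guards),
    onFalse: synthesize(D.filter(([x]) => !gBest(x)), actions, guards),
  };
}

function runProg(p, x) {
  if ("action" in p) return p.action(x);
  return p.guard(x) ? runProg(p.onTrue, x) : runProg(p.onFalse, x);
}

// Toy dataset loosely modeled on the filter example.
const D = [
  [{ argc: 1, second: null }, "global"],
  [{ argc: 1, second: null }, "global"],
  [{ argc: 2, second: "Identifier" }, "arg"],
  [{ argc: 2, second: "Literal" }, "new"],
];
const actions = [() => "global", () => "arg", () => "new"];
const guards = [x => x.argc === 1, x => x.second === "Identifier"];

const learned = synthesize(D, actions, guards);
console.log(D.every(([x, y]) => runProg(learned, x) === y)); // true
```

On this toy data the first split is on `argc === 1` (information gain 1), and the second branch then splits on whether the 2nd argument is an Identifier. If two samples share the same input but disagree on the result, no guard can separate them and the sketch falls back to ⊤.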

  8. Oracle: counter-example generator
     Goal: find a counter-example (x, y) s.t. pa(x) ≠ y in reasonable time.
       • Random search is too slow
       • Prioritize modifications affecting the execution path of pa(x)
     Modification types:
       • Semantic-preserving (Equivalence Modulo Abstraction, EMA)
       • Non-semantic-preserving (global jump)
     Example:
       sample input:        var b = {}; a = b;
       overfitted analysis: if VarDecl:y immediately precedes x then y else ⊥
       counter-example:     var b = {}; var c = 1; a = b;
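The example above can be replayed in a few lines. This is a sketch under stated assumptions: programs use a hypothetical statement-list encoding, the only modification is a semantic-preserving insertion of an unrelated `var c = 1;`, and the oracle enumerates insertion points exhaustively instead of using the prioritized search described on the slide.

```javascript
// Hypothetical encoding: a program is a list of statements.
const sample = [
  { kind: "VarDecl", name: "b", init: "s0" }, // var b = {};  (object s0)
  { kind: "Assign", lhs: "a", rhs: "b" },     // a = b;
];
const expected = "s0"; // y: a points to s0

// Overfitted analysis: only looks at the statement right before the assignment.
function overfitted(prog) {
  const i = prog.findIndex(s => s.kind === "Assign");
  const prev = prog[i - 1];
  return prev && prev.kind === "VarDecl" && prev.name === prog[i].rhs ? prev.init : "⊥";
}

// A more robust analysis: search backwards for the declaration of the rhs.
function robust(prog) {
  const i = prog.findIndex(s => s.kind === "Assign");
  for (let j = i - 1; j >= 0; j--) {
    if (prog[j].kind === "VarDecl" && prog[j].name === prog[i].rhs) return prog[j].init;
  }
  return "⊥";
}

// Semantic-preserving modification (hypothetical): insert the unrelated
// declaration `var c = 1;` at position k; the expected result is unchanged.
function insertUnusedDecl(prog, k) {
  const copy = prog.slice();
  copy.splice(k, 0, { kind: "VarDecl", name: "c", init: "1" });
  return copy;
}

// Oracle: return the first modified input on which pa(x') !== y, or null.
function findCounterExample(pa, prog, y) {
  for (let k = 0; k <= prog.length; k++) {
    const modified = insertUnusedDecl(prog, k);
    if (pa(modified) !== y) return modified;
  }
  return null;
}

const cex = findCounterExample(overfitted, sample, expected);
console.log(cex.map(s => s.name || s.lhs).join(" ")); // b c a
console.log(findCounterExample(robust, sample, expected)); // null
```

Because the modification does not change what `a` points to, any disagreement with the original `y` is a genuine counter-example that can be added to the dataset.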

  9. Evaluation
     • Learned 2 analyzers:
         – points-to analysis subset (this points-to)
         – allocation-site analysis
     • Input programs from the ECMAScript conformance suite (~15,000 samples)
     • Program modifications ($F_{gj}$: global jump, $F_{ema}$: equivalence modulo abstraction): renaming variables, adding method arguments, adding method parameters, renaming user functions, changing constants, adding side-effect-free expressions, adding dead code

  10. Points-to analysis
     Goal: learn the rules for what this points to, i.e. a function f s.t.
       $\dfrac{\mathit{VarPointsTo}(v_2, h) \qquad v_2 = f(\mathit{this})}{\mathit{VarPointsTo}(\mathit{this}, h)}$
     Example: the learned Array.prototype.filter rule from slide 3, under which this points to the global object in dat.filter(isBig), to boxed 42 in dat.filter(isBig, 42), and to the dat object in dat.filter(isBig, dat).
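Once f is learned, it plugs into the rule: to find what `this` points to, follow f to another variable v2 and reuse v2's points-to facts. The sketch below uses a hypothetical call encoding, and its `f` is a hand-written stand-in for a learned function, covering only the one-argument and identifier cases.

```javascript
// Hypothetical VarPointsTo relation: variable name -> set of heap objects.
const varPointsTo = new Map([
  ["global", new Set(["global object"])],
  ["dat", new Set(["dat object"])],
]);

// Stand-in for a learned f: maps the `this` of a filter call to a variable.
const f = call => (call.args.length === 1 ? "global" : call.args[1]);

// VarPointsTo(this, h) holds when v2 = f(this) and VarPointsTo(v2, h).
function thisPointsTo(call) {
  const v2 = f(call);
  return varPointsTo.get(v2) || new Set();
}

console.log(thisPointsTo({ args: ["isBig"] }).has("global object")); // true
console.log(thisPointsTo({ args: ["isBig", "dat"] }).has("dat object")); // true
```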

  11. Points-to analysis rules description language
     Actions are generated as programs of up to size 5, and branch guards as programs of up to size 6 (5 moves and 1 write).
       ⟨MoveCore⟩ ::= Up | Left | Right | DownFirst | DownLast | Top
       ⟨MoveJS⟩   ::= GoToGlobal | GoToUndef | GoToNull | GoToThis | UpUntilFunc
       ⟨Move⟩     ::= ⟨MoveCore⟩ | ⟨MoveJS⟩ | GoToCaller
       ⟨Write⟩    ::= WriteValue | WritePos | WriteType | HasLeft | HasRight | HasChild
       ⟨Action⟩   ::= ε | ⟨Move⟩ ⟨Action⟩
       ⟨Guard⟩    ::= ε | ⟨Move⟩ ⟨Guard⟩ | ⟨Write⟩ ⟨Guard⟩
       ⟨Context⟩  ::= ε | (N ∪ Σ ∪ ℕ) ⟨Context⟩
       ⟨Prog⟩     ::= ε | ⟨Action⟩ | 'if' ⟨Guard⟩ '=' ⟨Context⟩ 'then' ⟨Prog⟩ 'else' ⟨Prog⟩
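The moves and writes behave like instructions of a small cursor machine over the AST: moves relocate the cursor, writes append an observation to the context that the guard is compared against. The sketch covers only ⟨MoveCore⟩ and ⟨Write⟩ (the JavaScript-specific moves and GoToCaller are omitted); the node encoding, the choice to stay in place when a move would leave the tree, and the names `node`/`runGuard` are assumptions.

```javascript
// Minimal AST node with parent links.
function node(type, value, children = []) {
  const n = { type, value, children, parent: null };
  children.forEach(c => { c.parent = n; });
  return n;
}

const index = n => (n.parent ? n.parent.children.indexOf(n) : 0);

// <MoveCore> moves; a move that would leave the tree keeps the cursor in place.
const MOVES = {
  Up: n => n.parent || n,
  Left: n => (n.parent && index(n) > 0 ? n.parent.children[index(n) - 1] : n),
  Right: n => (n.parent && index(n) < n.parent.children.length - 1
    ? n.parent.children[index(n) + 1] : n),
  DownFirst: n => n.children[0] || n,
  DownLast: n => n.children[n.children.length - 1] || n,
  Top: n => { while (n.parent) n = n.parent; return n; },
};

// <Write> instructions: each records one observation about the current node.
const WRITES = {
  WriteType: n => n.type,
  WriteValue: n => n.value,
  WritePos: n => index(n),
  HasLeft: n => index(n) > 0,
  HasRight: n => Boolean(n.parent) && index(n) < n.parent.children.length - 1,
  HasChild: n => n.children.length > 0,
};

// Execute a guard (sequence of moves and writes) and return the context.
function runGuard(instrs, start) {
  let cur = start;
  const ctx = [];
  for (const op of instrs) {
    if (op in MOVES) cur = MOVES[op](cur);
    else if (op in WRITES) ctx.push(WRITES[op](cur));
    else throw new Error(`unknown instruction ${op}`);
  }
  return ctx;
}

// AST for dat.filter(isBig, 42).
const lit = node("Literal", 42);
const callee = node("MemberExpression", undefined,
  [node("Identifier", "dat"), node("Identifier", "filter")]);
node("CallExpression", undefined, [callee, node("Identifier", "isBig"), lit]);

// "Is the 2nd argument an Identifier?" starting from the callee.
console.log(JSON.stringify(runGuard(["Right", "Right", "WriteType"], callee))); // ["Literal"]
console.log(JSON.stringify(runGuard(["DownFirst", "WriteValue", "WritePos"], callee))); // ["dat",0]
```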

  12. Points-to analysis results map 73 53 find 177 604 forEach 82 229 some 64 Function Name 315 Array.prototype 372 Dataset Size Counter-examples Found Function.prototype call 26 apply 6 182 12 Analysis Size ∗ 97 ( 18 ) 54 ( 10 ) 36 ( 6 ) 36 ( 6 ) 35 ( 5 ) 36 ( 6 ) ∗ Number of instructions in L pt (Number of if branches)

  13. Allocation analysis
     Goal: learn an allocation-site analysis, i.e. a function f s.t. $\dfrac{f(l) = \mathit{true}}{\mathit{AllocSite}(l)}$
     Results:
       • 34,721 input/output samples
       • 135 branches generated
       • 905 counter-examples found
       • Learned tricky cases, e.g. new Object(obj)

  14. Summary
       • New approach to learn a static analyzer from data
       • Algorithm to learn an analyzer from a dataset and inference rules
       • Oracle to quickly generate counter-examples, avoiding overfitting
       • Learned tricky rules for JavaScript points-to and allocation-site analysis

  15. References
       • P. Bielik, V. Raychev, and M. T. Vechev, "Learning a static analyzer from data," CoRR, vol. abs/1611.01752, 2016. [Online]. Available: http://arxiv.org/abs/1611.01752
       • J. R. Quinlan, "Induction of decision trees," Mach. Learn., vol. 1, no. 1, pp. 81–106, Mar. 1986. [Online]. Available: http://dx.doi.org/10.1023/A:1022643204877
