
Learning a Static Analyzer from Data (Pavol Bielik, Veselin Raychev, Martin Vechev)



  1. Learning a Static Analyzer from Data. Pavol Bielik, Veselin Raychev, Martin Vechev. Department of Computer Science, ETH Zurich. CAV 2017, July 22-28, Heidelberg.

  2. Writing a Static Analyzer. Examples shown: a framework for Java pointer analysis and two static type checkers for JavaScript (contributor counts shown: 17 and ~400). Writing a static analyzer is hard, frustrating, time-consuming, and brittle.

  3. Example of Unsound Analysis. Figure labels: "Missed error" and "Error correctly reported".

  4. This Work: Learn a Static Analyzer. Can we learn a static analyzer (aka its abstract transformers)?

  5. This Work: Learn Static Analyzer from Data. Input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$.

  6. This Work: Learn Static Analyzer from Data. Input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$. Language $\mathcal{L}$ for abstract transformers.

  7. This Work: Learn Static Analyzer from Data. Input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$. Language $\mathcal{L}$ for abstract transformers. Synthesis + over-approximation yields $pa_{best} \in \mathcal{L}$.

  8. This Work: Learn Static Analyzer from Data. Input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$: How to obtain a suitable dataset?

  9. This Work: Learn Static Analyzer from Data. Language $\mathcal{L}$ for abstract transformers: What is the language over which to learn? How to allow generating new interesting transformers?

  10. This Work: Learn Static Analyzer from Data. Synthesis + over-approximation yielding $pa_{best} \in \mathcal{L}$: How to design scalable learning over large search spaces? How to prevent overfitting?

  11. This Work: Learn a Static Analyzer. Can we learn a static analyzer?

  12. This Work: Learn a Static Analyzer. Can we learn an interpretable and sound static analyzer? Problem formulation: $pa_{best} = \arg\min_{pa \in \mathcal{L}} \mathit{cost}(\mathcal{D}, pa)$ (analysis precision) s.t. $\forall \langle x, y \rangle \in \mathcal{D}.\ \alpha(y) \sqsubseteq pa(x)$ (analysis soundness).

  13. An Example Transformer Learned.
      Array.prototype.filter ::=
        if caller has one argument then
          points-to global object
        else if 2nd argument is Identifier then
          if 2nd argument is undefined then
            points-to global object
          else
            points-to 2nd argument
        else if 2nd argument is this then
          points-to 2nd argument
        else if 2nd argument is null then
          points-to global object
        else // 2nd argument is a primitive value
          points-to new allocation site
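The learned transformer above reads like ordinary branching code. As a minimal sketch, it can be transcribed into Python as follows. The `CallSite` record, its field names, and the string constants are hypothetical names introduced only for this illustration; the slides do not define a concrete data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified description of a call such as dat.filter(collect, ctx).
@dataclass
class CallSite:
    num_args: int                           # number of arguments at the call
    second_arg_kind: Optional[str] = None   # "Identifier", "this", "null", "primitive"
    second_arg_name: Optional[str] = None   # identifier name, if any

# Abstract results, named after the slide's wording.
GLOBAL_OBJECT = "global object"
SECOND_ARGUMENT = "2nd argument"
NEW_ALLOCATION_SITE = "new allocation site"

def filter_points_to(call: CallSite) -> str:
    """Branch-for-branch transcription of the learned transformer."""
    if call.num_args == 1:
        return GLOBAL_OBJECT
    if call.second_arg_kind == "Identifier":
        if call.second_arg_name == "undefined":
            return GLOBAL_OBJECT
        return SECOND_ARGUMENT
    if call.second_arg_kind == "this":
        return SECOND_ARGUMENT
    if call.second_arg_kind == "null":
        return GLOBAL_OBJECT
    # Remaining case on the slide: the 2nd argument is a primitive value.
    return NEW_ALLOCATION_SITE
```

For a call like `dat.filter(collect, ctx)`, `filter_points_to(CallSite(2, "Identifier", "ctx"))` takes the "points-to 2nd argument" branch.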

  16. Let us illustrate the learning on an example analysis: points-to analysis.

  17. Dataset: Points-to Analysis.
      Program:
        function collect(value, idx, obj) {
          if (value >= this.threshold) { ... }
          ...
        }
      Abstract syntax tree (AST) nodes, with the execution reads/writes observed at each node:
        $n_1$  IfStatement        -
        $n_2$  BinaryExpression   -
        $n_3$  Identifier:value   $v_1$
        $n_4$  MemberExpression   -
        $n_5$  ThisExpression     $v_2$
        $n_6$  Property:threshold $v_3$

  18. Dataset: Points-to Analysis. Same program, AST, and execution reads/writes. Example training pair: $\langle (ast, n_5), v_2 \rangle$, an element of $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$.

  19. Language Describing Abstract Transformers. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$, with $a \in \mathit{Actions}$ and $g \in \mathit{Guards}$.
      Points-to query:
        function collect(val, idx, obj) {
          if (val >= this.threshold) { ... }
        }
        var dat = [5, 3, 9];
        dat.filter(collect, ctx);
      Example guards: "method name is filter", "has 2nd argument".

  20. Language Describing Abstract Transformers. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$, with $a \in \mathit{Actions}$ and $g \in \mathit{Guards}$. A program in $\mathcal{L}$ can be represented as a decision tree: guards are inner nodes with true/false branches, and actions are leaves. Paths are interpreted as abstract transformers.
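To make the grammar concrete, here is a minimal sketch of the language as a Python data type with an interpreter that walks the decision tree. The class names, the dictionary-shaped query, and the action labels are assumptions made for illustration; only the two guards ("method name is filter", "has 2nd argument") come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)
class Action:
    """Leaf of the tree: an action a in Actions (here just a label)."""
    name: str

@dataclass(frozen=True)
class If:
    """Inner node: if g then l else l."""
    guard: Callable[[dict], bool]
    then: "Program"
    otherwise: "Program"

Program = Union[Action, If]

def run(program: Program, query: dict) -> str:
    """Follow guards from the root until an action is reached."""
    while isinstance(program, If):
        program = program.then if program.guard(query) else program.otherwise
    return program.name

# Hypothetical tree built from the two guards named on the slide.
tree = If(
    lambda q: q["method"] == "filter",            # method name is filter
    If(
        lambda q: q["num_args"] >= 2,             # has 2nd argument
        Action("points-to 2nd argument"),
        Action("points-to global object"),
    ),
    Action("no special handling"),                # hypothetical fallback action
)
```

Each root-to-leaf path of `tree` corresponds to one abstract transformer, so `run(tree, {"method": "filter", "num_args": 2})` selects the "points-to 2nd argument" action.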

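  21. Learning: Decision Trees. Input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$. Language $\mathcal{L}$ for abstract transformers. Synthesis + over-approximation yields $pa_{best} \in \mathcal{L}$.

  22. Learning: Decision Trees + CEGIS. Synthesis + over-approximation over the input dataset $\mathcal{D} = \{\langle x^j, y^j \rangle\}_{j=1}^{N}$ and the language $\mathcal{L}$ for abstract transformers produces a candidate analysis $pa \in \mathcal{L}$. Oracle (test/verify analyzer): on a counter-example $\langle x, y \rangle \notin \mathcal{D}$, set $\mathcal{D} \leftarrow \mathcal{D} \cup \{\langle x, y \rangle\}$ and synthesize again; with no counter-example, return analysis $pa$.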
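The loop on this slide follows the usual counter-example guided inductive synthesis (CEGIS) shape. Below is a minimal, generic sketch. The function names, the `max_rounds` bound, and the toy synthesizer and oracle are all assumptions for illustration, not the authors' implementation; the toy synthesizer simply memorizes examples, which is enough to show the control flow.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[int, int]
Analysis = Callable[[int], int]

def cegis(dataset: List[Example],
          synthesize: Callable[[List[Example]], Analysis],
          find_counterexample: Callable[[Analysis], Optional[Example]],
          max_rounds: int = 100) -> Analysis:
    data = list(dataset)
    for _ in range(max_rounds):
        candidate = synthesize(data)           # candidate analysis
        cex = find_counterexample(candidate)   # oracle: test/verify
        if cex is None:
            return candidate                   # no counter-example
        data.append(cex)                       # D <- D u {<x, y>}
    raise RuntimeError("no analysis found within max_rounds")

# Toy instance: the "ground truth" is x % 3 on inputs 0..9.
def truth(x: int) -> int:
    return x % 3

def toy_synthesize(data: List[Example]) -> Analysis:
    table = dict(data)
    return lambda x: table.get(x, 0)           # memorize, default to 0

def toy_oracle(candidate: Analysis) -> Optional[Example]:
    for x in range(10):
        if candidate(x) != truth(x):
            return (x, truth(x))
    return None

learned = cegis([(0, 0)], toy_synthesize, toy_oracle)
```

In the toy run, every counter-example the oracle returns is added to the dataset, so each round fixes at least one input until the oracle has nothing left to report.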

  23. Learning: Problem Formulation. $pa_{best} = \arg\min_{pa \in \mathcal{L}} \mathit{cost}(\mathcal{D}, pa)$ s.t. $\forall \langle x, y \rangle \in \mathcal{D}.\ \alpha(y) \sqsubseteq pa(x)$ (guarantees analysis soundness). Cost function: $r(x, y, pa) = \mathbf{if}\ (y \neq pa(x))\ \mathbf{then}\ 1\ \mathbf{else}\ 0$ and $\mathit{cost}(\mathcal{D}, pa) = \sum_{\langle x, y \rangle \in \mathcal{D}} r(x, y, pa)$ (prefer analysis with fewer errors).
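The cost function and the soundness constraint can be sketched directly. This sketch assumes a powerset abstract domain, where abstract values are sets of location names, the ordering is set inclusion, and labels are already abstract values (so the abstraction is the identity). Those choices, and the query names `q1` and `q2`, are illustrative assumptions.

```python
from typing import Callable, FrozenSet, List, Tuple

Abstract = FrozenSet[str]
Example = Tuple[str, Abstract]
Analysis = Callable[[str], Abstract]

def r(x: str, y: Abstract, pa: Analysis) -> int:
    """1 if the analysis result differs from the label, else 0."""
    return 1 if y != pa(x) else 0

def cost(dataset: List[Example], pa: Analysis) -> int:
    """Number of examples on which the analysis is not exact."""
    return sum(r(x, y, pa) for x, y in dataset)

def is_sound(dataset: List[Example], pa: Analysis) -> bool:
    """Every label must be over-approximated (subset) by the analysis result."""
    return all(y <= pa(x) for x, y in dataset)

dataset = [("q1", frozenset({"global"})), ("q2", frozenset({"ctx"}))]

top = lambda x: frozenset({"global", "ctx"})      # sound but imprecise
exact = lambda x: dict(dataset)[x]                # sound and precise
wrong = lambda x: frozenset({"global"})           # unsound on q2
```

Here `top` satisfies the constraint but pays a cost of 2, `exact` satisfies it with cost 0, and `wrong` violates it, which is why the formulation minimizes cost only among sound candidates.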

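  24. Learning Algorithm. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$. Searching jointly for a guard ($10^6$ candidates), a true-branch action ($10^6$), and a false-branch action ($10^6$) gives $10^{18}$ combinations. Intractable.

  25. Learning Algorithm. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$. Key idea: synthesise programs in parts. First part: an action, $10^6$ candidates.

  26. Learning Algorithm. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$. Key idea: synthesise programs in parts. Two parts searched one after the other: $10^6 + 10^6$ candidates.

  27. Learning Algorithm. $l \in \mathcal{L} ::= a \mid \mathbf{if}\ g\ \mathbf{then}\ l\ \mathbf{else}\ l$. Key idea: synthesise programs in parts. All three parts searched one after the other: $10^6 + 10^6 + 10^6$ candidates.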
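The arithmetic behind "synthesise programs in parts" is worth spelling out. The sketch below takes the slide's figure of $10^6$ candidates per part for a tree with one guard and two actions; the variable names are illustrative.

```python
CANDIDATES_PER_PART = 10**6   # figure used on the slides for each part
PARTS = 3                     # one guard, two actions

# Searching all parts jointly multiplies the choices.
joint = CANDIDATES_PER_PART ** PARTS

# Searching the parts one after another adds them.
in_parts = CANDIDATES_PER_PART * PARTS

print(joint)      # size of the joint search space
print(in_parts)   # size of the part-by-part search space
```

The joint search space is $10^{18}$, while the part-by-part search only visits $3 \times 10^6$ candidates, at the price of making each choice greedily.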

  28. Learning Algorithm. $a_{best} = \arg\min_{a \in \mathit{Actions}} \mathit{cost}(\mathcal{D}, a)$. If $\mathit{cost}(\mathcal{D}, a_{best}) = 0$ (no errors), return $a_{best}$. If $\mathit{cost}(\mathcal{D}, a_{best}) > 0$, refine the analysis.

  29. Learning Algorithm. If $\mathit{cost}(\mathcal{D}, a_{best}) > 0$, refine the analysis: $g_{best} = \arg\max_{g \in \mathit{Guards}} \mathit{InfGain}(\mathcal{D}, g, a_{best})$. Find the split that separates the dataset, then continue on each part.

  31. Learning Algorithm. If $\mathit{cost}(\mathcal{D}, a_{best}) > 0$ and $\mathit{InfGain}(\mathcal{D}, g, a_{best}) = 0$, no split reduces entropy: return $\mathit{approximate}(\mathcal{D})$.
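Put together, the steps on these slides resemble greedy decision-tree learning. The sketch below is an illustration under stated assumptions, not the authors' algorithm: actions and guards are given as small named dictionaries, the entropy is taken over the binary outcome "the best action is correct on this example", and `approximate` is a caller-supplied fallback. A learned tree is either an action name or a `(guard_name, then_tree, else_tree)` tuple.

```python
import math

def errors(data, action):
    """Number of examples where the action's output differs from the label."""
    return sum(1 for x, y in data if action(x) != y)

def entropy(data, action):
    """Entropy of 'action is correct' over the examples (assumed definition)."""
    if not data:
        return 0.0
    p = 1 - errors(data, action) / len(data)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def inf_gain(data, guard, action):
    yes = [(x, y) for x, y in data if guard(x)]
    no = [(x, y) for x, y in data if not guard(x)]
    split = (len(yes) * entropy(yes, action)
             + len(no) * entropy(no, action)) / len(data)
    return entropy(data, action) - split

def synthesize(data, actions, guards, approximate):
    # Step 1: pick the action with the fewest errors.
    a_name = min(actions, key=lambda n: errors(data, actions[n]))
    if errors(data, actions[a_name]) == 0:
        return a_name
    # Step 2: pick the guard with the highest information gain.
    a = actions[a_name]
    g_name = max(guards, key=lambda n: inf_gain(data, guards[n], a))
    if inf_gain(data, guards[g_name], a) <= 1e-9:
        return approximate(data)   # no split reduces entropy
    # Step 3: recurse on both sides of the split.
    g = guards[g_name]
    yes = [(x, y) for x, y in data if g(x)]
    no = [(x, y) for x, y in data if not g(x)]
    return (g_name,
            synthesize(yes, actions, guards, approximate),
            synthesize(no, actions, guards, approximate))

# Toy instance with hypothetical names.
actions = {"global": lambda x: "global", "arg2": lambda x: "arg2"}
guards = {"one_arg": lambda x: x["num_args"] == 1}
data = [({"num_args": 1}, "global"), ({"num_args": 2}, "arg2"),
        ({"num_args": 1}, "global")]
learned_tree = synthesize(data, actions, guards, lambda d: "top")
```

A positive information gain implies that both sides of the split are non-empty, so each recursive call works on strictly fewer examples and the recursion terminates.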

  32. Learning: Decision Trees + CEGIS. Focus on the oracle (test/verify analyzer), which either returns a counter-example $\langle x, y \rangle \notin \mathcal{D}$ that is added to the dataset, $\mathcal{D} \leftarrow \mathcal{D} \cup \{\langle x, y \rangle\}$, or reports no counter-example so that analysis $pa$ is returned. How to find complex counter-examples quickly? How to efficiently explore hard-to-find corner cases?

  33. Naive Approach: Random Fuzzing. (1) Pick a random training example $\langle x, y \rangle \in \mathcal{D}$. (2) Mutate the input randomly, giving $x'$. (3) Obtain the correct label: execute $x'$ to get $y'$. (4) Check for correctness: $\forall \langle x, y \rangle \in \mathcal{D}'.\ \alpha(y) \sqsubseteq pa(x)$. (5) Repeat.

  34. Naive Approach: Random Fuzzing. Same five steps, annotated with their drawbacks: exponential number of choices, slow, and when to stop?
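The five fuzzing steps can be sketched as a small oracle. Everything concrete here is an assumption for illustration: inputs are integers rather than programs, `mutate` adds a small random offset, `execute` computes the ground-truth label, and the `trials` bound stands in for the open question of when to stop.

```python
import random

def fuzz_oracle(dataset, pa, mutate, execute, is_sound, trials=1000, seed=0):
    """Return a counter-example (x', y') or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):                 # 5. repeat
        x, _ = rng.choice(dataset)          # 1. pick a random training example
        x_new = mutate(x, rng)              # 2. mutate the input randomly
        y_new = execute(x_new)              # 3. obtain the correct label
        if not is_sound(y_new, pa(x_new)):  # 4. check for correctness
            return (x_new, y_new)
    return None

# Toy instance: the "program" is an integer, its label is its parity.
toy_data = [(0, "even"), (2, "even")]
toy_mutate = lambda x, rng: x + rng.randint(-3, 3)
toy_execute = lambda x: "even" if x % 2 == 0 else "odd"
toy_sound = lambda label, result: result == "any" or result == label

always_even = lambda x: "even"   # overfits the training data
always_any = lambda x: "any"     # trivially sound, imprecise

cex = fuzz_oracle(toy_data, always_even, toy_mutate, toy_execute, toy_sound)
none_found = fuzz_oracle(toy_data, always_any, toy_mutate, toy_execute, toy_sound)
```

The candidate `always_even` fits both training examples yet is refuted as soon as a mutation produces an odd input, while the over-approximating `always_any` survives all trials.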

  35. The Oracle: Testing an Analyzer. How to sample from the space of all programs? Key idea: take advantage of the candidate analysis $pa$.

  36. The Oracle: Testing an Analyzer. Execution path coverage of $pa$.
