Learning a Static Analyzer from Data
Pavol Bielik, Veselin Raychev, Martin Vechev
Department of Computer Science, ETH Zurich
CAV 2017, July 22-28, Heidelberg
Writing a Static Analyzer
Examples: a static analysis framework for Java, a static type checker for JavaScript, a pointer analysis for JavaScript (between 17 and ~400 contributors each).
Writing a static analyzer is hard, frustrating, time consuming, and brittle.
Example of Unsound Analysis
[figure: two analysis runs compared; one misses an error, the other reports it correctly]
This Work: Learn a Static Analyzer
Can we learn a static analyzer (i.e., its abstract transformers)?
This Work: Learn a Static Analyzer from Data

Input:
  a dataset D = { ⟨x_j, y_j⟩ }_{j=1}^{N}
  a language L for abstract transformers

Synthesis + over-approximation produce the best analysis pa_best ∈ L.

How to obtain a suitable dataset?
What is the language over which to learn? How to allow generating new, interesting transformers?
How to design scalable learning over large search spaces? How to prevent overfitting?
This Work: Learn a Static Analyzer
Can we learn a static analyzer that is interpretable and sound?

Problem Formulation:
  pa_best = argmin_{pa ∈ L} cost(D, pa)      (analysis precision)
  s.t. ∀⟨x, y⟩ ∈ D. y ⊑ pa(x)                (analysis soundness)
An Example Transformer Learned

  Array.prototype.filter ::=
    if caller has one argument then
      points-to global object
    else if 2nd argument is Identifier then
      if 2nd argument is undefined then
        points-to global object
      else
        points-to 2nd argument
    else if 2nd argument is this then
      points-to 2nd argument
    else if 2nd argument is null then
      points-to global object
    else  // 2nd argument is a primitive value
      points-to new allocation site
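Read as code, the learned transformer is just a nested conditional over syntactic features of the call site. Below is a minimal TypeScript sketch of that tree; the CallSite shape, its field names, and the PointsTo encoding are illustrative assumptions, not the paper's representation.

```ts
// Illustrative rendering of the learned transformer as code.
// CallSite is a hypothetical, simplified view of a `filter` call.
type PointsTo =
  | { kind: "global" }             // points-to global object
  | { kind: "arg"; index: number } // points-to the given argument
  | { kind: "newAllocSite" };      // points-to a fresh allocation site

interface CallSite {
  argCount: number;
  secondArg?: {
    isIdentifier: boolean;
    isUndefined: boolean;
    isThis: boolean;
    isNull: boolean;
  };
}

// What does `this` point to inside the callback of Array.prototype.filter?
function filterThisPointsTo(call: CallSite): PointsTo {
  if (call.argCount === 1) return { kind: "global" };
  const a = call.secondArg!;
  if (a.isIdentifier) {
    return a.isUndefined ? { kind: "global" } : { kind: "arg", index: 2 };
  }
  if (a.isThis) return { kind: "arg", index: 2 };
  if (a.isNull) return { kind: "global" };
  // 2nd argument is a primitive value
  return { kind: "newAllocSite" };
}
```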
Let us illustrate the learning on an example analysis: points-to analysis.
Dataset: Points-to Analysis

Program:
  function collect(value, idx, obj) {
    if (value >= this.threshold) {
      ...
    }
    ...
  }

Abstract Syntax Tree (AST):
  n₁ FunctionDeclaration
  n₂ IfStatement
  n₃ BinaryExpression
  n₄ Identifier:value
  n₅ MemberExpression
  n₆ ThisExpression
     Property:threshold

Execution reads/writes: objects o₁, o₂, o₃ observed at run time.
Each observed read/write yields a training example, e.g. ⟨(ast, n₅), o₂⟩: the query (ast, n₅) is labeled with the object o₂ it was observed to point to during execution. Collecting these gives the dataset D = { ⟨x_j, y_j⟩ }_{j=1}^{N}.
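In code, a training example pairs such a query with its run-time label. A minimal sketch under assumed, simplified types (Example, NodeId, ObjectId are illustrative names, not the paper's):

```ts
// Hypothetical, simplified encoding of one points-to training example.
type NodeId = number;   // index of an AST node, e.g. n5
type ObjectId = number; // index of a concrete object, e.g. o2

interface Example {
  ast: object;     // the program's AST (shape elided here)
  query: NodeId;   // node whose points-to target we ask about
  label: ObjectId; // object actually read/written at run time
}

// The dataset D = { <x_j, y_j> } is then just a list of such examples.
type Dataset = Example[];
```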
Language Describing Abstract Transformers

  p ∈ L ::= a | if g then p₁ else p₂
  a ∈ Actions
  g ∈ Guards

Example with a points-to query:
  function collect(val, idx, obj) {
    if (val >= this.threshold) { ... }
  }
  var dat = [5, 3, 9];
  dat.filter(collect, ctx);   // query: what does `this` point to inside collect?

Example guards: g₁ = "method name is filter", g₂ = "has 2nd argument".
A program p ∈ L can be represented as a decision tree: guards (g₁, g₂, …) at internal nodes with true/false branches, and actions (a₁, a₂, a₃, …) at the leaves. Each root-to-leaf path is interpreted as one abstract transformer.
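As a data structure, the language is a tiny algebraic type, and evaluating a program walks one path of the tree. A TypeScript sketch (the Guard and Action representations are assumptions):

```ts
// The transformer language: p ::= a | if g then p1 else p2.
type Action = string;                     // e.g. "points-to global object"
type Guard = (query: unknown) => boolean; // e.g. "has 2nd argument"

type Prog =
  | { kind: "action"; action: Action }
  | { kind: "branch"; guard: Guard; then: Prog; else: Prog };

// Running a program on a query follows one root-to-leaf path,
// so each path corresponds to one abstract transformer case.
function run(p: Prog, query: unknown): Action {
  while (p.kind === "branch") {
    p = p.guard(query) ? p.then : p.else;
  }
  return p.action;
}
```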
Learning: Decision Trees + CEGIS

Synthesis (+ over-approximation) proposes a candidate analysis pa ∈ L from the current dataset D = { ⟨x_j, y_j⟩ }_{j=1}^{N}. An oracle then tests/verifies the analyzer:
  counter-example ⟨x, y⟩ ∉ D found → D ← D ∪ { ⟨x, y⟩ }, synthesize again
  no counter-example → return analysis pa
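The loop itself is standard counter-example guided inductive synthesis. A minimal TypeScript sketch, where synth stands for the decision-tree learner described below and Oracle is an assumed interface:

```ts
// Counter-example guided inductive synthesis (CEGIS) over the dataset.
interface Oracle<X, Y> {
  // Returns a <x, y> the candidate analysis gets wrong, or null if none found.
  findCounterExample(pa: (x: X) => Y): [X, Y] | null;
}

function cegis<X, Y>(
  d: Array<[X, Y]>,
  synth: (d: Array<[X, Y]>) => (x: X) => Y,
  oracle: Oracle<X, Y>,
): (x: X) => Y {
  for (;;) {
    const pa = synth(d);                       // candidate analysis pa in L
    const cex = oracle.findCounterExample(pa); // test/verify the analyzer
    if (cex === null) return pa;               // no counter-example: done
    d = [...d, cex];                           // D <- D u { <x, y> }
  }
}
```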
Learning: Problem Formulation

  pa_best = argmin_{pa ∈ L} cost(D, pa)
  s.t. ∀⟨x, y⟩ ∈ D. y ⊑ pa(x)        (guarantees analysis soundness)

Cost Function (prefer the analysis with fewer errors):
  r(x, y, pa) = if (y ≠ pa(x)) then 1 else 0
  cost(D, pa) = Σ_{⟨x, y⟩ ∈ D} r(x, y, pa)
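The cost function translates directly into code: count the examples the candidate analysis mislabels. A sketch (simple equality stands in for the more general check against ⊑):

```ts
// cost(D, pa) = sum over <x, y> in D of r(x, y, pa),
// where r is 1 if the analysis output differs from the label, else 0.
type Analysis<X, Y> = (x: X) => Y;

function cost<X, Y>(dataset: Array<[X, Y]>, pa: Analysis<X, Y>): number {
  let errors = 0;
  for (const [x, y] of dataset) {
    if (pa(x) !== y) errors += 1; // r(x, y, pa)
  }
  return errors;
}
```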
Learning Algorithm

  p ∈ L ::= a | if g then p₁ else p₂

Enumerating complete programs is intractable: a tree with three parts, each with ~10^6 candidates, already yields ~10^18 combinations.
Key Idea: Synthesize Programs in Parts
Instead of searching ~10^18 complete programs, synthesize the tree one part at a time: first one piece (~10^6 candidates), then the next, then the next, for roughly 10^6 + 10^6 + 10^6 candidates in total instead of 10^6 × 10^6 × 10^6.
Learning Algorithm (step 1)

  a_best = argmin_{a ∈ Actions} cost(D, a)

  cost(D, a_best) = 0 → no errors, return a_best
  cost(D, a_best) > 0 → refine the analysis
Learning Algorithm (step 2)

  g_best = argmax_{g ∈ Guards} InfGain(D, g, a_best)

  When cost(D, a_best) > 0, find the split g* that best separates D into D₁ and D₂, and refine the analysis on each part.
Learning Algorithm (step 3)

  When InfGain(D, g, a_best) = 0 for every guard g, no split reduces entropy: stop refining and return approximate(D), a sound over-approximation.
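Put together, the three steps form an ID3-style tree learner. A TypeScript sketch, slightly simplified (infGain here drops the a_best argument; cost, infGain, and approximate are assumed helpers standing in for the paper's definitions; Actions and Guards are assumed nonempty):

```ts
type Action = string;
type Guard<X> = (x: X) => boolean;
type Prog<X> =
  | { kind: "action"; action: Action }
  | { kind: "branch"; guard: Guard<X>; then: Prog<X>; else: Prog<X> };

interface Learner<X> {
  actions: Action[];
  guards: Guard<X>[];
  cost: (d: Array<[X, Action]>, a: Action) => number;      // # of mislabels
  infGain: (d: Array<[X, Action]>, g: Guard<X>) => number; // entropy reduction
  approximate: (d: Array<[X, Action]>) => Action;          // sound fallback
}

function synthesize<X>(L: Learner<X>, d: Array<[X, Action]>): Prog<X> {
  // Step 1: best single action on this dataset.
  const aBest = L.actions.reduce((a, b) =>
    L.cost(d, b) < L.cost(d, a) ? b : a);
  if (L.cost(d, aBest) === 0) return { kind: "action", action: aBest };

  // Step 2: best guard by information gain.
  const gBest = L.guards.reduce((g, h) =>
    L.infGain(d, h) > L.infGain(d, g) ? h : g);
  if (L.infGain(d, gBest) === 0) {
    // Step 3: no split reduces entropy; over-approximate soundly.
    return { kind: "action", action: L.approximate(d) };
  }

  // Recurse on the two sub-datasets induced by the split.
  const dTrue = d.filter(([x]) => gBest(x));
  const dFalse = d.filter(([x]) => !gBest(x));
  return { kind: "branch", guard: gBest,
           then: synthesize(L, dTrue), else: synthesize(L, dFalse) };
}
```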
Learning: Decision Trees + CEGIS

The oracle (test/verify the analyzer) drives the loop, feeding counter-examples ⟨x, y⟩ ∉ D back into D until none remain.
How to find complex counter-examples quickly?
How to efficiently explore hard-to-find corner cases?
Naive Approach: Random Fuzzing
1. Pick a random training example ⟨x, y⟩ ∈ D
2. Mutate the input randomly, obtaining x′
3. Obtain the correct label by executing x′
4. Check for correctness: ∀⟨x, y⟩ ∈ D′. y ⊑ pa(x)
5. Repeat
Naive Approach: Random Fuzzing
Problems: the space of random mutations is exponential; obtaining labels by execution is slow; and there is no clear criterion for when to stop.
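A direct rendering of the naive oracle makes these problems concrete: every iteration is independent, and the only stopping rule is an arbitrary budget. A sketch with assumed mutate/execute helpers (equality again stands in for the ⊑ check):

```ts
// Naive random-fuzzing oracle: mutate a random training program, re-run it
// to get the ground-truth label, and check the candidate against it.
function randomFuzz<P, Y>(
  d: Array<[P, Y]>,
  mutate: (p: P) => P, // random program mutation (assumed helper)
  execute: (p: P) => Y, // run the program to obtain the correct label
  pa: (p: P) => Y,      // candidate analysis under test
  budget: number,       // when to stop? only an arbitrary budget
): [P, Y] | null {
  for (let i = 0; i < budget; i++) {
    const [p] = d[Math.floor(Math.random() * d.length)]; // 1. pick example
    const p2 = mutate(p);                                // 2. mutate input
    const y2 = execute(p2);                              // 3. correct label
    if (pa(p2) !== y2) return [p2, y2];                  // 4. soundness check
  }
  return null; // 5. repeat until the budget is exhausted
}
```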
The Oracle: Testing an Analyzer
Key Idea: take advantage of the candidate analysis pa itself.
How to sample from the space of all programs?
The Oracle: Testing an Analyzer
Guide the search by the execution-path coverage of pa: steer testing toward decision-tree paths of pa that the dataset does not yet exercise.
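One way to realize this guidance, sketched under heavy assumptions: enumerate the paths of the candidate's decision tree, and mutate existing examples so they exercise paths the dataset has not yet covered. pathOf, targetedMutate, and allPaths are hypothetical helpers, not the paper's API.

```ts
// Guided testing: prefer mutants that exercise uncovered decision-tree
// paths of the candidate analysis pa, instead of mutating blindly.
function guidedFuzz<P, Y>(
  d: Array<[P, Y]>,
  pathOf: (pa: (p: P) => Y, p: P) => string, // which tree path p takes
  targetedMutate: (p: P, uncoveredPath: string) => P | null,
  execute: (p: P) => Y,
  pa: (p: P) => Y,
  allPaths: string[],
): [P, Y] | null {
  const covered = new Set(d.map(([p]) => pathOf(pa, p)));
  for (const path of allPaths) {
    if (covered.has(path)) continue;      // only chase uncovered behavior
    for (const [p] of d) {
      const p2 = targetedMutate(p, path); // steer the mutant onto `path`
      if (p2 === null) continue;
      const y2 = execute(p2);
      if (pa(p2) !== y2) return [p2, y2]; // counter-example found
    }
  }
  return null;
}
```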