Modular Dataflow Analysis. Aivar Annamaa, Feb. 23rd, 2010. Based on: Rountev, Sharp, Xu, 2008 „IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries“

Problem ● Interprocedural analyses are usually too slow ● can take many hours ● can take many seconds (not usable „as-you-type“) ● If it's fast enough then probably not very precise

Solutions? ● Reduce precision? ● can make analysis useless/unusable ● Go modular ● analyze each part (e.g. a method) independently ● analysis process could be parallelized ● cache results (method summaries) ● only changed methods need to be re-analyzed

Challenges for modularity ● Dependencies between parts ● How to represent method summaries?

Agenda ● Dataflow analysis ● An approach for solving IDE problems ● IDE ● Transformers as graphs ● Example analysis ● Summary generation ● Benchmarks and conclusions

Dataflow analysis, CFG
Program: a = „x“; if aCondition() { b = „x“ } else { a = „y“; b = „y“ }; s = a + b
Facts at each CFG node:
● enter: a = ?, b = ?, s = ?
● before if: a = {x}, b = ?, s = ?
● after then: a = {x}, b = {x}, s = ?
● after else: a = {y}, b = {y}, s = ?
● after if: a = {y,x}, b = {y,x}, s = ?
● exit: a = {y,x}, b = {y,x}, s = {xx, xy, yx, yy}

Lattice of abstract values ● Elements are partially ordered ● x ≤ y means y is at least as precise as x ● two values are combined with the meet (or glb, greatest lower bound) operator ∧ ● in the picture, ∧ = ∪ and ≤ = ⊇ ● the same construction can be used for environments
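A minimal Python sketch of such a lattice, assuming abstract values are sets of strings with ∧ = ∪ and ≤ = ⊇ as on the slide. The names `meet`, `leq` and `meet_env` are illustrative, and treating a missing variable as the empty set is an assumption of this sketch.

```python
from typing import Dict, FrozenSet

Value = FrozenSet[str]
Env = Dict[str, Value]

def meet(x: Value, y: Value) -> Value:
    """Meet in the powerset lattice ordered by superset: plain set union."""
    return x | y

def leq(x: Value, y: Value) -> bool:
    """x <= y holds when x is a superset of y, i.e. y is at least as precise."""
    return x >= y

def meet_env(e1: Env, e2: Env) -> Env:
    """Pointwise meet of two environments.

    Assumption of this sketch: a variable missing from an environment
    is treated as the empty set (the identity element of the meet).
    """
    empty: Value = frozenset()
    return {v: meet(e1.get(v, empty), e2.get(v, empty))
            for v in e1.keys() | e2.keys()}

# Joining the facts of the two branches of an if statement.
then_env = {"a": frozenset({"x"}), "b": frozenset({"x"})}
else_env = {"a": frozenset({"y"}), "b": frozenset({"y"})}
after_if = meet_env(then_env, else_env)
```

Here `after_if` maps both `a` and `b` to the set containing `x` and `y`, which is less precise than either branch on its own.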

CFG, environments, transformers ● Each CFG node has an environment representing dataflow facts ● env :: D → L ● D = set of variables ● L = set of abstract values ● Each edge has a transformer ● t :: env → env ● CFG + variables + lattice + transformers = abstract version of the program

Solving dataflow problem ● Forward analysis ● start from entry node and propagate values downward ● Backward analysis ● start from exit and move upwards ● Cycles in CFG complicate things ● loop until transformers don't change anything ● often requires certain tricks to ensure termination
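A forward analysis can be sketched as a worklist loop. The code below is only an illustration: it uses the if/else string example from the CFG slide, keeps one environment per node, and treats the empty set as "no value seen yet" (an assumption of this sketch). The node names and helper functions are hypothetical.

```python
from collections import deque

VARIABLES = ("a", "b", "s")

def assign_const(var, text):
    """Transformer for `var = "text"`."""
    return lambda env: {**env, var: frozenset({text})}

def assign_concat(var, left, right):
    """Transformer for `var = left + right` over sets of possible strings."""
    return lambda env: {
        **env,
        var: frozenset(x + y for x in env[left] for y in env[right]),
    }

def else_branch(env):
    """Transformer for the else block: a = "y"; b = "y"."""
    return assign_const("b", "y")(assign_const("a", "y")(env))

def identity(env):
    return env

# node -> list of (successor, transformer on that edge)
EDGES = {
    "enter": [("before_if", assign_const("a", "x"))],
    "before_if": [("after_then", assign_const("b", "x")),
                  ("after_else", else_branch)],
    "after_then": [("after_if", identity)],
    "after_else": [("after_if", identity)],
    "after_if": [("exit", assign_concat("s", "a", "b"))],
    "exit": [],
}

def solve(edges, entry):
    """Propagate facts forward until no transformer changes anything."""
    facts = {n: {v: frozenset() for v in VARIABLES} for n in edges}
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        for succ, transformer in edges[node]:
            out = transformer(facts[node])
            # Meet (set union) with what the successor already knows.
            merged = {v: facts[succ][v] | out[v] for v in VARIABLES}
            if merged != facts[succ]:
                facts[succ] = merged
                worklist.append(succ)
    return facts

facts = solve(EDGES, "enter")
```

This CFG has no cycles, so the loop stops after a few steps. With loops in the CFG, string concatenation could grow the sets forever, which is where the termination tricks mentioned above come in.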

Interprocedural dataflow analysis ● How to handle method calls? ● Inlining called methods ● Good: it's precise ● Bad: graph can grow huge ● Bad: doesn't work with recursion ● Extend CFG ● add call nodes ● add return nodes

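Unrealizable paths
● P1(): x = input(); call Q; return from Q; print(y)
● Q(): enter; y = x; exit
● P2(): x = z; call Q; return from Q; doSmth(y)
● An unrealizable path enters Q from the call in P1() but returns to the return node in P2(), so the value of input() appears to reach doSmth(y)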

Conclusion of introduction ● D = variables ● L = abstract values (in form of lattice) ● env :: D → L = dataflow facts ● Env(D → L) = lattice of all such environments ● CFG as abstract program ● Dataflow facts in nodes ● Environment transformers on edges ● Interprocedural = trouble

IDE Dataflow Problems ● Interprocedural Distributive Environment ● program is represented by an ICFG (interprocedural CFG) ● dataflow facts are environments D → L mapping variables to some abstract values ● L is a semi-lattice of finite height ● transformers are distributive ● t(env₁ ∧ env₂) = t(env₁) ∧ t(env₂)

Example: Dependence analysis ● Which parameters influence a variable? ● Flow-sensitive ● D = all local variables and formal parameters ● L = powerset of formal parameters ● with partial order ⊇ and meet ∪

Dependence analysis. Transformers ● d₂ = d₁ + d₃; ● env[d₂ → env(d₁) ∪ env(d₃)] ● d₁ = 68 ● env[d₁ → ∅] ● d = f(d₁, d₂) ● assign actual arguments to formal parameters ● use f's summary function ● assign result value to d
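The first two transformers can be written down directly. The sketch below assumes environments are dictionaries from variable names to sets of formal parameter names. The function names and the sample method with parameters `p1`, `p2` are hypothetical, and the call case is left out.

```python
def assign_binop(target, left, right):
    """Transformer for `target = left + right`:
    target depends on everything its operands depend on."""
    def transformer(env):
        new_env = dict(env)
        new_env[target] = env[left] | env[right]
        return new_env
    return transformer

def assign_const(target):
    """Transformer for `target = 68`: a constant depends on no parameter."""
    def transformer(env):
        new_env = dict(env)
        new_env[target] = frozenset()
        return new_env
    return transformer

def meet(e1, e2):
    """Pointwise union; both environments must have the same variables."""
    return {v: e1[v] | e2[v] for v in e1}

# Hypothetical method with formals p1, p2 and locals d1, d2, d3.
# At entry every formal depends on itself, locals depend on nothing.
entry = {
    "p1": frozenset({"p1"}), "p2": frozenset({"p2"}),
    "d1": frozenset(), "d2": frozenset(), "d3": frozenset(),
}

# d3 = p1 + p2;  d1 = 68;  d2 = d1 + d3
env = entry
for step in (assign_binop("d3", "p1", "p2"),
             assign_const("d1"),
             assign_binop("d2", "d1", "d3")):
    env = step(env)

# Spot check of distributivity on two sample environments.
other = dict(entry, d1=frozenset({"p1"}), p1=frozenset({"p2"}))
t = assign_binop("d2", "d1", "d3")
distributes = t(meet(entry, other)) == meet(t(entry), t(other))
```

After the three statements `d2` depends on both `p1` and `p2`, while `d1` depends on nothing. The final check only tests distributivity on one pair of environments; it is not a proof.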

Transformers as graphs ● figure: transformer graphs for print(68), d₁ = 68 and d₂ = d₁ + d₃ ● transformer functions are given pointwise ● Λ represents „something other than a variable“ ● meet = graph union ● composition = graph transitive closure
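One way to picture the graph representation is as a set of edges (source, target), read as "the new value of target includes the old value of source". The sketch below makes that assumption, leaves out the Λ node for brevity, and implements composition by following an edge of the first graph and then an edge of the second. All function names are illustrative.

```python
def assign(variables, target, sources):
    """Graph for `target = f(sources)`: every other variable keeps itself,
    the target receives an edge from each source (none for a constant)."""
    keep = {(v, v) for v in variables if v != target}
    return keep | {(s, target) for s in sources}

def meet(g1, g2):
    """Meet of two transformers: union of their edge sets."""
    return g1 | g2

def compose(g1, g2):
    """Transformer for 'g1, then g2': join edges through the middle variable."""
    return {(a, c) for (a, b) in g1 for (b2, c) in g2 if b == b2}

def apply(graph, env):
    """Evaluate a transformer graph on an environment of sets."""
    return {
        v: frozenset().union(*(env[s] for (s, d) in graph if d == v))
        for v in env
    }

VARS = ("d1", "d2", "d3")
g_const = assign(VARS, "d1", [])            # d1 = 68
g_sum = assign(VARS, "d2", ["d1", "d3"])    # d2 = d1 + d3
g_both = compose(g_const, g_sum)            # d1 = 68; d2 = d1 + d3

env = {"d1": frozenset({"p1"}), "d2": frozenset({"p2"}),
       "d3": frozenset({"p3"})}
result = apply(g_both, env)
```

The composed graph has only the edges d3 → d3 and d3 → d2, so applying it gives the same result as running the two statements one after the other.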

Type analysis ● „0-CFA type analysis“ ● What type can a variable possibly be? ● Relevant in OO because of polymorphism ● D = vars, params (incl. this), fields ● L = powerset of all types

Type Analysis 2 ● d := new T ● env[d → env(d) ∪ {T}] ● d₁ := d₂ ● env[d₁ → env(d₁) ∪ env(d₂)] ● Flow insensitive: each transform can make result only less precise ● d₁ = d₂.m() ● env[d₁ → [t(x.m()) | x ∈ env(d₂)]]
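Because the analysis is flow-insensitive, a simple way to sketch it is to apply all statements repeatedly to one shared environment until nothing changes. The tuple encoding of statements and the `returns` table (receiver type and method name mapped to possible result types) are assumptions of this sketch. For calls it also accumulates into the old value of the target, which is a simplification.

```python
def analyze(statements, returns):
    """Flow-insensitive type analysis over a single shared environment."""
    env = {}
    empty = frozenset()
    changed = True
    while changed:
        changed = False
        for stmt in statements:
            kind, target = stmt[0], stmt[1]
            old = env.get(target, empty)
            if kind == "new":        # target := new T
                new = old | {stmt[2]}
            elif kind == "copy":     # target := source
                new = old | env.get(stmt[2], empty)
            elif kind == "call":     # target = receiver.m()
                receiver, method = stmt[2], stmt[3]
                new = old
                for receiver_type in env.get(receiver, empty):
                    new = new | returns.get((receiver_type, method), empty)
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            if new != old:
                env[target] = new
                changed = True
    return env

# Hypothetical program; statement order does not matter.
statements = [
    ("copy", "a", "s"),
    ("new", "s", "Circle"),
    ("new", "s", "Square"),
    ("call", "r", "a", "area"),
]
returns = {
    ("Circle", "area"): frozenset({"Double"}),
    ("Square", "area"): frozenset({"Double"}),
}
types = analyze(statements, returns)
```

The copy into `a` appears before `s` is assigned, yet `a` still ends up with both types, which is exactly what flow insensitivity means here.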

Different calls and methods ● Exit calls ● method is not statically known ● „exits“ the scope of analysis and can't be modeled in advance ● Fixed calls ● only one possible target method ● e.g. static methods on final classes ● Fixed methods ● contain only fixed calls

Method summary generation ● Summary uses graph representation ● At method calls: ● fixed calls to fixed methods – inline method summary ● other calls – insert placeholder – resolved at full program analysis ● Summary is abstracted ● irrelevant details (for summary clients) are removed
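The inline-or-placeholder decision can be illustrated on a straight-line method body. This is a heavily simplified sketch, not the procedure from the paper: it ignores parameter passing, branches and abstraction, assumes caller and callee use the same variable names, and the body encoding and the `fixed_summaries` table are hypothetical.

```python
def compose(g1, g2):
    """Transformer graph for 'g1, then g2' (edges are (source, target))."""
    return {(a, c) for (a, b) in g1 for (b2, c) in g2 if b == b2}

def summarize(body, fixed_summaries):
    """Collapse a straight-line body into graphs and call placeholders.

    body: list of ("graph", edges) or ("call", callee_name) items.
    fixed_summaries: summaries of fixed methods that are already known.
    """
    summary = []
    for item in body:
        if item[0] == "call" and item[1] in fixed_summaries:
            # Fixed call to a fixed method: inline the callee's summary.
            item = ("graph", fixed_summaries[item[1]])
        if item[0] == "graph" and summary and summary[-1][0] == "graph":
            # Merge adjacent graphs into one transformer.
            summary[-1] = ("graph", compose(summary[-1][1], item[1]))
        else:
            # Other calls stay as placeholders for whole-program analysis.
            summary.append(item)
    return summary

identity = {("x", "x"), ("y", "y")}
copy_x_to_y = {("x", "x"), ("x", "y")}          # y = x

body = [
    ("graph", copy_x_to_y),
    ("call", "helper"),      # fixed method with a known summary
    ("call", "virt"),        # target not statically known
    ("graph", identity),
]
summary = summarize(body, {"helper": identity})
```

The resulting summary has three parts: one merged graph, the placeholder for `virt`, and the trailing graph. Only the placeholder has to be resolved when the full program is analyzed.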

Example of Dependency Analysis

Example summary graph

Experimental evaluation ● Created summaries for Java 1.4 (25490 methods) ● 33% of the methods are fixed ● Summaries used for analyzing 20 programs

Conclusion ● Transfer functions can be efficiently represented as graphs ● Summaries of these method graphs can be reused on different call sites ● Fixed calls are common enough to deserve special optimisations (inlining) ● Analyses with precomputed library summaries are 2x faster than analyses „from scratch“

References ● Rountev, Sharp, Xu, 2008 „IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries“ ● Sagiv, Reps, Horwitz, 1996 „Precise interprocedural dataflow analysis with applications to constant propagation“ ● Cousot & Cousot, 2002 „Modular Static Program Analysis“
