Static (Software) Analysis
Dagstuhl 16172: Machine Learning for Dynamic Software Analysis

Reiner Hähnle
Software Engineering Group, Department of Computer Science
Technische Universität Darmstadt
http://www.se.tu-darmstadt.de/
haehnle@cs.tu-darmstadt.de

160425 | TUD CS SE | R. Hähnle | Static Analysis | Dagstuhl 16172 ML & SA | 0
What Is Static Analysis (SA) of Software?

Establish a property of a program at compile time, without executing it

Some Facts
◮ Checking done by a tool, not a human
◮ Usually performed on source or assembler code
◮ Original motivation: compiler optimization
  ◮ Data flow analysis, e.g., used variables
  ◮ Control flow analysis, e.g., reachable code
◮ Current focus: software quality
  ◮ Security, e.g., confidentiality, vulnerability
  ◮ Compliance, e.g., MISRA-C, web service protocols
  ◮ Defects (bug finding), e.g., memory leaks, buffer overflows
  ◮ Code quality, e.g., metrics, “code smells”
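The "used variables" example of data flow analysis can be illustrated with a minimal sketch. It assumes Python as the analyzed language and uses the standard `ast` module; the function name `unused_variables` and the sample program `EXAMPLE` are hypothetical and chosen only for illustration. The check is flow-insensitive, so it is far cruder than what a compiler would do.

```python
import ast


def unused_variables(source):
    """Return names that are assigned somewhere but never read.

    Hypothetical helper for illustration: a flow-insensitive
    approximation that ignores scopes and statement order.
    """
    tree = ast.parse(source)  # the program is parsed, never executed
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)   # name appears as an assignment target
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)       # name is read somewhere
    return assigned - used


# Hypothetical program under analysis: y is written but never read.
EXAMPLE = """
x = 1
y = 2
z = x + 1
print(z)
"""

print(unused_variables(EXAMPLE))
```

The property is established purely from the syntax tree, which is what makes the analysis static.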
Static Analysis in the Narrow/Wide Sense

Static Analysis in the Narrow Sense
◮ Check a fixed property
◮ Low polynomial decision complexity
◮ Value-insensitive abstraction, e.g., control flow graph

Static Analysis in the Wide Sense (which is my sense)
◮ Complex properties, expressed in specification language
  ◮ security policy, interface protocol, functional property
◮ NP-hard or even undecidable
  ◮ heuristics optimize the “common case”, no guarantees
  ◮ interaction by human expert user
◮ Fully precise control flow and data model
◮ Often based on formal methods
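A narrow-sense analysis can be sketched as a reachability check on a control flow graph. The graph encoding (a dictionary from block name to successor list), the function name `reachable_blocks`, and the sample graph `CFG` are assumptions made for this illustration. The check is value-insensitive: branch conditions are ignored and every edge is treated as possibly taken.

```python
from collections import deque


def reachable_blocks(cfg, entry):
    """Breadth-first search over a control flow graph.

    Hypothetical helper: runs in time linear in the number of
    blocks and edges, i.e., low polynomial complexity.
    """
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


# Hypothetical CFG of a small procedure; no edge leads to "dead".
CFG = {
    "entry": ["cond"],
    "cond": ["then", "exit"],
    "then": ["exit"],
    "dead": ["exit"],
    "exit": [],
}

dead_code = set(CFG) - reachable_blocks(CFG, "entry")
print(dead_code)
```

Because values are not tracked, a block guarded by a condition that is always false would still count as reachable here; detecting that requires the more precise models of the wide sense.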
Static Analysis Techniques

Algorithms
◮ Graph-based program abstractions
  ◮ Control flow graph
  ◮ Program dependence graph

Search
◮ Constraint Solving
  ◮ Recently popular: SAT/SMT solvers as backend
◮ Automata-based representations
  ◮ Model checking

Inference
◮ Abstract Interpretation
◮ Symbolic Execution
◮ Deductive Verification

ML has most potential in complex SA techniques: Search ⇒ Lookup
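Abstract interpretation, one of the inference techniques listed above, can be sketched with an interval domain. The class `Interval` and the analyzed fragment (a branch assigning x = 1 or x = -1, followed by y = x * x) are assumptions made for this illustration, not taken from a specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value: all integers between lo and hi (inclusive)."""
    lo: int
    hi: int

    def join(self, other):
        # Merge the abstract states of two branches (least upper bound).
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def mul(self, other):
        # Sound interval multiplication: the result bounds are among
        # the products of the operand bounds.
        products = [a * b for a in (self.lo, self.hi)
                    for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))


# Hypothetical fragment:  if c: x = 1  else: x = -1 ;  y = x * x
x_then = Interval(1, 1)
x_else = Interval(-1, -1)
x = x_then.join(x_else)  # after the branch, x is abstracted to [-1, 1]
y = x.mul(x)             # the correlation between both operands is lost

print(x, y)
```

Concretely y is always 1, yet the computed interval for y also contains -1 and 0. The result is sound (it contains every real value) but imprecise, so a check such as "y > 0" would raise a false alarm.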
Interlude

A State-of-the-art Tool for Complex Static Analysis

Image credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=20208543
Challenges for SA

Scaling
◮ Intra- vs. inter-procedural: compositionality difficult for complex SA
◮ Coverage/rapid evolution of industrial programming languages
◮ Hard-to-analyze language features: dynamic typing, reflection, higher-order closures

Precision
◮ Incompleteness, false positives
◮ “Soundiness”, see B. Livshits et al., CACM 58(2) 44–46, Feb. 2015

Modern computer architecture: The deployment gap
◮ Multi-level caches, stale data
◮ Parallel computing: GPUs, weak memory models
◮ Cloud: provisioning bugs, resource-aware computing
Current Trends

More complex properties, often combine behavior and data
◮ Integration tasks (web interfaces, frameworks, APIs, ...)
◮ Security-related: information flow
◮ Evolution: regression, certification, ...

Combine Static and Dynamic Analysis
◮ Concolic or dynamic symbolic execution
◮ Incomplete static analysis to speed up runtime monitoring

Immersion in IDEs
◮ Use machine idle time while user deliberates
◮ Improved usability

Resource Analysis
Resource management and deployment become separate development phases
Related Fields

Static Analysis of Programs is not a Separate Discipline
Cross-cutting with many other fields; the distinction is blurry
◮ Type Theory
◮ Abstract Interpretation
◮ Model Checking
◮ Software Verification
Program Analysis: The Two Worlds

The Two Worlds Meet

Glassbox
◮ Symbolic
◮ Heuristic
◮ Analysis
◮ Static Analyses
  ◮ Model checking
  ◮ Abstract interpretation
  ◮ Symbolic execution
  ◮ Deductive verification

Blackbox
◮ Statistical
◮ Complete
◮ Synthesis
◮ Learning Techniques
  ◮ Extract Behavior from Traces
  ◮ Learning-based Software Testing
  ◮ Learning-based synthesis
Program Analysis: The Two Worlds

Advantages

Glassbox
◮ Precise, rich modelling
◮ Executable/compilable target
◮ Can be scalable
◮ Certificates possible

Blackbox
◮ Source code not needed
◮ Applicable to any system level
◮ Fully automatic
◮ Robust
Program Analysis: The Two Worlds

Disadvantages

Glassbox
◮ Must have/generate source code
◮ Where do the specs come from?
◮ Some expert interaction necessary
◮ Evolution of target expensive
◮ Incompleteness, soundiness

Blackbox
◮ Learned models very abstract
◮ How to map abstract to code level?
◮ No use of symbolic techniques
◮ Slow convergence, small coverage
◮ Doesn’t scale/not compositional
Software Model ⇔ Executable Code

Pivotal Issue: The Link between Models and Code
Too tight: Need to hand-craft modelling abstractions
Too loose: Unsatisfactory precision/coverage

Increase elasticity of model-code link without sacrificing precision

Potential Benefits
◮ Decrease dependency of glassbox on availability of specs and source code
◮ Integrate blackbox into precise/sound(y) reasoning framework
◮ Dramatically improve performance of both glass-/blackbox