Program Analysis
Extracting static and dynamic information from a software system

Program Analysis
• Extracting information, in order to present abstractions of, or answer questions about, a software system
• Static Analysis: Examines the source code
• Dynamic Analysis: Examines the system as it is executing

What are we looking for?
• Depends on our goals and the system
• In almost any language, we can find out information about variable usage
• In an OO environment, we can find out which classes use other classes, which are a base of an inheritance structure, etc.
• We can also find potential blocks of code that can never be executed in running the program (dead code)
• Typically, the information extracted is in terms of entities and relationships

Entities
• Entities are individuals that live in the system, and attributes associated with them.
• Some examples:
  • Classes, along with information about their superclass, their scope, and where in the code they exist.
  • Methods/functions and what their return type or parameter list is, etc.
  • Variables and what their types are, and whether or not they are static, etc.

Relationships
• Relationships are interactions between the entities in the system.
• Relationships include:
  • Classes inheriting from one another.
  • Methods in one class calling the methods of another class, and methods within the same class calling one another.
  • A method referencing an attribute.

Information format
• Many different formats in use
• Simple but effective: RSF
    inherit TRIANGLE SHAPE
• TA is an extension of RSF that includes a schema
    $INSTANCE SHAPE Class
• GXL is an XML-like extension of TA. A blow-up factor of 10 or more makes it rather cumbersome
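
To make the RSF format concrete, here is a minimal Python sketch that reads relation/source/target triples such as `inherit TRIANGLE SHAPE` and groups them by relation name. It assumes plain whitespace-separated fields with no quoting. The function name `parse_rsf` and the facts other than `inherit TRIANGLE SHAPE` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def parse_rsf(text):
    """Group whitespace-separated RSF triples by relation name.

    Assumption: every non-blank line is exactly 'relation source target'.
    """
    relations = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        relation, source, target = fields
        relations[relation].add((source, target))
    return dict(relations)


# Only the first fact comes from the slides; the others are made up.
sample = """\
inherit TRIANGLE SHAPE
inherit SQUARE SHAPE
call draw area
"""

facts = parse_rsf(sample)
for relation, pairs in sorted(facts.items()):
    print(relation, sorted(pairs))
```

Storing each relation as a set of pairs matches the entity/relationship view above and is also the form that a relational calculator such as Grok manipulates.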
Static Analysis
• Involves parsing the source code
• Usually creates an Abstract Syntax Tree
• Borrows heavily from compiler technology but stops before code generation
• Requires a grammar for the programming language
• Can be very difficult to get right

CppETS
• CppETS is a benchmark for C++ extractors
• It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering
• Static analysis research tools typically get about 60% of the problems right

Example program
    #include <iostream>

    class Hello {
    public:
        Hello();    // constructor
        ~Hello();   // destructor
    };

    Hello::Hello()
    { std::cout << "Hello, world.\n"; }

    Hello::~Hello()
    { std::cout << "Goodbye, cruel world.\n"; }

    int main() {
        Hello h;    // constructor runs here; destructor runs when h leaves scope
        return 0;
    }

Example Q&A
• How many member methods are in the Hello class?
  Two, the constructor Hello::Hello() and the destructor Hello::~Hello()
• Where are these member methods used?
  The constructor is called implicitly when an instance of the class is created. The destructor is called implicitly when the execution leaves the scope of the instance.

Static analysis in IDEs
• Eclipse displays compilation warnings and errors on the fly, e.g. unused variables
• EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems
• Rational Rose, as well as some Eclipse plugins, do the same with UML and Java
• Reverse engineers have many other uses for static facts

Static analysis pipeline
[Figure: static analysis pipeline]
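
The parse, build an AST, extract facts sequence can be sketched on a small scale. The example below is only an analogue: it uses Python's standard `ast` module on a Python version of the Hello class, not a C++ extractor, and the names `SOURCE`, `methods` and `instantiations` are assumptions made for illustration. It answers the same two questions as the Q&A: which methods does the class have, and where is the class instantiated.

```python
import ast

# A Python analogue of the Hello program (hypothetical, for illustration).
SOURCE = '''
class Hello:
    def __init__(self):
        print("Hello, world.")

    def close(self):
        print("Goodbye, cruel world.")


def main():
    h = Hello()
    h.close()
'''

tree = ast.parse(SOURCE)  # parsing only; the program is never executed

# Entity facts: methods defined directly in each class body.
methods = {}
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        methods[node.name] = [
            item.name for item in node.body if isinstance(item, ast.FunctionDef)
        ]

# Relationship facts: (function, class) pairs where the function calls the class.
instantiations = set()
for func in tree.body:
    if isinstance(func, ast.FunctionDef):
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in methods
            ):
                instantiations.add((func.name, node.func.id))

print(methods)
print(instantiations)
```

Even this toy extractor needs a full grammar and parser behind `ast.parse`, which is why real C++ extractors lean so heavily on compiler front ends.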
Dynamic Analysis
• Provides information about the run-time behaviour of software systems, e.g.
  • Component interactions
  • Event traces
  • Concurrent behaviour
  • Code coverage
  • Memory management
• Can be done with a profiler or a debugger

Instrumentation
• Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file
• A profiler can be used to examine the output file and extract relevant facts from it
• Instrumentation affects the execution speed and storage space requirements of the system

Instrumentation process
[Figure: instrumentation process]

Dynamic analysis pipeline
[Figure: dynamic analysis pipeline]

Non-instrumented approach
• One can also use debugger log files to obtain dynamic information
• Disadvantage: Limited amount of information provided
• Advantage: Less intrusive approach, more accurate performance measurements

Dynamic analysis issues
• Ensuring good code coverage is a key concern
• A comprehensive test suite is required to ensure that all paths in the code will be exercised
• Results may not generalize to future executions
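
As a rough illustration of instrumentation, the Python sketch below wraps functions so that they record enter/exit events, then post-processes the event trace into dynamic call facts, much as a profiler would read an output file. The decorator `instrument`, the in-memory `events` list standing in for the output file, and the functions `area` and `draw` are all hypothetical.

```python
import functools

events = []  # stands in for the output file or monitoring application


def instrument(func):
    """Augment func with code that records enter and exit events."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            events.append(("exit", func.__name__))
    return wrapper


@instrument
def area(width, height):
    return width * height


@instrument
def draw(width, height):
    return f"rectangle of area {area(width, height)}"


draw(2, 3)  # one execution; only the paths it exercises are observed

# Profiler-like step: derive (caller, callee) facts from the event trace.
stack, calls = [], set()
for kind, name in events:
    if kind == "enter":
        if stack:
            calls.add((stack[-1], name))
        stack.append(name)
    else:
        stack.pop()

print(events)
print(calls)
```

The extra list appends on every call are a small-scale version of the speed and storage overhead noted above, and the facts cover only what this single run happened to execute.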
Static vs. Dynamic
Static analysis:
• Reasons over all possible behaviours (general results)
• Conservative
• Challenge: Choose good abstractions
Dynamic analysis:
• Observes a small number of behaviours (specific results)
• Precise and fast
• Challenge: Select representative test cases

SWAGKit
• SWAGKit is used to generate software landscapes from source code
• Based on a pipeline architecture with three phases
  • Extract (cppx, bfx, javex)
  • Manipulate (prep, linkplus, layoutplus)
  • Present (lsedit)
• Currently usable for programs written in C/C++ and Java

The SWAGKit Pipeline
[Figure: the SWAGKit pipeline]

CPPX
• C/C++ fact extractor based on gcc
• Extracts facts from one source file at a time
• Facts represent program information in TA format, e.g.
    $INSTANCE x integer
• Can pass normal gcc parameters using the -g option
• In the assignment, we will see two other fact extractors, bfx and javex. They extract facts from compiled code, C and Java respectively.

Prep
• Prep is a series of scripts written in Grok
• Function is to "clean up" facts from cppx so they are in a form which can be usable by the rest of the pipeline.

Grok
• A simple scripting language
• A relational algebraic calculator
• Powerful in manipulating binary relations
Grok Script (1)
    cat := {"Garfield", "Fluffy"}
    mouse := {"Mickey", "Nancy"}
    cheese := {"Roquefort", "Swiss"}
    animals := cat + mouse
    food := mouse + cheese
    animalsWhichAreFood := animals ^ food
    animalsWhichAreNotFood := animals - food
    animalsWhichAreFood
    animals - food
    #food
    mouse <= food

Grok Script (2)
    chase := cat X mouse
    chase
    eat := chase + mouse X cheese
    eat

Grok Scripts (3)
    {"Mickey"} . eat
    eat . {"Mickey"}
    eater := dom eat
    food := rng eat
    chasedBy := inv chase
    topOfFoodChain := dom eat - rng eat
    bottomOfFoodChain := rng eat - dom eat
    bothEatAndChase := eat ^ chase
    eatButNotChase := eat - chase
    chaseButNotEat := chase - eat
    secondOrderEat := eat o eat
    anyOrderEat := eat+

A more real example
We need to compute call relations between files
Factbase rawFacts.rsf:
    contain a.c f1
    contain a.c f2
    contain b.c f3
    contain b.c f4
    call f1 f2
    call f2 f3
    call f3 f4

A bigger real example
Input: A nested partition of a set of objects
Output: A flattened version of the original partition
    containFacts := $1
    getdb containFacts
    d := dom contain
    r := rng contain
    e := ent contain
    roots := d - r
    leaves := r - d
    toKeep := roots + leaves
    toDelete := e - toKeep
    cc := contain+
    delset toDelete
    delrel contain
    contain := cc
    relToFile contain $2

linkplus
• Function is to link all facts into one large graph
• Combines facts residing in separate files
• Resolves inter-compilation unit relationships
• Merges header files together
• Does some cleanup to shrink final graph
• Usage: linkplus list-of-files-to-link
• Produces out.ln.ta
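
The Grok operators used in the scripts above can be modelled in Python with sets of pairs. The sketch below applies them to the rawFacts.rsf factbase to compute call relations between files. The slides do not show the Grok script for this step, so lifting calls to file level by composing `contain`, `call` and the inverse of `contain` is an assumption, as are the helper names `compose` and `closure`.

```python
# Binary relations modelled as sets of (source, target) pairs,
# using the facts from rawFacts.rsf.
contain = {("a.c", "f1"), ("a.c", "f2"), ("b.c", "f3"), ("b.c", "f4")}
call = {("f1", "f2"), ("f2", "f3"), ("f3", "f4")}


def dom(rel):
    """Domain: all sources (Grok: dom rel)."""
    return {s for s, _ in rel}


def rng(rel):
    """Range: all targets (Grok: rng rel)."""
    return {t for _, t in rel}


def inv(rel):
    """Inverse relation (Grok: inv rel)."""
    return {(t, s) for s, t in rel}


def compose(r1, r2):
    """Relational composition (Grok: r1 o r2)."""
    return {(a, d) for a, b in r1 for c, d in r2 if b == c}


def closure(rel):
    """Transitive closure (Grok: rel+), computed by iterating to a fixpoint."""
    result = set(rel)
    while True:
        bigger = result | compose(result, rel)
        if bigger == result:
            return result
        result = bigger


# Assumed lifting step: file A calls file B if A contains a function
# that calls a function contained in B.
file_call = compose(compose(contain, call), inv(contain))
cross_file_call = {(a, b) for a, b in file_call if a != b}

print(sorted(file_call))
print(sorted(cross_file_call))
print(sorted(closure(call)))
```

The same helpers are enough to mirror the partition-flattening script: `dom(contain) - rng(contain)` gives the roots, `rng(contain) - dom(contain)` gives the leaves, and `closure(contain)` plays the role of `contain+`.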
layoutplus
• Adds
  • Clustering of facts based on contain.rsf (created manually or from a clustering algorithm)
  • Layout information so that graph can be displayed
  • Schema information
• Usage: layoutplus contain.rsf out.ln.ta
• Produces out.ls.ta

lsedit
• View software landscape produced by previous parts of the pipeline
• Can make changes to landscape and save them
• Usage: lsedit out.ls.ta