Graph-based analysis of JavaScript source code



  1. Graph-based analysis of JavaScript source code repositories Gábor Szárnyas Graph Processing devroom @ FOSDEM 2018

  2. JAVASCRIPT Latest standard: ECMAScript 2017

  3. STATIC ANALYSIS
     - Static source code analysis is a software testing approach performed without compiling and executing the program itself.
     - [Figure: development workflow with the stages Development, Version Control System, Static analysis (Codacy, CodeClimate, etc.), Compilation, Unit and integration tests]

  4. STATIC ANALYSIS TOOLS
     - C
       o lint -> linters
     - Java
       o FindBugs
       o PMD
     - JavaScript
       o ESLint
       o Facebook Flow
       o Tern.js
       o TAJS

  5. PERFORMANCE CONSIDERATIONS
     - Checking global rules is computationally expensive
     - Slow for large projects, difficult to integrate even into CI
     - Workaround #1: no global rules (ESLint)
     - Workaround #2: batching (e.g. once a day)
     - Workaround #3: custom algorithms (e.g. Flow)
     - [Figure: timelines of unit test and code analysis runs]

  6. PROJECT GOALS
     - Goal
       o Static analysis for JavaScript applications
     - Design considerations
       o Custom analysis rules: both global and local, extensible
       o High performance: "real-time" responses

  7. ARCHITECTURE AND WORKFLOW

  8. PROPOSED APPROACH
     - Design considerations
       o Custom analysis rules
       o High performance
     - Approach
       o Use a declarative query language
       o Use incremental processing in lieu of batch execution: file granularity, maintain results
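The file-granularity incremental idea can be sketched as follows. This is a minimal illustration, not the project's implementation: the class `IncrementalAnalyzer`, the shape of the change objects and the toy text-based rule are all assumptions made for the example. Results are kept per file, and only changed files are analyzed again.

```javascript
// Hypothetical sketch: keep analysis results per file and re-run only changed files.
class IncrementalAnalyzer {
  constructor(analyzeFile) {
    this.analyzeFile = analyzeFile; // (path, source) => array of warnings
    this.results = new Map();       // path -> warnings, maintained between runs
    this.runs = 0;                  // counts how often analyzeFile was invoked
  }

  // Each change is { path, source } for added/modified files
  // or { path, deleted: true } for removed files.
  apply(changes) {
    for (const change of changes) {
      if (change.deleted) {
        this.results.delete(change.path);
      } else {
        this.results.set(change.path, this.analyzeFile(change.path, change.source));
        this.runs += 1;
      }
    }
    return this.report();
  }

  // The full report is assembled from the maintained per-file results.
  report() {
    return [...this.results.values()].flat();
  }
}

// Toy local rule (an assumption): flag a literal "/ 0" in the source text.
const analyzer = new IncrementalAnalyzer((path, source) =>
  /\/\s*0(?![\d.])/.test(source) ? [`${path}: potential division by zero`] : []
);

const first = analyzer.apply([
  { path: "a.js", source: "var foo = 1 / 0" },
  { path: "b.js", source: "var bar = 1 / 2" },
]);
// Only a.js changed, so b.js is not analyzed again.
const second = analyzer.apply([{ path: "a.js", source: "var foo = 1 / 4" }]);
```

The sketch only covers local rules; global rules would additionally need to track which results depend on which other files.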

  9. ARCHITECTURE
     - [Figure: architecture diagram with the components VCS, workspace (example file tree), abstract syntax tree, abstract semantic graph, analysis rules, analysis server, graph database, validation report and client]

  10. CODE PROCESSING STEPS: CODE
      - Source code: a sequence of statements, e.g. var foo = 1 / 0
      - Pipeline: code -> tokenizer -> tokens -> parser -> AST -> scope analyzer -> ASG

  11. CODE PROCESSING STEPS: TOKENS
      - Tokens: the shortest meaningful character sequences
      - Tokens of var foo = 1 / 0:
        Token   Token type
        var     VAR (Keyword)
        foo     IDENTIFIER (Ident)
        =       ASSIGN (Punctuator)
        1       NUMBER (NumericLiteral)
        /       DIV (Punctuator)
        0       NUMBER (NumericLiteral)
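A toy tokenizer for the example statement might look like the sketch below. The token type names are taken from the slide; the regular expressions and the function `tokenize` are assumptions for illustration and cover only this tiny subset of JavaScript, not what a real parser such as Shift does.

```javascript
// Minimal tokenizer sketch; rules are tried in order, so the keyword precedes identifiers.
const TOKEN_RULES = [
  ["VAR", /^var\b/],
  ["IDENTIFIER", /^[A-Za-z_$][\w$]*/],
  ["NUMBER", /^\d+(\.\d+)?/],
  ["ASSIGN", /^=/],
  ["DIV", /^\//],
];

function tokenize(code) {
  const tokens = [];
  let rest = code.trimStart();
  while (rest.length > 0) {
    // Pick the first rule whose pattern matches at the start of the remaining input.
    const rule = TOKEN_RULES.find(([, pattern]) => pattern.test(rest));
    if (!rule) throw new SyntaxError(`Unexpected input: ${rest}`);
    const text = rest.match(rule[1])[0];
    tokens.push({ type: rule[0], text });
    rest = rest.slice(text.length).trimStart();
  }
  return tokens;
}

const tokens = tokenize("var foo = 1 / 0");
```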

  12. CODE PROCESSING STEPS: AST
      - Abstract Syntax Tree: tree representation of the grammar structure of a sequence of tokens
      - AST of var foo = 1 / 0:
        Module
          items: VariableDeclarationStatement
            declaration: VariableDeclaration
              declarators: VariableDeclarator
                binding: BindingIdentifier (name = "foo")
                init: BinaryExpression (operator = "Div")
                  left: LiteralNumericExpression (value = 1.0)
                  right: LiteralNumericExpression (value = 0.0)
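The tree above can be written down as nested plain objects and traversed generically. In this sketch the node and property names come from the slide, while the object layout and the helper `walk` are assumptions for illustration, not the parser's actual output format.

```javascript
// AST for `var foo = 1 / 0`, written by hand as plain objects.
const ast = {
  type: "Module",
  items: [{
    type: "VariableDeclarationStatement",
    declaration: {
      type: "VariableDeclaration",
      declarators: [{
        type: "VariableDeclarator",
        binding: { type: "BindingIdentifier", name: "foo" },
        init: {
          type: "BinaryExpression",
          operator: "Div",
          left: { type: "LiteralNumericExpression", value: 1.0 },
          right: { type: "LiteralNumericExpression", value: 0.0 },
        },
      }],
    },
  }],
};

// Depth-first traversal that visits every object carrying a string `type` field.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.type === "string") visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

const nodeTypes = [];
walk(ast, (node) => nodeTypes.push(node.type));
```

After the traversal, `nodeTypes` lists the node types in pre-order, starting with `Module`.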

  13. CODE PROCESSING STEPS: ASG
      - Abstract Semantic Graph
        o Not necessarily a tree
        o Has scopes and semantic info
        o Cross edges
      - [Figure: the AST of the Module (items, declaration, declarators, binding, init, left, right) connected to a GlobalScope through astNode, children, variables, declarations, references and node edges]
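What turns the tree into a graph are the cross edges added by scope analysis. The sketch below is a strongly simplified assumption of that step: the function `analyzeScope` and its input format are hypothetical, there is a single global scope, and identifier nodes are linked through a shared variable object holding `declarations` and `references` (edge names as on the slide).

```javascript
// Hypothetical mini scope analyzer: links identifier nodes through a shared variable object.
function analyzeScope(identifiers) {
  const variables = new Map(); // name -> { name, declarations: [], references: [] }
  for (const id of identifiers) {
    if (!variables.has(id.name)) {
      variables.set(id.name, { name: id.name, declarations: [], references: [] });
    }
    const variable = variables.get(id.name);
    // BindingIdentifier nodes declare a variable; other identifier nodes refer to it.
    if (id.type === "BindingIdentifier") variable.declarations.push(id);
    else variable.references.push(id);
  }
  return { type: "GlobalScope", variables: [...variables.values()] };
}

// Identifier nodes of the two statements: var foo = 1 / 0; var bar = foo;
const identifiers = [
  { type: "BindingIdentifier", name: "foo" },
  { type: "BindingIdentifier", name: "bar" },
  { type: "IdentifierExpression", name: "foo" },
];

const globalScope = analyzeScope(identifiers);
const foo = globalScope.variables.find((v) => v.name === "foo");
```

Because the variable object points at the original identifier nodes, the declaration of `foo` and its later use become connected even though they sit in different subtrees.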

  14. AST VS. ASG
      - var foo = 1 / 0
      - 1 LOC -> 20+ nodes

  15. PATTERN MATCHING
      - Declarative graph patterns with Cypher
      - [Figure: match result highlighted in the graph: a VariableDeclarator with binding BindingIdentifier (name = "foo") and BinaryExpression be (operator = "Div") whose right operand is an LNExpression (value = 0.0)]
      - Query:
        MATCH (binding:BindingIdentifier)<-[:binding]-()-->(be:BinaryExpression)-[:right]->(right:LNExpression)
        WHERE be.operator = 'Div' AND right.value = 0.0
        RETURN binding
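To make the semantics of the pattern concrete, here is a hand-written JavaScript equivalent over a tiny in-memory graph. It is only a sketch: the node ids, the edge list format and the function `findDivisionByZero` are assumptions, and the label is spelled out as `LiteralNumericExpression` where the slide abbreviates it to `LNExpression`. A graph database evaluates the declarative query instead of such nested loops.

```javascript
// Tiny property graph for `var foo = 1 / 0` (ids and edge list are illustrative).
const nodes = new Map([
  [1, { label: "VariableDeclarator" }],
  [2, { label: "BindingIdentifier", name: "foo" }],
  [3, { label: "BinaryExpression", operator: "Div" }],
  [4, { label: "LiteralNumericExpression", value: 1.0 }],
  [5, { label: "LiteralNumericExpression", value: 0.0 }],
]);
const edges = [
  { from: 1, type: "binding", to: 2 },
  { from: 1, type: "init", to: 3 },
  { from: 3, type: "left", to: 4 },
  { from: 3, type: "right", to: 5 },
];

// Returns the bindings whose declarator also points at a division by the literal 0.
function findDivisionByZero(nodes, edges) {
  const results = [];
  for (const toBinding of edges.filter((e) => e.type === "binding")) {
    const binding = nodes.get(toBinding.to);
    if (binding.label !== "BindingIdentifier") continue;
    // Any outgoing edge of the same source node may lead to the BinaryExpression.
    for (const toExpr of edges.filter((e) => e.from === toBinding.from)) {
      const be = nodes.get(toExpr.to);
      if (be.label !== "BinaryExpression" || be.operator !== "Div") continue;
      for (const toRight of edges.filter((e) => e.from === toExpr.to && e.type === "right")) {
        const right = nodes.get(toRight.to);
        if (right.label === "LiteralNumericExpression" && right.value === 0.0) {
          results.push(binding);
        }
      }
    }
  }
  return results;
}

const matches = findDivisionByZero(nodes, edges);
```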

  16. WORKFLOW
      - Pipeline: source code -> tokenizer -> tokens -> parser -> AST -> scope analyzer -> ASG -> transformation -> graph database
      - Traceability back to the developer's IDE and the version control system
      - Technologies: Git, Visual Studio Code, Shape Security Shift, Java, Cypher, Neo4j

  17. USE CASES: TYPE INFERENCING
        function foo(x, y) { return (x + y); }
        function bar(a, b) { return foo(b, a); }
        var quux = bar("goodbye", "hello");
      Source: http://marijnhaverbeke.nl/blog/tern.html

  18. USE CASES: GLOBAL ANALYSIS
      - Reachability:
        o dead code detection
        o async/await (ECMAScript 2017)
        o potential division by zero
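Dead code detection is a typical reachability question that needs a global view. The following sketch assumes a call graph is already available as adjacency lists; the graph data, the entry point `main` and the function `findUnreachable` are made up for illustration.

```javascript
// Call graph as adjacency lists (function name -> called functions); data is illustrative.
const callGraph = {
  main: ["bar"],
  bar: ["foo"],
  foo: [],
  unused: ["foo"],
};

// Breadth-first search from the entry points; anything not visited is a dead-code candidate.
function findUnreachable(graph, entryPoints) {
  const visited = new Set(entryPoints);
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const callee of graph[current] ?? []) {
      if (!visited.has(callee)) {
        visited.add(callee);
        queue.push(callee);
      }
    }
  }
  return Object.keys(graph).filter((name) => !visited.has(name));
}

const dead = findUnreachable(callGraph, ["main"]);
```

Here `unused` is reported even though it calls `foo`, because nothing reachable from `main` ever calls `unused`.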

  19. TECH DETAILS

  20. IMPORTS AND EXPORTS

  21. FIXPOINT ALGORITHMS
      - Lots of propagation algorithms
      - "Run to completion" scheduling
        o Mix of Java code and Cypher
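A propagation algorithm of this kind repeats its passes until nothing changes any more. The sketch below is one possible reading, applied to the type inferencing example from slide 17; the function `propagate`, the node names and the flow edges are assumptions, and the `+` operator is simplified to passing its operand types through.

```javascript
// Propagates type facts along data-flow edges until a full pass adds nothing (a fixpoint).
function propagate(initialFacts, flowEdges) {
  const facts = new Map();
  for (const [node, types] of Object.entries(initialFacts)) {
    facts.set(node, new Set(types));
  }
  let changed = true;
  while (changed) {
    changed = false;
    for (const [from, to] of flowEdges) {
      if (!facts.has(to)) facts.set(to, new Set());
      const target = facts.get(to);
      for (const type of facts.get(from) ?? []) {
        if (!target.has(type)) {
          target.add(type);
          changed = true;
        }
      }
    }
  }
  return facts;
}

// Flow edges for: var quux = bar("goodbye", "hello"), listed in reverse
// so that several passes are needed before the result stabilises.
const flowEdges = [
  ["bar.return", "quux"],
  ["foo.return", "bar.return"],
  ["x", "foo.return"],
  ["y", "foo.return"],
  ["b", "x"],
  ["a", "y"],
  ["arg1", "a"],
  ["arg2", "b"],
];

const facts = propagate({ arg1: ["string"], arg2: ["string"] }, flowEdges);
```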

  22. EFFICIENT INITIALIZATION
      - Initial build of the graph with Cypher was slow
      - Generate CSV and bulk load
      - Two files: nodes, relationships
        $NEO4J_HOME/bin/neo4j-admin import --database=db --nodes=nodes.csv --relationships=relationships.csv
      - 10× speedup
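Generating the two CSV files could look roughly like this. The header fields `:ID`, `:LABEL`, `:START_ID`, `:END_ID` and `:TYPE` follow the header convention of the Neo4j import tool; the sample nodes, the `name` property column and the helper functions are assumptions for illustration.

```javascript
// Quotes a value if it contains a comma, a double quote or a line break.
function escapeCsv(value) {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Joins a header row and data rows into CSV text with a trailing newline.
function toCsv(header, rows) {
  return [header, ...rows].map((row) => row.map(escapeCsv).join(",")).join("\n") + "\n";
}

// Illustrative graph fragment: a declarator pointing at its binding identifier.
const nodes = [
  { id: 1, label: "BindingIdentifier", name: "foo" },
  { id: 2, label: "VariableDeclarator", name: "" },
];
const relationships = [{ from: 2, to: 1, type: "binding" }];

const nodesCsv = toCsv(
  [":ID", "name", ":LABEL"],
  nodes.map((n) => [n.id, n.name, n.label])
);
const relationshipsCsv = toCsv(
  [":START_ID", ":END_ID", ":TYPE"],
  relationships.map((r) => [r.from, r.to, r.type])
);
```

Writing `nodesCsv` and `relationshipsCsv` to `nodes.csv` and `relationships.csv` would produce the inputs for the import command shown on the slide.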

  23. REGULAR PATH QUERIES
      - Transitive closure on certain combinations
      - Workaround:
        o Start transaction
        o Add proxy relationships
        o Calculate transitive closure
        o Rollback transaction
      - openCypher proposal for path patterns:
        (:A)-/[:R1 :R2 :R3]+/->(:B)
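The meaning of such a path pattern, one or more repetitions of the sequence R1, R2, R3, can be sketched in plain JavaScript. The edge list and the functions `step` and `reachableVia` are assumptions for illustration; traversing the whole sequence once plays the role of a proxy relationship, and the outer loop computes the transitive closure over it.

```javascript
// Follows one relationship type from a set of source nodes.
function step(edges, sources, type) {
  const targets = new Set();
  for (const e of edges) {
    if (e.type === type && sources.has(e.from)) targets.add(e.to);
  }
  return targets;
}

// Nodes reachable from `start` via one or more repetitions of the type sequence.
function reachableVia(edges, start, sequence) {
  const reached = new Set();
  let frontier = new Set([start]);
  while (frontier.size > 0) {
    // One "proxy relationship": the whole sequence traversed once.
    let current = frontier;
    for (const type of sequence) current = step(edges, current, type);
    // Only newly reached nodes are expanded again, so cycles terminate.
    frontier = new Set();
    for (const node of current) {
      if (!reached.has(node)) {
        reached.add(node);
        frontier.add(node);
      }
    }
  }
  return reached;
}

// Illustrative chain a -R1-> b -R2-> c -R3-> d -R1-> e -R2-> f -R3-> g, plus a dead end a -R1-> x.
const edges = [
  { from: "a", type: "R1", to: "b" },
  { from: "b", type: "R2", to: "c" },
  { from: "c", type: "R3", to: "d" },
  { from: "d", type: "R1", to: "e" },
  { from: "e", type: "R2", to: "f" },
  { from: "f", type: "R3", to: "g" },
  { from: "a", type: "R1", to: "x" },
];

const reachable = [...reachableVia(edges, "a", ["R1", "R2", "R3"])].sort();
```

Only `d` and `g` are reached from `a`: intermediate nodes such as `b` or `c` lie on the path but do not complete a full R1, R2, R3 sequence.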

  24. INCREMENTAL QUERIES

  25. OPENCYPHER SYSTEMS
      - "The openCypher project aims to deliver a full and open specification of the industry's most widely adopted graph database query language: Cypher." (late 2015)
      - Research prototypes (incremental processing)
        o Graphflow (University of Waterloo)
        o ingraph (incremental graph engine)
      - (Source: Keynote talk @ GraphConnect NYC 2017)

  26. FOSDEM 2017: INGRAPH

  27. STATE OF INGRAPH IN 2018
      - Covers a substantial fragment of openCypher
        o MATCH, OPTIONAL MATCH, WHERE
        o WITH, functions, aggregations
        o CREATE, DELETE
      - Features on the roadmap
        o MERGE, REMOVE, SET
        o List comprehensions
      - References
        o J. Marton, G. Szárnyas, D. Varró: Formalising openCypher Graph Queries in Relational Algebra, ADBIS, Springer, 2017
        o G. Szárnyas: Incremental View Maintenance for Property Graph Queries, SIGMOD SRC, 2018

  28. RELATED PROJECTS

  29. JQASSISTANT
      - Code comprehension: software to graph
      - Source: Dirk Mahler, Pushing the evolution of software analytics with graph technology, Neo4j blog, 2017

  30. SLIZAA
      - "slizaa uses Neo4j/jQAssistant and provides a front end with a bunch of specific tools and viewers to provide an easy-to-use in-depth insight of your software's architecture."
      - Source: Gerd Wütherich, Core concepts, slizaa

  31. SLIZAA: ECLIPSE IDE

  32. SLIZAA: XTEXT OPENCYPHER
      - Xtext-based grammar
      - Used in the ingraph compiler
      - Now has a scope analyzer
      - Works in the Eclipse IDE and web UI

  33. WRAPPING UP

  34. PUBLICATIONS
      - Dániel Stein: Graph-based source code analysis of JavaScript repositories, Master's thesis, 2016
      - Soma Lucz: Static analysis algorithms for JavaScript, Bachelor's thesis, 2017

  35. CONCLUSION
      - Some interesting analysis rules require a global view of the code
      - Good use case for graph databases
        o Property graph
        o Cypher language
      - Very good use case for incremental queries
        o Incrementality on multiple levels

  36. RELATED RESOURCES
      - Codemodel-Rifle: github.com/ftsrg/codemodel-rifle
      - ingraph engine: github.com/ftsrg/ingraph
      - Shape Security's Shift parser: github.com/shapesecurity/shift-java
      - Slizaa openCypher Xtext: github.com/slizaa/slizaa-opencypher-xtext
      - Thanks to Ádám Lippai, Soma Lucz, Dániel Stein, Dávid Honfi and the ingraph team.

  37. Ω

  38. VISUAL STUDIO CODE INTEGRATION
      - Language Server Protocol (LSP) allows portable implementation

  39. USE CASES: CFG
      - Control Flow Graph
        o graph representation of every possible statement sequence
      - Basis for type inferencing and test generation
      - [Figure: control flow graph of an if statement with statement, condition, error and done nodes]
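A control flow graph for straight-line code with `if` statements can be built along the following lines. This is a simplified sketch: the statement format, the `start` and `done` nodes and the function `buildCfg` are assumptions, and loops, nesting and error edges are left out.

```javascript
// Builds CFG edges for a flat list of statements; `if` statements branch and rejoin.
function buildCfg(statements) {
  const edges = [];
  let exits = ["start"]; // nodes whose control flows into the next statement
  const connect = (sources, node) => sources.forEach((from) => edges.push([from, node]));
  for (const stmt of statements) {
    if (stmt.kind === "if") {
      connect(exits, stmt.test);
      connect([stmt.test], stmt.then);
      if (stmt.else) connect([stmt.test], stmt.else);
      // Without an else branch, control may skip from the test to what follows.
      exits = stmt.else ? [stmt.then, stmt.else] : [stmt.then, stmt.test];
    } else {
      connect(exits, stmt.name);
      exits = [stmt.name];
    }
  }
  connect(exits, "done");
  return edges;
}

const cfg = buildCfg([
  { kind: "statement", name: "s1" },
  { kind: "if", test: "cond", then: "s2", else: "s3" },
  { kind: "statement", name: "s4" },
]);
```

Every path through `cfg` from `start` to `done` corresponds to one possible statement sequence, here `s1, cond, s2, s4` and `s1, cond, s3, s4`.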
