excel on the java vm
play

Excel on the Java VM Generating Fast Code from Spreadsheet Models - PowerPoint PPT Presentation

Excel on the Java VM Generating Fast Code from Spreadsheet Models Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30 AGENDA > Problem: Customization Is Hard > Idea: Let Users Use Spreadsheets > Implementation:


  1. Excel on the Java VM Generating Fast Code from Spreadsheet Models Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30

  2. AGENDA > Problem: Customization Is Hard > Idea: Let Users Use Spreadsheets > Implementation: Abacus Formula Compiler 2

  3. It's All About Customization > Calculation of key values varies – From user to user – Over time > Examples – Prices and conditions – Shipping rates – Salary components Variations in employment terms   Variations in local regulations – Asset valuations – Insurance premiums – Risk estimates – ... 3

  4. Traditional Customization Approaches > Custom development (options targeted at needs of specific customers) Hard to maintain due to code and data bloat  Reduced quality in UI, documentation and testing (“it's only for a single customer”)  Costly and slow for customer when he needs a change  > Parametrization (options trying to capture reusable aspects of specific needs)  Often hard to discover and configure; complex interplay Testing nightmare due to endless configuration combinations  Hard to maintain due to code and data bloat  Still costly and slow when customer needs unanticipated feature  > Expression builder (generic expressions with access to relevant input values)  Very costly to implement with sufficient flexibility, usability and quality Many implementations basically just support defining one-liners  Can still lead to testing nightmare  Can be slow when used heavily  > Hard on both you and your customers 4

  5. A Typical Expression Builder > Abacus Payroll: Definition of salary components (Lohn 411) > Fairly generic expression builder Basic set of operators, structured multi-line expressions   Can reference input values and results of other calculations > Nice for simple cases Can define simple one-liners quickly  > Hard for complex cases Grouping? If-then-else? Debugging? Documenting? Change control?  Table lookup 5

  6. Trying To Support Complex Cases > Abacus Payroll: Definition of lookup tables (Lohn 421) Criteria can reference inputs or defined expressions  So can values  > Shortcomings Separate UI   To and fro with expressions Again no visual feedback etc.  6

  7. Fixing the Expression Builder: For Users > Modelling power Complex composition of basic functions (advanced functions are rarely needed)  Named intermediate values to structure and to reuse code  Multiple distinct cases with complex decision logic  Lookup tables (price/rebate by category/threshold) with computed entries  > Debugging / Verification See effects of different sample input values to gain confidence in correctness  Trace effects through intermediate values  Trace and debug computations on actual data sets  Allow to record test set of input/output values to be verified after system updates  > Documentation Annotate intermediate steps with to-the-point explanatory text  Just a plain-text field requires too much duplication  > Versioning Can keep textual log of changes, but preferably keep log of old versions around 

  8. Solution: Spreadsheets (Excel, OpenOffice Calc) > Familiar to most users (at least those who would do customization) > Very good modelling power Handling multiple distinct cases can be a little non-visual, but manageable  > Very interactive Immediate feedback  Dependency tracing  What-if  Goal finding  > Can be documented Annotate individual cells  Use colors and fonts  > Can be tracked Keep versions of files around at will (file system, blobs, version control)  Compare versions of files  Integrate changelog 

  9. But we need to add > A way to specify input and output values Especially a way to discover available inputs and expected outputs  – Ideas  Excel-/OpenOffice-Plugin Decoupled: Users name input/output cells arbitrarily, then they can define bindings  for these names in a separate step in your app > Access to actual input values for debugging > Running tests with sets of input and expected output values

  10. Salary Component Revisited > Abacus supports same calculcation in a spreadsheet Immediate feedback  Everything at a glance  Keep notes   Keep versions

  11. Example > Abacus Service and Contract Management > Calculate the price of an Abacus ERP module – Lots of compositions of simple functions – Lots of decisions (IF) – Lots of lookup tables – Integrated documentation Changelog  Notes on individual cells  – Use of colors and fonts

  12. Fixing the Expression Builder: For Developers > Integrating the expression builder Access to input values and provided output values is an external API  Needs all the care that should always go into an external API  Decoupling, stability, discoverability, documentation  > Testing the expression evaluation engine  Usage patterns not predictable Need very high code coverage  > Testing its use in the application  Users can access input values in any order Keep input value accessors side-effect free   Then testing input value accessors in isolation should do > Performance Usually not critical at first, when used in interactive applications   But can quickly become so for batch updates

  13. Solution: Abacus Formula Compiler (AFC) > Compiles formulaic content of spreadsheets to plain Java classes http://www.formulacompiler.org/  Supports Excel and OpenOffice formats (.xls, .xlsx, .ods)  Supports a number of numeric data types (double, BigDecimal, scaled long)  > Compiler and compiled classes run on a plain JRE  Does not require Excel or OpenOffice to compile or run engines Generates bytecode directly, so does not need a full JDK (tools.jar)  But integrates a decompiler so you can look at generated code  > Performs well Compiled classes are JITted like any other Java class   Tuned for heavy use of a bunch of computations Not for thousands of different computations each used rarely  Cheap instantiation supports thread isolation via per-computation instances  Choice of stateless mode or stateful mode that caches intermediate cell values 

  14. AFC continued (1) > Integrates well with application code Pulls input values from specified application-specific Java interfaces  Supplies output values as implementations of application-specific Java interfaces  So can provide custom computation strategy implementations in addition to  classically coded default ones Binding of cells to input and output values can be fully controlled  Optional simple by-name binding of input and output cells  > Handles repeating data Designated spreadsheet areas are treated as repeating sections  Iterates over iterables and arrays 

  15. Example > Compiling the Abacus ERP module price calculation – Simple setup – Uses and implements our own interfaces – Can switch to BigDecimal – Decompile to see generated Java code

  16. AFC continued (2) > Modular design – Compact runtime for loading and running precompiled engines – Support for different spreadsheet formats – Build spreadsheet models in- memory to replace home- grown expression engines > Extensively tested and documented > Available under GPL and commercially

  17. AFC Internals > Compile-time constant folding using interpreter > Template methods – Used by interpreter to compute subexpressions – Decompiled at build-time to generate methods emitting equivalent byte-code > High-level functional language – To implement complex operations Define SUM in terms of iteration and +  – Interpreted by interpreter – Compiled to low-level constructs by compiler > Yes, we have developer documentation! http://www.formulacompiler.org/contribute/hacking/index.htm 

  18. Summary > Users need to customize calculations > Use spreadsheets as their UI Well known, interactive, traceable   Needs good application integration > Abacus Formula Compiler was built for this Integrates efficiently with your interfaces  Compiles to fast code  Can use BigDecimal for financial applications   No Excel, OpenOffice, nor even a JDK needed In production use at Abacus Research AG   Extensive tests Comprehensive documentation  GPL and commercial version available  > Still requires discipline to remain testable! Spreadsheets are free of side-effects – your APIs should be too  > http://www.formulacompiler.org/

  19. Peter Arrenbrecht http://arrenbrecht.ch/ codewise.ch peter@arrenbrecht.ch Abacus Formula Compiler http://formulacompiler.org/ Abacus Research AG http://abacus.ch/

Recommend


More recommend