Excel on the Java VM Generating Fast Code from Spreadsheet Models Peter Arrenbrecht codewise.ch / Abacus Research AG Submission ID: 30
AGENDA > Problem: Customization Is Hard > Idea: Let Users Use Spreadsheets > Implementation: Abacus Formula Compiler 2
It's All About Customization > Calculation of key values varies – From user to user – Over time > Examples – Prices and conditions – Shipping rates – Salary components Variations in employment terms Variations in local regulations – Asset valuations – Insurance premiums – Risk estimates – ... 3
Traditional Customization Approaches > Custom development (options targeted at needs of specific customers) Hard to maintain due to code and data bloat Reduced quality in UI, documentation and testing (“it's only for a single customer”) Costly and slow for customer when he needs a change > Parametrization (options trying to capture reusable aspects of specific needs) Often hard to discover and configure; complex interplay Testing nightmare due to endless configuration combinations Hard to maintain due to code and data bloat Still costly and slow when customer needs unanticipated feature > Expression builder (generic expressions with access to relevant input values) Very costly to implement with sufficient flexibility, usability and quality Many implementations basically just support defining one-liners Can still lead to testing nightmare Can be slow when used heavily > Hard on both you and your customers 4
A Typical Expression Builder > Abacus Payroll: Definition of salary components (Lohn 411) > Fairly generic expression builder Basic set of operators, structured multi-line expressions Can reference input values and results of other calculations > Nice for simple cases Can define simple one-liners quickly > Hard for complex cases Grouping? If-then-else? Debugging? Documenting? Change control? Table lookup 5
Trying To Support Complex Cases > Abacus Payroll: Definition of lookup tables (Lohn 421) Criteria can reference inputs or defined expressions So can values > Shortcomings Separate UI To and fro with expressions Again no visual feedback etc. 6
Fixing the Expression Builder: For Users > Modelling power Complex composition of basic functions (advanced functions are rarely needed) Named intermediate values to structure and to reuse code Multiple distinct cases with complex decision logic Lookup tables (price/rebate by category/threshold) with computed entries > Debugging / Verification See effects of different sample input values to gain confidence in correctness Trace effects through intermediate values Trace and debug computations on actual data sets Allow to record test set of input/output values to be verified after system updates > Documentation Annotate intermediate steps with to-the-point explanatory text Just a plain-text field requires too much duplication > Versioning Can keep textual log of changes, but preferably keep log of old versions around
Solution: Spreadsheets (Excel, OpenOffice Calc) > Familiar to most users (at least those who would do customization) > Very good modelling power Handling multiple distinct cases can be a little non-visual, but manageable > Very interactive Immediate feedback Dependency tracing What-if Goal finding > Can be documented Annotate individual cells Use colors and fonts > Can be tracked Keep versions of files around at will (file system, blobs, version control) Compare versions of files Integrate changelog
But we need to add > A way to specify input and output values Especially a way to discover available inputs and expected outputs – Ideas Excel-/OpenOffice-Plugin Decoupled: Users name input/output cells arbitrarily, then they can define bindings for these names in a separate step in your app > Access to actual input values for debugging > Running tests with sets of input and expected output values
Salary Component Revisited > Abacus supports same calculcation in a spreadsheet Immediate feedback Everything at a glance Keep notes Keep versions
Example > Abacus Service and Contract Management > Calculate the price of an Abacus ERP module – Lots of compositions of simple functions – Lots of decisions (IF) – Lots of lookup tables – Integrated documentation Changelog Notes on individual cells – Use of colors and fonts
Fixing the Expression Builder: For Developers > Integrating the expression builder Access to input values and provided output values is an external API Needs all the care that should always go into an external API Decoupling, stability, discoverability, documentation > Testing the expression evaluation engine Usage patterns not predictable Need very high code coverage > Testing its use in the application Users can access input values in any order Keep input value accessors side-effect free Then testing input value accessors in isolation should do > Performance Usually not critical at first, when used in interactive applications But can quickly become so for batch updates
Solution: Abacus Formula Compiler (AFC) > Compiles formulaic content of spreadsheets to plain Java classes http://www.formulacompiler.org/ Supports Excel and OpenOffice formats (.xls, .xlsx, .ods) Supports a number of numeric data types (double, BigDecimal, scaled long) > Compiler and compiled classes run on a plain JRE Does not require Excel or OpenOffice to compile or run engines Generates bytecode directly, so does not need a full JDK (tools.jar) But integrates a decompiler so you can look at generated code > Performs well Compiled classes are JITted like any other Java class Tuned for heavy use of a bunch of computations Not for thousands of different computations each used rarely Cheap instantiation supports thread isolation via per-computation instances Choice of stateless mode or stateful mode that caches intermediate cell values
AFC continued (1) > Integrates well with application code Pulls input values from specified application-specific Java interfaces Supplies output values as implementations of application-specific Java interfaces So can provide custom computation strategy implementations in addition to classically coded default ones Binding of cells to input and output values can be fully controlled Optional simple by-name binding of input and output cells > Handles repeating data Designated spreadsheet areas are treated as repeating sections Iterates over iterables and arrays
Example > Compiling the Abacus ERP module price calculation – Simple setup – Uses and implements our own interfaces – Can switch to BigDecimal – Decompile to see generated Java code
AFC continued (2) > Modular design – Compact runtime for loading and running precompiled engines – Support for different spreadsheet formats – Build spreadsheet models in- memory to replace home- grown expression engines > Extensively tested and documented > Available under GPL and commercially
AFC Internals > Compile-time constant folding using interpreter > Template methods – Used by interpreter to compute subexpressions – Decompiled at build-time to generate methods emitting equivalent byte-code > High-level functional language – To implement complex operations Define SUM in terms of iteration and + – Interpreted by interpreter – Compiled to low-level constructs by compiler > Yes, we have developer documentation! http://www.formulacompiler.org/contribute/hacking/index.htm
Summary > Users need to customize calculations > Use spreadsheets as their UI Well known, interactive, traceable Needs good application integration > Abacus Formula Compiler was built for this Integrates efficiently with your interfaces Compiles to fast code Can use BigDecimal for financial applications No Excel, OpenOffice, nor even a JDK needed In production use at Abacus Research AG Extensive tests Comprehensive documentation GPL and commercial version available > Still requires discipline to remain testable! Spreadsheets are free of side-effects – your APIs should be too > http://www.formulacompiler.org/
Peter Arrenbrecht http://arrenbrecht.ch/ codewise.ch peter@arrenbrecht.ch Abacus Formula Compiler http://formulacompiler.org/ Abacus Research AG http://abacus.ch/
Recommend
More recommend