Scala For Business Automation Solving Real Business Problems with Streams, POI and a Dash of BDD Conor Svensson conor10@gmail.com
The Spreadsheet The PC’s first killer app
The London Whale …the model operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another… Report of JPMorgan Chase & Co. Management Task Force Regarding 2012 CIO Losses
A brief history… • 1979 VisiCalc - The first killer app for PCs on Apple II • 1983 Lotus 1-2-3 • 1985 Excel 1.0 on Apple • 1987 Excel 2.0 on MS-DOS • 1992 Excel 3.0 on Windows • 1994 StarCalc (Star Office) • 2001 Calc (Open Office) • 2006 Google Spreadsheets • 2007 iWork Numbers
Today
Use cases • Traditional • Accounting/inventory • Project management • Simple data analysis • Reporting • Exotic • Business process workflows • Risk management • Transaction capture • Reconciliations
Why? And then… Then… Initially… “I can solve this problem It becomes business Management/audit/ with a spreadsheet” critical regulators arrive… Avoid more • “W hy do we have • expensive build via these decentralised a proper solution business critical (bureaucratic spreadsheets?” change processes don’t help here) “We need a proper • solution” Lack of • development knowledge
Complex spreadsheets • Multiple lookup tables with reference data • Data from multiple data sources • Series of complex calculations • Output reports • Centralised via a file share (if lucky), multiple copies tend to exist due to e-mail
The result
Automation
1. Apache POI • Poor Obfuscation Implementation • The Java API for Microsoft Documents • Work with Office files without a license (Excel, Word, PowerPoint, Outlook, Visio, Publisher) • Text extraction applications - web spiders, index builders, content management systems • Business automation…
2. Scala • Expressive and concise • Streams, for comprehensions, and case classes • Many organisations have not yet made the switch to Java 8 • Slots easily into existing Java dominant environments (e.g. financial institutions)
Streams
For comprehension Instead of: We can use:
Case classes Scala provides: • field accessors • constructors • equals • hashCode • toString • copy
3. BDD in Scala Examples use ScalaTest’s FlatSpec with Matchers
An example Q: Is there a correlation between Bitcoin and commodity returns?
Source data • Bitcoin prices • ASX SPI 200 (Australian commodities index)
Our model • Reorder data • Create named ranges • Merge price series by date • Calculate: 1. Returns of our price series 2. Mean 3. Standard deviation 4. Covariance 5. Correlation
The Code ExcelReader • Use named ranges or sheet/row/col indices Stats • Statistical library Correlation • Incremental extraction of business logic with Streams/for-comprehensions
ExcelReader
Stats
Correlation
End to end validation Tests that all can understand!!!
Clear test output
Building on this • Rapid prototyping/MVP • Spreadsheet used as initial datasource • Micro-spreadsheet-services, etc… (sorry)
Limitations • VBA evaluations are not supported, only formula evaluations are • Some Excel functions are not yet implemented (but you can contribute easily) • It’s not a silver bullet for avoiding spreadsheets, merely a tool to ease the pain of moving off them • Additional overhead of more granular stream processing
In summary • Useful tools for rapid prototyping/MVP without risking data quality • Reduced implementation error • Spreadsheet can be modelled incrementally • BDD test cases verified against original source • Use FP techniques • Very well suited to streams • Promoting immutability of our data structures
Further resources • https://github.com/conor10/spreadsheets - source code and example code supporting this talk • https://www.quandl.com - Quandl for data • https://poi.apache.org/ - Apache POI Project • http://scalatest.org/ - ScalaTest • Power, D. J., "A Brief History of Spreadsheets", DSSResources.COM, World Wide Web, http://dssresources.com/history/sshistory.html, version 3.6, 08/30/2004 • http://blogs.office.com/2012/09/13/introducing-spreadsheet-controls-in-the- new-office/ - Controls for critical spreadsheets • http://fsprojects.github.io/ExcelProvider/ - F# Excel Type Provider
Questions?
Recommend
More recommend