  1. Monolithic Batch Goes Microservice Streaming: A story about one transformation
     Charles Tye & Anton Polyakov

  2. Who are we?
     - What we do: develop solutions for Market Risk, Credit Risk, Liquidity Risk, Stress Testing and Messaging, together with around 70 other people from all over the world
     - Anton Polyakov: Head of Application Development & Risk IT, 2 years in Nordea
     - Charles Tye: Head of Core Services, 17 years in Nordea

  3. Market Risk: the high-level view
     - Quantify potential losses and exposures
     - Do many small risks add up to a big risk?
     - Can risks combine in unusual and unexpected ways?

  4. Market Risk: line of defence
     - Independent function
     - Protect Nordea and our customers
     - Daily internal reporting and external reporting to regulators
     - Control of risk
     - Analysis and insight into the sources of risk
     - Management of capital

  5. Examples of risk analysis: Value at Risk
     - Look at the last 2 years of market history
     - Simulate what would happen if the same thing happened again today
     - Take the average of the worst 1% of outcomes
     - Highly non-linear, but there is a requirement to drill in and find the drivers
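
A minimal sketch of the historical-simulation idea on this slide, assuming one scenario per historical business day (roughly 500 in two years) applied to a single position via its daily return. The function names and the single-position setup are hypothetical, not Nordea's implementation. Note that averaging the worst 1% of outcomes is what is usually called expected shortfall; VaR in the strict sense is the loss at that percentile.

```python
import math


def historical_pnl(position_value, daily_returns):
    """Revalue today's position under each historical daily return (one scenario per day)."""
    return [position_value * r for r in daily_returns]


def tail_average(pnl, tail=0.01):
    """Average of the worst `tail` fraction of simulated P&L outcomes."""
    if not pnl:
        raise ValueError("need at least one scenario")
    worst_first = sorted(pnl)  # most negative P&L first
    k = max(1, math.ceil(len(pnl) * tail))  # number of scenarios in the tail
    return sum(worst_first[:k]) / k
```

With 500 scenarios and the default 1% tail, `tail_average` averages the 5 worst outcomes.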

  6. Examples of risk analysis: stress scenarios
     - "Black swan" worst-case scenarios
     - Unexpected outcomes from future events
     - Simulate what would happen if the event occurred
     - Example: Brexit
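
One simple way to "simulate if it happened" is a first-order sensitivity approach: multiply each risk-factor sensitivity by the scenario's shock and sum. This is an assumption for illustration only; the slides do not say how positions are revalued, and a first-order approximation misses non-linear effects such as option convexity. The factor names and numbers are hypothetical.

```python
def stress_pnl(sensitivities, shocks):
    """First-order stress P&L: sum over risk factors of sensitivity * shock.

    sensitivities: P&L per unit move of each risk factor (hypothetical units).
    shocks: the scenario's move per factor; factors not in the scenario are unchanged.
    """
    return sum(s * shocks.get(factor, 0.0) for factor, s in sensitivities.items())


# Hypothetical book and a hypothetical "GBP shock" scenario (illustrative numbers only).
book = {"GBPUSD": 1000.0, "FTSE100": 2.0, "EURUSD": 500.0}
gbp_shock = {"GBPUSD": -0.125, "FTSE100": -4.0}
print(stress_pnl(book, gbp_shock))  # -133.0
```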

  7. An interesting technology problem
     Risk analysis:
     - Everything has to be included = know when you are complete
     - Reactive: near real-time calculations
     - Consistent
     - Streaming data
     - Fast corrections
     - Speed
     - Interactive: sub-second queries on huge data sets
     - Volume: 10,000,000,000,000
     - Risk does not sum over hierarchies
     - Non-linear and "what-if"
     - Drill-down is non-trivial
     - Traditional OLAP aggregate & increment doesn't work
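
"Risk does not sum over hierarchies" can be seen with two hypothetical books: their scenario P&L vectors add element by element, but a tail measure of the total is not the sum of the per-book tail measures. For brevity the sketch uses the worst single scenario as the tail measure; tail averages like the one on the VaR slide behave the same way.

```python
def worst_loss(pnl):
    """Simplest tail measure: the worst scenario P&L."""
    return min(pnl)


# Scenario P&L vectors (one entry per scenario) for two hypothetical books.
book_a = [-10, 5, 3, -1]
book_b = [8, -6, -2, 1]

# Vectors are linear: the desk total is the element-wise sum...
desk = [a + b for a, b in zip(book_a, book_b)]  # [-2, -1, 1, 0]

# ...but the tail measure is not: the losses land in different scenarios and partly offset.
print(worst_loss(book_a) + worst_loss(book_b))  # -16
print(worst_loss(desk))                         # -2
```

This is why a total has to be computed from the aggregated vector rather than by adding up per-book risk numbers, and why a plain OLAP sum or increment of the measure itself does not work.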

  8. Challenge no. 1: spaghetti
     - Find the seams
     - Break it up
     - Reusable components
     - Replace a piece at a time

  9. Challenge no. 2
     - Develop a new service
     - Integrate it into the legacy system
     - Reconcile the output
     - Find and fix legacy bugs
     - Fight "complification"

  10. Challenge no. 3
     - Batch is synchronous state transfer. Is it the only way to achieve consistency?
     - Event-sourced and streaming approach: more robust, scalable and faster, especially for recovery
     - It comes with a cost: consistency is seriously hard to combine with streaming
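
A minimal sketch of the event-sourced alternative to batch state transfer, assuming a hypothetical `RiskEvent` record and a per-trade "latest value wins" rule. State is never shipped as a snapshot; it is rebuilt by replaying an append-only log, and a correction is simply a newer event. Live streaming and recovery use the same update function. Ordering and duplicates are where the consistency cost appears; this sketch sidesteps them by sorting on a sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEvent:
    seq: int        # position in the append-only log
    trade_id: str
    value: float    # latest risk value for the trade; a correction is just a newer event


def apply_event(state, event):
    """Streaming update: the newest value for a trade replaces the previous one."""
    state[event.trade_id] = event.value
    return state


def rebuild(log):
    """Recovery: replay the whole log in sequence order, starting from empty state."""
    state = {}
    for event in sorted(log, key=lambda e: e.seq):
        apply_event(state, event)
    return state


log = [
    RiskEvent(1, "trade-1", 10.0),
    RiskEvent(2, "trade-2", 5.0),
    RiskEvent(3, "trade-1", 7.0),  # correction for trade-1
]
print(rebuild(log))  # {'trade-1': 7.0, 'trade-2': 5.0}
```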

  11. Challenge no. 4
     - Legacy SQL was slow: replace it with in-memory aggregation
     - Aggregate billions of scenarios in memory and pre-compute total vectors over hierarchies (linear)
     - Compute non-linear measures lazily
     - Partition and scale out horizontally across commodity hardware
     - Reactive and continuous
     - Tougher challenges on terabyte-scale hardware due to NUMA limitations: some query cubes are already larger than 200 GB, and larger ones are planned
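
A minimal sketch of "pre-compute total vectors over hierarchies, compute non-linear measures lazily", using a hypothetical bank → desk → book tree held in plain dictionaries. The linear step sums scenario vectors bottom-up once; the non-linear measure (again the worst scenario, for brevity) is evaluated only for the node a user queries or drills into. The system described on the slide partitions this across machines; the sketch is single-process.

```python
def precompute(hierarchy, leaf_vectors, root):
    """Linear step: total scenario vector for every node, summed bottom-up once."""
    totals = {}

    def visit(node):
        if node in leaf_vectors:
            vector = list(leaf_vectors[node])
        else:
            child_vectors = [visit(child) for child in hierarchy[node]]
            vector = [sum(values) for values in zip(*child_vectors)]
        totals[node] = vector
        return vector

    visit(root)
    return totals


def worst_loss(totals, node):
    """Non-linear step, evaluated lazily: only for the node being queried."""
    return min(totals[node])


# Hypothetical bank -> desk -> book tree; three scenarios per book.
hierarchy = {"bank": ["desk1", "desk2"], "desk1": ["bookA", "bookB"], "desk2": ["bookC"]}
leaf_vectors = {"bookA": [-10, 5, 3], "bookB": [8, -6, -2], "bookC": [1, 1, -4]}

totals = precompute(hierarchy, leaf_vectors, "bank")
print(totals["desk1"])             # [-2, -1, 1]
print(worst_loss(totals, "bank"))  # -3
```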

  12. Solution: microservices! Well, almost…
     - Single responsibility: replace pieces of legacy from the inside out
     - Self-contained, with business functional boundaries
     - Independent and rapid development: the team owns the whole stack
     - Organisationally scalable: scale your teams horizontally
     - Flexible and maintainable: evolve the architecture
     - Smart endpoints and dumb pipes
     - Innovation and short lifecycles

  13. The problem
     - Business:
       - Multi-model Market Risk calculator for the Nordea portfolio
       - VaR at different organization levels, with 5-6 different models in parallel
     - IT:
       - 7000 CPU hours of grid calculation
       - More than 4000 SQL jobs
       - A graph with more than 10000 edges
       - Nightly batch flow

  14. What did it look like?
     - Well, you know: 10 years of development
     - In SQL
     - No refactoring (who needs it?)

  15. Precisely, what did it look like?

  16. Logical architecture: a monolithic, staged app

  17. Now a little complication
     - Can it be parallel?
     - Sloo-o-o-ow
     - Fat, so it breaks
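
Whether a batch "can be parallel" is answerable from its job dependency graph. A minimal sketch, assuming the jobs and their dependencies are available as a dictionary (the job names are hypothetical): a level-by-level variant of Kahn's topological sort groups jobs into stages whose members depend only on earlier stages, so each stage can run in parallel. It is quadratic in the worst case, fine for illustration but not tuned for a graph with thousands of jobs.

```python
def parallel_stages(deps):
    """Group jobs into stages that can each run in parallel.

    deps maps each job to the set of jobs it depends on. Every job in a stage
    depends only on jobs in earlier stages (level-by-level Kahn's algorithm).
    """
    jobs = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {job: set(deps.get(job, ())) for job in jobs}
    stages = []
    while remaining:
        ready = sorted(job for job, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError("dependency cycle detected")
        stages.append(ready)
        for job in ready:
            del remaining[job]
        for ds in remaining.values():
            ds.difference_update(ready)
    return stages


# Hypothetical slice of a nightly batch.
deps = {
    "load_trades": set(),
    "load_market": set(),
    "price": {"load_trades", "load_market"},
    "enrich": {"price"},
    "aggregate": {"enrich"},
    "report_a": {"aggregate"},
    "report_b": {"aggregate"},
}
print(parallel_stages(deps))
# [['load_market', 'load_trades'], ['price'], ['enrich'], ['aggregate'], ['report_a', 'report_b']]
```

A long run of single-job stages, as in the middle of this example, marks a sequential stretch that extra hardware cannot shorten.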

  18. So what to do?
     We probably all know the answer (since we are in this section ☺):
     - Find logically isolated blocks
     - Keep an eye on non-functional aspects
     - Think about how they communicate
     - Think about what happens if something dies

  19. Not quite "classical" microservices… or?
     Produce → enrich → aggregate
     - Request/response is not feasible
     - Synchronous interaction takes too long
     - Some results are expensive to reproduce

  20. So we need a middleware which:
     - "Glues" services together
     - Caches important results
     - Serves as a coordinator and work distributor

  21. Queues and sets, distributed locks, fast pub/sub, pull and dedup, scale out

  22. Locks? Who needs locks?

  23. Pub/sub messaging as a notifier
     Producer → Enricher → Aggregator, each writing to a store; Redis pub/sub notifies the consumer
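
A minimal in-memory sketch of pub/sub as a notifier, assuming each stage writes its result to a shared store and publishes only the key, so the next stage pulls the data itself (smart endpoints, dumb pipes). `InMemoryPubSub`, the channel names and the key format are hypothetical stand-ins for Redis pub/sub and Redis keys. Like Redis pub/sub, the stand-in delivers only to current subscribers and keeps no messages, which is why the data lives in the store rather than in the notification.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Hypothetical stand-in for Redis pub/sub: delivers to current subscribers, stores nothing."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


store = {}             # stand-in for the shared key/value store
bus = InMemoryPubSub()


def producer(key, payload):
    store[key] = payload          # 1. persist the result
    bus.publish("produced", key)  # 2. notify: only the key goes over the pipe


def enricher(key):
    enriched = {**store[key], "enriched": True}  # pull the data from the store
    out_key = "enriched:" + key
    store[out_key] = enriched
    bus.publish("enriched", out_key)


received = []                       # plays the role of the aggregator
bus.subscribe("produced", enricher)
bus.subscribe("enriched", received.append)

producer("trade-1", {"pv": 100})
print(received)  # ['enriched:trade-1']
```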

  24. But… there are two main problems in distributed messaging:
     2) Guarantee that each message is only delivered once
     1) Guarantee message order
     2) Guarantee that each message is only delivered once

  25. Enricher: queues with atomic operations
     Producer → incoming queue → BRPOPLPUSH → processing queue → Enricher → store, with Redis pub/sub notification
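
A single-threaded, in-memory model of the "reliable queue" pattern behind BRPOPLPUSH: an item moves from the incoming list to a processing list in one step and is removed from the processing list only after it has been handled, so a crash leaves it recoverable instead of lost. `ReliableQueue` and its method names are hypothetical; the comments name the Redis list command each step mirrors. In Redis the move is atomic because it is a single command; since Redis 6.2, BRPOPLPUSH is deprecated in favour of BLMOVE.

```python
from collections import deque


class ReliableQueue:
    """In-memory model of the reliable-queue pattern.

    Left end of each deque = list head, right end = list tail, as in Redis lists.
    """

    def __init__(self):
        self.incoming = deque()
        self.processing = deque()

    def push(self, item):
        # Producer: LPUSH incoming item
        self.incoming.appendleft(item)

    def claim(self):
        # Worker: RPOPLPUSH incoming processing (BRPOPLPUSH would block while empty).
        # Pop from the tail of incoming and push onto the head of processing.
        if not self.incoming:
            return None
        item = self.incoming.pop()
        self.processing.appendleft(item)
        return item

    def ack(self, item):
        # Done: LREM processing 1 item
        self.processing.remove(item)

    def recover(self):
        # After a crash: move unacknowledged items back, like repeated
        # RPOPLPUSH processing incoming; recovered items join the back of the line.
        while self.processing:
            self.incoming.appendleft(self.processing.pop())
```

For example: push two messages, claim the first, "crash" before acknowledging it, then call `recover()`; the unacknowledged message is claimed again after the one that was already waiting.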

  26. Sets and hash maps: all good for dedup
     - In an eventually consistent world, dedup is your best friend
     - Enricher: multiple inserts due to recovery go to the store with HSET
     - Consistent state thanks to dedup
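
A minimal sketch of dedup by idempotent writes, with a plain dict standing in for a Redis hash and hypothetical message ids. Because each result is stored under its message id, a message replayed during recovery overwrites itself instead of being counted twice, so aggregates computed from the hash stay consistent. This assumes a redelivered message carries the same id and value.

```python
def store_result(results, message_id, value):
    """Idempotent write keyed by message id, like HSET results <message_id> <value>.

    Returns True if the field was new and False for a redelivery
    (Redis HSET likewise returns the number of fields that were added).
    """
    is_new = message_id not in results
    results[message_id] = value
    return is_new


results = {}  # stand-in for one Redis hash
deliveries = [("msg-1", 10), ("msg-2", 5), ("msg-1", 10)]  # msg-1 redelivered after a recovery
added = [store_result(results, mid, v) for mid, v in deliveries]
print(added)                  # [True, True, False]
print(sum(results.values()))  # 15, not 25: the duplicate is absorbed
```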

  27. So how do we scale out?
     - Enricher/Aggregator pairs run logically concurrently, each on its own partition: <type A> <day 1>, <type B> <day 2>, <type X> <day 3>
     - Steal work
     - Filter my events (Redis pub/sub)
     - RedLock + TTL
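
A single-process sketch of the scale-out scheme on this slide, assuming partitions keyed by (type, day) as in the diagram. Each worker holds a lock with a TTL per partition it owns and renews it while alive; a partition whose lock has expired can be stolen by another worker, and each worker filters the shared event stream down to its own partitions. `LeaseTable` and `my_events` are hypothetical. RedLock proper coordinates locks across several independent Redis nodes; on a single Redis node a TTL lock is commonly taken with `SET key value NX PX <ttl-ms>`. The injectable clock only makes expiry testable.

```python
import time


class LeaseTable:
    """Per-partition locks with a TTL; expired locks can be stolen."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.leases = {}  # partition -> (owner, expires_at)

    def try_acquire(self, partition, owner):
        """Take or renew a lock: succeeds if it is free, expired, or already ours."""
        now = self.clock()
        current = self.leases.get(partition)
        if current is None or current[1] <= now or current[0] == owner:
            self.leases[partition] = (owner, now + self.ttl)
            return True
        return False

    def owned_by(self, owner):
        """Partitions whose unexpired lock belongs to `owner`."""
        now = self.clock()
        return {p for p, (o, expires) in self.leases.items() if o == owner and expires > now}


def my_events(events, owned):
    """Each worker keeps only events for the (type, day) partitions it owns."""
    return [e for e in events if (e["type"], e["day"]) in owned]
```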

  28. Demo: Producer → Enricher → Aggregator with RedLock + TTL, incoming and processing queues, a store per stage, and a Redis pub/sub consumer

  29. The result and what we learned
     Success!
     - Aggregate and produce risk: 5 hours → 30 mins
     - Corrections: 40 mins → 1 second
     - Earlier deliveries: more time to manage the risks
     - Faster recovery from problems
     - Happy risk managers
     What we learned:
     - It is important (and painful) to integrate new services into the existing system
     - Consistency is hard to combine with streaming (the subject of another talk, maybe)
     - When distributing, remember the First Law of Distributed Object Design (do you remember it?)

  30. The result and what we learned
     First Law of Distributed Object Design: "Don't distribute your objects"

  31. And of course…
     https://dk.linkedin.com/in/charles-tye-a8aa88b
     https://github.com/parallelstream/
