Reactive design patterns for microservices on multicore


  1. Reactive Software with Elegance: Reactive design patterns for microservices on multicore. Reactive Summit, 22/10/18. charly.bechara@tredzone.com

  2. Outline: Microservices on Multicore; Reactive Multicore Patterns; Modern Software; Roadmap

  3. Part 1: MICROSERVICES ON MULTICORE

  4. Microservices on Multicore. Microservice architecture with the actor model: each µService is built from actors that communicate by message passing.

  5. Microservices on Multicore. Fast data means more inter-communication: computations move from batch to stream, events are processed in real time, and workflows become highly interconnected.

  6. Microservices on Multicore. Microservice architecture: µServices built from actors, communicating by message passing.

  7. Microservices on Multicore. Microservice architecture + fast data: new interactions appear between µServices.

  8. Microservices on Multicore. Microservice architecture + fast data: new interactions, and more interactions overall.

  9. Microservices on Multicore. More microservices should run on the same multicore machine.

  10. Microservices on Multicore. Microservice architecture + fast data + multicore: the µServices now share the cores of a single machine.

  11. Microservices on Multicore. Microservice architecture + fast data + multicore: the Universal Scalability Law (Gunther's law), a performance model of a system based on queueing theory, predicts capacity as C(N) = N / (1 + σ(N − 1) + κ·N·(N − 1)), where N alone would be perfect scalability, σ is the contention impact, and κ is the coherency impact. The three regimes on the slide: σ = 0, κ = 0 (linear scaling); σ >> 0, κ = 0 (contention-limited); σ >> 0, κ > 0 (coherency cost makes throughput degrade as cores are added).
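
The law is easy to explore numerically. A minimal C++ sketch; the σ and κ values are arbitrary illustration, not measurements from the talk:

    // Gunther's Universal Scalability Law:
    // C(N) = N / (1 + sigma*(N-1) + kappa*N*(N-1))
    #include <cstdio>

    double usl(double n, double sigma, double kappa) {
        return n / (1.0 + sigma * (n - 1.0) + kappa * n * (n - 1.0));
    }

    int main() {
        const double sigma = 0.05;   // contention penalty (assumed)
        const double kappa = 0.001;  // coherency penalty (assumed)
        for (int n = 1; n <= 64; n *= 2)
            std::printf("N=%2d cores -> relative capacity %.2f\n",
                        n, usl(n, sigma, kappa));
    }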

  12. Microservices on Multicore. From inter-thread communications...

  14. Microservices on Multicore. ...to inter-core communications.

  15. Microservices on Multicore. Inter-core communication => cache coherency. Approximate access latencies, assuming a 3 GHz clock:
     Registers: 1 cycle (0.3 ns)
     L1 I$ / L1 D$: 4 cycles (1.3 ns)
     L2$: 12 cycles (4 ns)
     Shared L3$ (LLC): > 30 cycles (10 ns)
     Core-to-core via MESI coherency: > 600 cycles (200 ns)
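
The > 600-cycle MESI line is the one that dominates actor-to-actor messaging across cores. A self-contained sketch (not from the talk) that makes the cost visible: two threads bouncing one cache line versus each writing its own 64-byte-aligned line:

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    struct alignas(64) Line { std::atomic<long> v{0}; };

    // Two threads each do 10M relaxed increments on the given counters;
    // returns elapsed milliseconds.
    long run(std::atomic<long>& c0, std::atomic<long>& c1) {
        auto t0 = std::chrono::steady_clock::now();
        auto work = [](std::atomic<long>& c) {
            for (long i = 0; i < 10'000'000; ++i)
                c.fetch_add(1, std::memory_order_relaxed);
        };
        std::thread a(work, std::ref(c0)), b(work, std::ref(c1));
        a.join(); b.join();
        return std::chrono::duration_cast<std::chrono::milliseconds>(
                   std::chrono::steady_clock::now() - t0).count();
    }

    int main() {
        Line shared, l0, l1;
        std::printf("same cache line:    %ld ms\n", run(shared.v, shared.v)); // MESI ping-pong
        std::printf("separate lines:     %ld ms\n", run(l0.v, l1.v));         // core-local
    }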

  16. Microservices on Multicore. Exchange software is pushing performance to hardware limits: volume from thousands to millions of msg/s, velocity from milliseconds to microseconds, and stability measured from the 50th up to the 99.99th percentile.

  17. Simplx: one thread per core, so no context switching.

  18. Simplx: actors are multitasked on each thread, giving high core utilization.

  19. Simplx: one event loop per core handles communications, lock-free. One event-loop iteration takes ~300 ns. Simplx runs on all cores. (A conceptual sketch of this runtime model follows.)
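
A minimal, self-contained sketch of the runtime model these three slides describe: one thread per core, many actors multitasked on it, and a lock-free event loop draining a per-core inbox. All names and the SPSC ring below are illustrative; this is not Simplx's actual implementation:

    #include <array>
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    // Minimal lock-free single-producer/single-consumer ring: the inbox of
    // one core, filled by one other core. (A real runtime would keep one
    // ring per core pair.)
    template <typename T, std::size_t N>
    class SpscRing {
        std::array<T, N> buf_;
        std::atomic<std::size_t> head_{0}, tail_{0};
    public:
        bool push(T v) {
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire)) return false; // full
            buf_[t] = std::move(v);
            tail_.store(next, std::memory_order_release);
            return true;
        }
        bool pop(T& out) {
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire)) return false; // empty
            out = std::move(buf_[h]);
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };

    struct Actor { virtual void onEvent(int ev) = 0; virtual ~Actor() = default; };

    // One event loop per core: drain the inbox and dispatch to the actors
    // multitasked on this core; no locks, no context switches.
    void coreLoop(SpscRing<int, 1024>& inbox, std::vector<Actor*>& actors,
                  std::atomic<bool>& running) {
        int ev;
        while (running.load(std::memory_order_relaxed))
            while (inbox.pop(ev))
                for (Actor* a : actors) a->onEvent(ev); // real code routes by actor id
    }

    struct Printer : Actor {
        void onEvent(int ev) override { std::printf("actor got event %d\n", ev); }
    };

    int main() {
        SpscRing<int, 1024> inbox;
        Printer p;
        std::vector<Actor*> actors{&p};
        std::atomic<bool> running{true};
        std::thread core(coreLoop, std::ref(inbox), std::ref(actors), std::ref(running));
        for (int i = 0; i < 3; ++i) inbox.push(i); // "another core" pushes events
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        running = false;
        core.join();
    }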

  20. Multicore WITHOUT multithreaded programming?

  21. Microservices on Multicore. Very good resources exist, but none cover multicore-related patterns.

  22. Part 2: REACTIVE MULTICORE PATTERNS

  24. Reactive Multicore Patterns. 7 patterns to unleash multicore reactivity: core-to-core messaging (2 patterns), core monitoring (2 patterns), core-to-core flow control (1 pattern), and core-to-cache management (2 patterns).

  25. Core-to-core messaging patterns

  26. Pattern #1: the core-aware messaging pattern. Inter-core communication: pushing a message from a sender to an actor on another core takes ~500 ns, versus ~1 µs to 10 µs when going through a socket server (and ~300 ns intra-core).
     Pipe pipe(greenActorId);
     pipe.push<HelloEvent>();

  27. Pattern #1: the core-aware messaging pattern. Intra-core communication: pushing a message to an actor on the same core takes ~300 ns. The push is asynchronous.
     Pipe pipe(greenActorId);
     pipe.push<HelloEvent>();

  28. Pattern #1: the core-aware messaging pattern. Intra-core communication: a direct call gives a ~150x speedup over a push (~2 ns vs ~300 ns). Optimize calls according to the deployment.
     // asynchronous push, ~300 ns
     Pipe pipe(greenActorId);
     pipe.push<HelloEvent>();
     // synchronous direct call, ~2 ns
     ActorReference<GreenActor> target = getLocalReference(greenActorId);
     [...]
     target->hello();
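
A self-contained sketch of the deployment-aware dispatch this pattern advocates. The Runtime, the actorCore map, and pushHelloEvent are invented for illustration; only the push-vs-direct-call trade-off comes from the slides:

    #include <cstdio>
    #include <unordered_map>

    struct GreenActor {
        void hello() { std::puts("hello (direct call, same core)"); }
    };

    struct Runtime {
        int myCore = 0;
        std::unordered_map<int, int> actorCore;           // actorId -> core
        std::unordered_map<int, GreenActor*> localActors; // actors on myCore

        void sendHello(int actorId) {
            if (actorCore.at(actorId) == myCore) {
                // same core: synchronous direct call (~2 ns per the slide)
                localActors.at(actorId)->hello();
            } else {
                // other core: asynchronous event push (~500 ns inter-core)
                pushHelloEvent(actorId);
            }
        }
        void pushHelloEvent(int actorId) {
            std::printf("HelloEvent pushed to actor %d on core %d\n",
                        actorId, actorCore.at(actorId));
        }
    };

    int main() {
        Runtime rt;
        GreenActor g;
        rt.actorCore   = {{1, 0}, {2, 3}};
        rt.localActors = {{1, &g}};
        rt.sendHello(1); // co-located: direct call
        rt.sendHello(2); // remote core: push
    }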

  29. Pattern #2: the message mutualization pattern. Network optimizations and core optimizations: same fight. In this use case, the 3 red consumer actors all process the same data pushed from another core.

  30. Pattern #2: the message mutualization pattern. Communication has a cost: many events mean heavy cache-coherency traffic (L3). Here, the same data is pushed as 3 separate inter-core events.

  31. Pattern #2: the message mutualization pattern. Let's mutualize inter-core communications: instead of 3 inter-core events, push 1 event to a local router actor on the destination core, which fans it out with 3 direct calls. (A sketch follows.)
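
A self-contained sketch of the router idea, with illustrative names rather than Simplx's API: the inter-core event is paid once, and the fan-out happens through cheap local direct calls:

    #include <cstdio>
    #include <vector>

    struct Consumer {
        int id;
        void onData(int data) { std::printf("consumer %d got %d\n", id, data); }
    };

    struct LocalRouter {
        std::vector<Consumer*> localConsumers; // all on the router's core
        // Called once per inter-core event received by this core.
        void onData(int data) {
            for (Consumer* c : localConsumers)  // N direct calls, ~2 ns each
                c->onData(data);
        }
    };

    int main() {
        Consumer c1{1}, c2{2}, c3{3};
        LocalRouter router{{&c1, &c2, &c3}};
        router.onData(42); // 1 inter-core event in, 3 local deliveries out
    }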

  32. Pattern #2: the message mutualization pattern. WITH the pattern vs WITHOUT: linear improvement.

  33. Core monitoring patterns, in real time

  34. Pattern #3: the core stats pattern. Use case: monitoring the data distribution throughput. We want to know, in real time, the number of messages received per second, globally and per core.
     StartSequence startSequence;
     startSequence.addActor<RedActor>(0); // core 0
     startSequence.addActor<RedActor>(1); // core 1
     Simplx simplx(startSequence);

  35. Pattern #3: the core stats pattern. Each RedActor reports to a per-core singleton monitor, which increments a local message counter.
     struct LocalMonitorActor : Actor {
         [...]
         void newMessage() { ++count; } // increase the message counter
     };
     struct RedActor : Actor {
         [...]
         ReferenceActor monitor;
         RedActor() {
             monitor = newSingletonActor<LocalMonitorActor>(); // one monitor per core
         }
         void onEvent() { monitor->newMessage(); }
     };

  36. Pattern #3: the core stats pattern. Every second, a timer fires and the monitor pushes the last second's count to the monitoring service, then resets it.
     struct LocalMonitorActor : Actor, TimerProxy {
         [...]
         LocalMonitorActor() : TimerProxy(*this) {
             setRepeat(1000); // 1 s timer
         }
         virtual void onTimeout() {
             // inform the monitoring service of the last second's statistics
             serviceMonitoringPipe.push<StatsEvent>(count);
             count = 0;
         }
     };
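
The slides only show the per-core counting side. The sketch below guesses at the receiving side: a monitoring service aggregating one StatsEvent per core per second into per-core and global rates. The StatsEvent payload and all names here are assumptions:

    #include <cstdio>
    #include <map>

    struct StatsEvent { int coreId; long count; }; // assumed payload

    struct MonitoringService {
        std::map<int, long> perCore; // last-second count per core
        void onEvent(const StatsEvent& e) {
            perCore[e.coreId] = e.count;
            long global = 0;
            for (const auto& kv : perCore) global += kv.second;
            std::printf("core %d: %ld msg/s, global: %ld msg/s\n",
                        e.coreId, e.count, global);
        }
    };

    int main() {
        MonitoringService svc;
        svc.onEvent({0, 120000});
        svc.onEvent({1, 95000});
    }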

  37. Pattern #4: the core usage pattern. Core utilization: detect overloaded cores before it is too late. Relying on the CPU usage reported by the OS is not enough: 100% does not mean the runtime is overloaded, and 10% does not tell how much data you can really process.

  38. Pattern #4: the core usage pattern. No push, no event, no work: a toy trace of 20 idle loops in a second means 0% core usage. (In reality, an idle core runs about 3 million event-loop iterations per second.)

  39. Pattern #4: the core usage pattern. Efficient core usage: in the toy trace, 11 loops in one second, of which 3 are working loops and 8 are idle loops, corresponds to 60% core usage.

  40. Pattern #4: the core usage pattern. Runtime performance counters help the measurement. Each event-loop iteration is flagged idleLoop = 0 or 1, and with a toy idle-loop duration of Duration(IdleLoop) = 0.05 s (reality is closer to ~300 ns), usage over a 1 s window is CoreUsage = 1 − Σ(idleLoop) × 0.05 / 1 s, expressed as a percentage. In the toy trace of 11 loops (8 idle, 3 working): CoreUsage = 1 − 8 × 0.05 = 60%. A dedicated core-usage actor publishes this value.
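
A self-contained sketch of this computation using the slide's toy numbers (one flag per loop, 0.05 s idle-loop duration, 1 s window); a real runtime would use ~300 ns idle loops and hardware cycle counters:

    #include <cstdio>

    int main() {
        // 1 = idle loop, 0 = working loop; the slide's toy trace: 8 idle, 3 working
        const int idleLoop[] = {1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1};
        const double idleLoopDuration = 0.05; // seconds (toy value)
        const double window = 1.0;            // seconds

        double idleTime = 0.0;
        for (int flag : idleLoop) idleTime += flag * idleLoopDuration;

        double coreUsage = 1.0 - idleTime / window;
        std::printf("core usage = %.0f%%\n", coreUsage * 100.0); // prints 60%
    }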

  41. Demo: real-time core monitoring of a typical trading workflow (data stream + data processing).

  42. Core-to-core flow control patterns

  43. Pattern #5: the queuing prevention pattern. What if producers overflow a consumer? Even when your software cannot be optimized further, the incoming throughput can still be too high, causing heavy queuing. Continue? Stop the flow? Merge data? Throttle? Whatever the decision, we first need to detect the issue.

  44. Pattern #5: the queuing prevention pattern. What's happening behind a push?

  45. Pattern #5: the queuing prevention pattern. The local Simplx event loops handle the inter-core communication in batches (here, Batch ID = 145).

  46. Pattern #5: the queuing prevention pattern. Once the destination core reads the data, the BatchID is incremented (145 -> 146).

  47. Pattern #5: the queuing prevention pattern. The BatchID does not increment if the destination core is busy (it stays at 145).

  48. Pattern #5: the queuing prevention pattern. Core-to-core communication at max pace:
     BatchID batchID(pipe);
     pipe.push<Event>();
     (...)
     if (batchID.hasChanged()) {
         // push again
     } else {
         // destination is busy:
         // merge data, start throttling, reject orders...
     }
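
A self-contained sketch of the mechanism with illustrative names (Simplx's real BatchID is tied to its pipes): the consumer bumps a counter each time it drains a batch, and the producer infers queuing when its snapshot has not moved:

    #include <atomic>
    #include <cstdio>

    std::atomic<long> batchId{145}; // bumped by the consumer's event loop

    long snapshot() { return batchId.load(std::memory_order_acquire); }

    bool destinationProgressed(long seen) { return snapshot() != seen; }

    int main() {
        long seen = snapshot();   // plays the role of BatchID batchID(pipe);
        // ... push events to the destination core here ...
        if (destinationProgressed(seen)) {
            std::puts("destination drained the batch: keep pushing");
        } else {
            std::puts("destination busy: merge data, throttle, or reject");
        }
    }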

  49. Pattern #5: the queuing prevention pattern. Demo (Java code): the same ID means queuing; a changed last ID means no queuing.

  50. Core-to-cache management patterns

  51. Pattern #6: the cache-aware split pattern. Use case: a FIX engine + execution engine receiving a new order.

  52. Pattern #6: the cache-aware split pattern. FIX engine + execution engine: a FIX order can easily be ~200 bytes, and almost all tags sent in the new-order request need to be sent back in the acknowledgment.

  53. Pattern #6: the cache-aware split pattern. Stability depends on the ability to be cache friendly: at ~200 bytes per order, to stay "in-cache" and get stable performance, one core can store ~1300 open orders in its local storage, while an order book can hold from 1 to 10,000 open orders.
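
A back-of-the-envelope sketch of the slide's arithmetic. The 256 KB per-core L2 size and the modulo split are assumptions for illustration; only the ~200-byte order size and the ~1300-order figure come from the slides:

    #include <cstdio>

    int main() {
        const long orderSize = 200;        // bytes, from the slide
        const long l2Cache   = 256 * 1024; // bytes, assumed per-core L2
        // ~1310, matching the slide's "~1300 open orders per core"
        std::printf("orders per core in-cache: ~%ld\n", l2Cache / orderSize);

        // A book with up to 10,000 open orders therefore needs ~8 cores to
        // stay cache-resident; a simple split assigns each order to a core.
        const long cores = 8;
        long orderId = 424242;
        std::printf("order %ld -> core %ld\n", orderId, orderId % cores);
    }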
