Design Patterns for Large Scale Data Movement, Aaron Lee


  1. Design Patterns for Large Scale Data Movement
     Aaron Lee, aaron.lee@solacesystems.com

  2. Data Movement Patterns
     o The right solution depends on the problem you’re solving
       ‐ Real-time or intermittent?
       ‐ Update rates?
       ‐ Any weird networks?
       ‐ Fan-in or fan-out?
       ‐ Acceptable latency?
       ‐ Payload size?
       ‐ Humans or machines?
       ‐ Guarantee required?

  3. Latency Required
     o Some not sensitive at all
       ‐ Batch updates
     o Seconds often good enough
       ‐ Database sync
       ‐ User interfaces
     o Others measure in milli- or microseconds
       ‐ Algo trading
       ‐ Industrial controls
     [Axis: Required Latency, from "Not Critical" to "Low as Possible"]

  4. Network Distance
     o Co-location for max speed
       ‐ Minimize speed-of-light delay
     o LAN for many apps
       ‐ 10GigE networks
     o Long distance WAN
       ‐ Expensive, limited pipes
       ‐ Creates mismatches with other networks
     [Axis: Network Distance, from "Co-location" to "Global WAN"]
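The speed-of-light floor on a long link can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the 0.67 velocity factor for fiber, and the example distances are assumptions, not figures from the slides.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum


def min_round_trip_ms(distance_km: float, velocity_factor: float = 0.67) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    velocity_factor is an assumed fraction of c for the medium
    (roughly two thirds for optical fiber). Queuing, routing, and
    serialization delays are ignored, so real round trips are slower.
    """
    if distance_km < 0 or not 0 < velocity_factor <= 1:
        raise ValueError("need distance_km >= 0 and 0 < velocity_factor <= 1")
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * velocity_factor)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Hypothetical 5,500 km WAN path versus a 0.1 km co-located link.
    print(f"WAN:   {min_round_trip_ms(5500):.1f} ms")
    print(f"Co-lo: {min_round_trip_ms(0.1) * 1000:.2f} microseconds")
```

No amount of tuning at either end removes this floor, which is why the most latency-sensitive applications move closer together rather than buy a faster pipe.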

  5. Number of Messages
     o Few
       ‐ Batch updates
       ‐ Simple applications
     o Moderate
       ‐ Risk management
       ‐ Order routing
     o Insane
       ‐ Market data
       ‐ Click stream analysis
     [Axis: Number of Messages, from "Whatever" to "OMG"]

  6. Degree of Distribution
     o Point-to-point
     o Fan-out (many subscribers)
     o Fan-in (many publishers)
     o Mesh
       ‐ Syncing data between many endpoints
     [Axis: Degree of Distribution, from "1:1" to "Millions of Endpoints"]

  7. Message Size
     o Small
       ‐ Status updates, activity logging events
     o Medium
       ‐ Orders, product BOMs
     o Large
       ‐ Batch updates, media files, product catalogs
     o Very different stresses on system based on message size and frequency.
     [Axis: Size of Messages, from "Small" to "Huge"]
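How size and frequency combine can be shown with a small calculation. This is a sketch with made-up workloads; the function name and every number in it are assumptions chosen for illustration, and sizes are taken to be in bytes.

```python
def required_bandwidth_mbps(message_bytes: int, messages_per_second: float) -> float:
    """Sustained payload bandwidth in megabits per second (protocol overhead ignored)."""
    return message_bytes * 8 * messages_per_second / 1_000_000


if __name__ == "__main__":
    # Hypothetical workloads: (label, payload bytes, messages per second).
    workloads = [
        ("status updates", 200, 100_000),
        ("orders", 2_000, 10_000),
        ("media files", 50_000_000, 0.2),
    ]
    for label, size, rate in workloads:
        print(f"{label:15s} {required_bandwidth_mbps(size, rate):8.1f} Mbit/s")
```

The first two workloads need the same bandwidth, yet the small-message one handles ten times as many messages per second, so the load typically shifts from the network pipe to per-message processing.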

  8. Importance of Delivery Guarantee
     o “Best effort” fine for some scenarios
     o Others require “once and only once”
     o Sequence matters for some
     o Some demand failsafe even in disaster recovery (DR) scenarios
     [Axis: Delivery Guarantee Importance, from "Not" to "Very"]
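One common way to approximate "once and only once" with ordering on the receiving side is to number messages and have the receiver discard duplicates and hold back anything that arrives early. The sketch below assumes a sender that numbers messages 1, 2, 3, ... and retransmits until acknowledged; the class and its method names are hypothetical and not taken from the presentation or from any product.

```python
class InOrderReceiver:
    """Delivers each sequence number to the application once, in order.

    Assumes the sender numbers messages 1, 2, 3, ... and retransmits
    until acknowledged, so duplicates and reordering are possible.
    """

    def __init__(self):
        self.next_seq = 1      # next sequence number the application expects
        self.pending = {}      # out-of-order messages held until the gap fills
        self.delivered = []    # stands in for the application callback

    def on_message(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    rx = InOrderReceiver()
    for seq, payload in [(1, "a"), (3, "c"), (1, "a"), (2, "b"), (3, "c")]:
        rx.on_message(seq, payload)
    print(rx.delivered)  # ['a', 'b', 'c']
```

Holding message 3 until message 2 arrives is where the cost shows up: stronger guarantees add waiting, which is the guarantee-versus-latency tension.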

  9. Other Considerations
     o Message
       ‐ Format
       ‐ Protocol
       ‐ Structured/Unstructured
     o Network
       ‐ Availability
       ‐ RTT
       ‐ Bandwidth cost
     o Robustness
       ‐ Archival
       ‐ Caching
       ‐ Acceptable MTBF
       ‐ HA switchover times
       ‐ DR requirements

  10. Combination of Factors Yields Design Patterns
      o Some attributes tend to correlate
        ‐ # of messages and degree of distribution
      o Others usually contradict
        ‐ Network distance and latency
        ‐ Guarantee and latency
      o Tradeoffs and creative solutions
      [Radar chart axes: Network Distance, Required Latency, Number of Messages, Delivery Guarantee Importance, Degree of Distribution, Size of Messages]
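The six factors can be captured as a simple profile and checked for the two pairs that usually contradict. This is a minimal sketch: the 1-to-5 scale, the threshold, the field names, and the example scores are all assumptions made for illustration, not values from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCaseProfile:
    """Scores from 1 (low) to 5 (high) on each of the six factors."""
    name: str
    latency_sensitivity: int   # 5 = latency must be as low as possible
    network_distance: int      # 5 = global WAN, 1 = co-located
    message_count: int
    distribution: int          # 5 = millions of endpoints, 1 = point-to-point
    message_size: int
    guarantee: int             # 5 = delivery guarantee is critical


def tensions(profile: UseCaseProfile, threshold: int = 4) -> list[str]:
    """List the factor pairs that usually contradict and are both scored high."""
    found = []
    if profile.network_distance >= threshold and profile.latency_sensitivity >= threshold:
        found.append("network distance vs. latency")
    if profile.guarantee >= threshold and profile.latency_sensitivity >= threshold:
        found.append("guarantee vs. latency")
    return found


if __name__ == "__main__":
    # Hypothetical scores for a localized, guaranteed, latency-aware workload.
    order_flow = UseCaseProfile("order flow", latency_sensitivity=4, network_distance=1,
                                message_count=4, distribution=2, message_size=3, guarantee=5)
    print(tensions(order_flow))  # ['guarantee vs. latency']
```

A non-empty result marks where a tradeoff or a creative solution is needed; an empty one suggests the requirements do not pull against each other.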

  11. Identifying Patterns in Real-World Use Cases
      Use cases unique, but patterns emerge
      Examples in this section:
      o Trade Order Flow
      o Manufacturing Data Sync
      o Oil and Gas Monitoring
      o Real Time Sports Betting

  12. Order Flow
      o Latency matters, but not every microsecond
      o Usually localized
      o Continuous, high-rate message flow
      o Mid-sized messages (1-2Kb)
      o Messages absolutely must be guaranteed
      [Radar chart of the six factors for this use case]

  13. Order Flow; Architecture
      [Architecture diagram. Components: Clients, Client Gateways, Smart Order Router, Exchange Gateways, Exchanges, Back Office Applications, Management & Monitoring, Slow Subscribers, Message Bus, Real Time Sync, Disaster Recovery Site]

  14. Order Flow; Similar Use Cases
      o Credit card processing
        ‐ Long-distance WANs
        ‐ Latency in hundreds of milliseconds
      o E-commerce
        ‐ Higher volumes
        ‐ Higher guarantee required
      o Logistics scheduling
        ‐ Less latency sensitive
        ‐ More likely to include WANs
      [Radar chart comparing these use cases on the six factors]

  15. Manufacturing Data Sync
      o Geographically distributed
      o 100% delivery guarantee required
      o Data rate is use case specific; will assume lots of medium (< 5K) messages
      o Number of endpoints is use case specific; assume 10 manufacturing locations
      [Radar chart of the six factors for this use case]

  16. Manufacturing Data Sync; Architecture
      [Architecture diagram. Labels: Applications & Databases, Fanout at Edge, Smart Buffering, Maximizing Bandwidth]
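The benefit of fanning out at the edge can be illustrated by counting how many copies of one update cross the WAN. This is a sketch under stated assumptions: the function name is hypothetical, the 25 subscribers per site is an invented figure, and only the count of 10 locations comes from the use case description.

```python
def wan_transmissions(subscribers_per_site: dict[str, int], fanout_at_edge: bool) -> int:
    """Count copies of one update that cross the WAN.

    With edge fan-out, each remote site receives a single copy and a
    local relay delivers it to that site's subscribers.
    """
    if fanout_at_edge:
        return sum(1 for count in subscribers_per_site.values() if count > 0)
    return sum(subscribers_per_site.values())


if __name__ == "__main__":
    # 10 manufacturing locations, each with a hypothetical 25 subscribing applications.
    sites = {f"plant-{n}": 25 for n in range(1, 11)}
    print(wan_transmissions(sites, fanout_at_edge=False))  # 250
    print(wan_transmissions(sites, fanout_at_edge=True))   # 10
```

Sending one copy per site keeps the expensive, limited WAN links for unique data and leaves the duplication to the local network.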

  17. Manufacturing Data Sync; Similar Use Cases
      o Real Time Risk Management
        ‐ Smaller messages
        ‐ Latency more important
      o Retail Global Inventory
        ‐ Messages can be larger
        ‐ Distribution can be more
      o Real Time Financials
        ‐ Messages larger
        ‐ Distribution less (collecting to 1 location)
      [Radar chart comparing these use cases on the six factors]

  18. Oil & Gas Pipeline Monitoring
      o Wi-Fi, satellite, proprietary and other unreliable networks
      o Degree of distribution off the charts. In this case, fan-in.
      o Messages usually pretty small, unless batch
      o Latency unimportant
      o Level of guarantee is use case specific; assume status messages (i.e., guarantee not essential)
      [Radar chart of the six factors for this use case]

  19. Oil & Gas Pipeline Monitoring; Architecture
      [Architecture diagram. Labels: Pipeline Sensors, Wireless Collection Caches, Unreliable Networks, Message Bus, Big Data Loading, Real Time vs. Delayed Analytics, Analytics Engines, Big Data & Databases]
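A collection cache in front of an unreliable network can be sketched as a bounded store-and-forward buffer. The class below is a hypothetical illustration, not the presenter's design: the names, the drop-oldest policy, and the `send` callback contract (returns True on success) are all assumptions. Dropping the oldest reading is only reasonable because this use case assumes status messages whose guarantee is not essential.

```python
from collections import deque


class CollectionCache:
    """Buffers sensor readings while the uplink is down.

    The buffer is bounded; when it is full the oldest reading is
    discarded, which is only acceptable because delivery of any
    single status message is assumed not to be essential.
    """

    def __init__(self, capacity: int, send):
        self.buffer = deque(maxlen=capacity)  # drops from the left when full
        self.send = send                      # callable returning True on success

    def record(self, reading) -> None:
        self.buffer.append(reading)

    def flush(self) -> int:
        """Forward buffered readings oldest first; stop at the first failure."""
        sent = 0
        while self.buffer:
            if not self.send(self.buffer[0]):
                break  # link is down again; keep the reading for the next attempt
            self.buffer.popleft()
            sent += 1
        return sent
```

A caller would invoke `record` for every sensor reading and `flush` whenever the link appears to be up; a use case that needs a stronger guarantee would have to persist the buffer and refuse to drop readings.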

  20. Oil & Gas Pipeline Monitoring; Similar Use Cases
      o Smart Grid
        ‐ Small messages
        ‐ Massive distribution
      o Transportation Monitoring
        ‐ Fewer endpoints
        ‐ Bigger messages
      o Retail Point of Sale
        ‐ More predictable networks
        ‐ Guarantee more important
      [Radar chart comparing these use cases on the six factors]

  21. Real-Time Sports Betting
      o Huge message volumes (in this case fan-out)
      o Low level of guarantee for any one outbound message
      o High level of guarantee for inbound messages
      o Tiny messages
      o Network is the internet + mobile carriers
      o Latency (beyond network latency) is important
      [Radar chart of the six factors for this use case]
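When no single outbound message has to arrive, one widely used technique is conflation: a subscriber that falls behind receives only the latest value per key instead of every intermediate update. The slides do not name this technique, so treat the sketch below as one possible approach; the class, its methods, and the example keys are hypothetical.

```python
class ConflatingQueue:
    """Per-subscriber queue that keeps only the latest update for each key."""

    def __init__(self):
        self._latest = {}  # key -> most recent value; dict preserves insertion order

    def publish(self, key: str, value: float) -> None:
        # Re-inserting moves the key to the back so delivery order follows
        # the most recent update rather than the first one.
        self._latest.pop(key, None)
        self._latest[key] = value

    def drain(self) -> list[tuple[str, float]]:
        """Return pending updates and clear the queue."""
        updates = list(self._latest.items())
        self._latest.clear()
        return updates


if __name__ == "__main__":
    queue = ConflatingQueue()
    queue.publish("match-1", 2.10)
    queue.publish("match-2", 1.50)
    queue.publish("match-1", 2.25)  # replaces the stale 2.10
    print(queue.drain())  # [('match-2', 1.5), ('match-1', 2.25)]
```

This suits outbound odds only. Inbound bets need the high guarantee listed above and should never be conflated.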

  22. Real-Time Sports Betting; Architecture
      The diagram highlights the degree of fan-out, connection counts, event logging, and real-time analysis for odds adjustment.
      [Architecture diagram. Labels: Mobile Customers, Web Customers, Data Streaming, Clickstream & Marketing, Odds & Analytics, Streaming Odds Data, Big Data, Low Latency, Huge Connection Counts, Message Bus, Customer & Betting Apps, Security & Fraud Detection]

  23. Real-Time Sports Betting; Similar Use Cases
      o Mobile Social Updates
        ‐ Latency less important
        ‐ Distribution far greater
      o Real Time Travel Alerting
        ‐ Each message more important
        ‐ Volumes much lower
      o Market Data Distribution
        ‐ Latency even more important
        ‐ Volumes often much higher
        ‐ Loss often tolerable
      [Radar chart comparing these use cases on the six factors]

  24. [Radar chart axes: Network Distance, Required Latency, Number of Messages, Delivery Guarantee Importance, Degree of Distribution, Size of Messages]

  25. Summary
      Questions?
