Design Patterns for Large Scale Data Movement
Aaron Lee
aaron.lee@solacesystems.com
Data Movement Patterns
o The right solution depends on the problem you’re solving
  ‐ Real-time or intermittent?
  ‐ Update rates?
  ‐ Any weird networks?
  ‐ Fan-in or fan-out?
  ‐ Acceptable latency?
  ‐ Payload size?
  ‐ Humans or machines?
  ‐ Guarantee required?
http://www.dreamstime.com/stock-images-wispy-blue-spirals-pattern-image1983204
Latency Required
o Some not sensitive at all
  ‐ Batch updates
o Seconds often good enough
  ‐ Database sync
  ‐ User interfaces
o Others measure in milli- or microseconds
  ‐ Algo trading
  ‐ Industrial controls
[Scale: Required Latency, from "Not Critical" to "Low as Possible"]
Network Distance
o Co-location for max speed
  ‐ Minimize speed-of-light delay
o LAN for many apps
  ‐ 10GigE networks
o Long distance WAN
  ‐ Expensive, limited pipes
  ‐ Creates mismatches with other networks
[Scale: Network Distance, from "Co-location" to "Global WAN"]
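The speed-of-light point can be made concrete with a rough lower bound on propagation delay. The sketch below assumes signals travel through optical fiber at roughly 200,000 km/s (about two thirds of the vacuum speed of light). The function name and the example distances are illustrative and do not come from the slides.

```python
# Rough lower bound on network delay from distance alone.
# Assumption: signals in optical fiber travel at about 200,000 km/s.
FIBER_KM_PER_SEC = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over a fiber path of this length."""
    one_way_sec = distance_km / FIBER_KM_PER_SEC
    return 2 * one_way_sec * 1000.0


if __name__ == "__main__":
    # Hypothetical distances for the three tiers on the slide.
    for label, km in [
        ("co-location, 0.1 km", 0.1),
        ("regional link, 50 km", 50),
        ("global WAN, 10,000 km", 10_000),
    ]:
        print(f"{label}: at least {min_round_trip_ms(km):.3f} ms round trip")
```

Real paths add switching, queuing, and routing detours on top of this floor, so measured round-trip times will be higher.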
Number of Messages
o Few
  ‐ Batch updates
  ‐ Simple applications
o Moderate
  ‐ Risk management
  ‐ Order routing
o Insane
  ‐ Market data
  ‐ Click stream analysis
[Scale: Number of Messages, from "Whatever" to "OMG"]
Degree of Distribution
o Point-to-point
o Fan-out (many subs)
o Fan-in (many pubs)
o Mesh
  ‐ Syncing data between many endpoints
[Scale: Degree of Distribution, from "1:1" to "Millions of Endpoints"]
Message Size
o Small
  ‐ Status updates, activity logging events
o Medium
  ‐ Orders, product BOMs
o Large
  ‐ Batch updates, media files, product catalogs
o Very different stresses on system based on message size and frequency.
[Scale: Size of Messages, from "Small" to "Huge"]
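The interaction between message size and frequency comes down to simple arithmetic. The sketch below computes payload bandwidth for two hypothetical workloads; the sizes and rates are assumptions chosen for illustration, and protocol overhead is ignored.

```python
def required_mbps(message_bytes: int, messages_per_sec: float) -> float:
    """Payload bandwidth in megabits per second (protocol overhead ignored)."""
    return message_bytes * 8 * messages_per_sec / 1_000_000


if __name__ == "__main__":
    # Hypothetical workloads: many small status updates vs. one large file per minute.
    small_fast = required_mbps(200, 100_000)
    large_slow = required_mbps(50_000_000, 1 / 60)
    print(f"200-byte updates at 100,000/s: {small_fast:.1f} Mbps")
    print(f"50 MB file once a minute:      {large_slow:.1f} Mbps")
```

The small-message workload is dominated by per-message handling cost, while the large-message workload needs far less sustained bandwidth but must buffer and transfer each payload in one piece.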
Importance of Delivery Guarantee
o “Best effort” fine for some scenarios
o Others require “once and only once”
o Sequence matters for some
o Some demand failsafe even in DR scenarios
[Scale: Delivery Guarantee Importance, from "Not" to "Very"]
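One common way to approximate "once and only once" with ordering is to let the transport redeliver when in doubt and have the consumer discard duplicates and detect gaps by sequence number. The sketch below is a minimal illustration of that idea, not a description of any particular product; the class name, the per-publisher sequence number starting at 1, and the return values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedDedupConsumer:
    """Applies each message at most once, and only in sequence order."""

    next_seq: int = 1                      # assumed: publisher numbers messages from 1
    applied: list = field(default_factory=list)

    def on_message(self, seq: int, payload: str) -> str:
        if seq < self.next_seq:
            return "duplicate"             # redelivery of a message already applied
        if seq > self.next_seq:
            return "gap"                   # something is missing; caller should ask for a resend
        self.applied.append(payload)
        self.next_seq += 1
        return "applied"


if __name__ == "__main__":
    consumer = OrderedDedupConsumer()
    for seq, payload in [(1, "a"), (1, "a"), (3, "c"), (2, "b"), (3, "c")]:
        print(seq, payload, consumer.on_message(seq, payload))
```

A best-effort consumer would skip both checks and simply process whatever arrives.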
Other Considerations
o Message
  ‐ Format
  ‐ Protocol
  ‐ Structured/Unstructured
o Network
  ‐ Availability
  ‐ RTT
  ‐ Bandwidth cost
o Robustness
  ‐ Archival
  ‐ Caching
  ‐ Acceptable MTBF
  ‐ HA switchover times
  ‐ DR requirements
Combination of Factors Yields Design Patterns
o Some attributes tend to correlate
  ‐ # of messages and degree of distribution
o Others usually contradict
  ‐ Network distance and latency
  ‐ Guarantee and latency
o Tradeoffs and creative solutions
[Radar chart axes: Network Distance, Number of Messages, Degree of Distribution, Size of Messages, Delivery Guarantee Importance, Required Latency]
Identifying Patterns in Real-World Use Cases
Use cases unique, but patterns emerge
Examples in this section:
o Trade Order Flow
o Manufacturing Data Sync
o Oil and Gas Monitoring
o Real Time Sports Betting
Order Flow
o Latency matters, but not every microsecond
o Usually localized
o Continuous, high-rate message flow
o Mid-sized messages (1-2 KB)
o Messages absolutely must be guaranteed
[Radar chart: this use case plotted on the six factors above]
Order Flow; Architecture
[Architecture diagram. Components: Clients, Client Gateways, Smart Order Router, Back Office Applications, Management & Monitoring, Slow Subscribers, Message Bus, Exchange Gateways, Exchanges, Real Time Sync, Disaster Recovery Site]
Order Flow; Similar Use Cases
o Credit card processing
  ‐ Long-distance WANs
  ‐ Latency in hundreds of milliseconds
o E-commerce
  ‐ Higher volumes
  ‐ Higher guarantee required
o Logistics scheduling
  ‐ Less latency sensitive
  ‐ More likely to include WANs
[Radar chart: the similar use cases overlaid on the six factors above]
Manufacturing Data Sync
o Geographically distributed
o 100% delivery guarantee required
o Data rate is use case specific; assume lots of medium (< 5K) messages
o Number of endpoints is use case specific; assume 10 manufacturing locations
[Radar chart: this use case plotted on the six factors above]
Manufacturing Data Sync; Architecture
[Architecture diagram. Labels: Applications & Databases, Fanout at Edge, Smart Buffering, Maximizing Bandwidth]
Manufacturing Data Sync; Similar Use Cases
o Real Time Risk Management
  ‐ Smaller messages
  ‐ Latency more important
o Retail Global Inventory
  ‐ Messages can be larger
  ‐ Distribution can be wider
o Real Time Financials
  ‐ Messages larger
  ‐ Distribution less (collecting to 1 location)
[Radar chart: the similar use cases overlaid on the six factors above]
Oil & Gas Pipeline Monitoring
o Wi-Fi, satellite, proprietary and other unreliable networks
o Degree of distribution off the charts. In this case, fan-in.
o Messages usually pretty small, unless batch
o Latency unimportant
o Level of guarantee is use case specific; assume status messages (i.e., guarantee not essential)
[Radar chart: this use case plotted on the six factors above]
Oil & Gas Pipeline Monitoring; Architecture
[Architecture diagram. Components: Pipeline Sensors, Wireless Collection, Caches, Unreliable Networks, Message Bus, Big Data Loading, Real Time vs. Delayed Analytics, Analytics Engines, Big Data & Databases]
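The caches in front of the unreliable networks can be pictured as a store-and-forward buffer. The sketch below is one hypothetical way to do this for best-effort status messages: a bounded buffer that drops the oldest reading when full and forwards in order whenever the link is up. The class name, the capacity, and the `send` callback (which returns False when the link is down) are assumptions made for illustration.

```python
from collections import deque
from typing import Callable


class StoreAndForwardCache:
    """Bounded buffer for best-effort readings collected over a flaky link."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently discards the oldest item when full.
        self._buffer = deque(maxlen=capacity)
        self.dropped = 0

    def record(self, reading: dict) -> None:
        if len(self._buffer) == self._buffer.maxlen:
            self.dropped += 1          # the oldest reading is about to be overwritten
        self._buffer.append(reading)

    def flush(self, send: Callable[[dict], bool]) -> int:
        """Forward buffered readings in order; stop at the first failed send."""
        sent = 0
        while self._buffer:
            if not send(self._buffer[0]):
                break                  # link is down; keep the reading for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    cache = StoreAndForwardCache(capacity=3)
    for i in range(5):
        cache.record({"sensor": "pump-7", "seq": i})   # hypothetical sensor name
    delivered = []
    count = cache.flush(lambda r: delivered.append(r) or True)
    print(count, cache.dropped, [r["seq"] for r in delivered])
```

Dropping the oldest reading suits status data, where the newest value matters most. A use case that needs guaranteed delivery would persist readings instead of discarding them.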
Oil & Gas Pipeline Monitoring; Similar Use Cases
o Smart Grid
  ‐ Small messages
  ‐ Massive distribution
o Transportation Monitoring
  ‐ Fewer endpoints
  ‐ Bigger messages
o Retail Point of Sale
  ‐ More predictable networks
  ‐ Guarantee more important
[Radar chart: the similar use cases overlaid on the six factors above]
Real-Time Sports Betting
o Huge message volumes (in this case fan-out)
o Low level of guarantee for any one outbound message
o High level of guarantee for inbound messages
o Tiny messages
o Network is the internet + mobile carriers
o Latency (beyond network latency) is important
[Radar chart: this use case plotted on the six factors above]
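A low guarantee on any single outbound message makes conflation possible: when a subscriber falls behind, only the latest value per item needs to reach it. The sketch below is a minimal last-value queue illustrating that technique. It is an assumption about how such fan-out could be handled, not something the slides specify, and the class name and market keys are hypothetical.

```python
class ConflatingQueue:
    """Holds at most one pending update per key; newer values overwrite older ones."""

    def __init__(self):
        # Regular dicts preserve insertion order (Python 3.7+); overwriting a key
        # keeps its original position.
        self._pending = {}

    def publish(self, key: str, value: float) -> None:
        self._pending[key] = value

    def drain(self) -> list:
        """Return the pending (key, value) pairs and clear the queue."""
        updates = list(self._pending.items())
        self._pending.clear()
        return updates


if __name__ == "__main__":
    queue = ConflatingQueue()
    queue.publish("match-1", 1.80)   # hypothetical odds updates
    queue.publish("match-2", 2.10)
    queue.publish("match-1", 1.90)   # supersedes the earlier match-1 value
    print(queue.drain())
```

Inbound messages such as placed bets need the opposite treatment, since each one must be delivered and none may be overwritten.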
Real-Time Sports Betting; Architecture
Highlight the degree of fan-out, connection counts, event logging, and real-time analysis for odds adjustment.
[Architecture diagram. Labels: Mobile Customers, Web Customers, Data Streaming, Clickstream & Marketing, Odds & Analytics, Streaming Odds Data, Big Data, Low Latency, Huge Connection Counts, Message Bus, Customer & Betting Apps, Security & Fraud Detection]
Real-Time Sports Betting; Similar Use Cases
o Mobile Social Updates
  ‐ Latency less important
  ‐ Distribution far greater
o Real Time Travel Alerting
  ‐ Each message more important
  ‐ Volumes much lower
o Market Data Distribution
  ‐ Latency even more important
  ‐ Volumes often much higher
  ‐ Loss often tolerable
[Radar chart: the similar use cases overlaid on the six factors above]
[Radar chart axes: Network Distance, Number of Messages, Degree of Distribution, Size of Messages, Delivery Guarantee Importance, Required Latency]
Summary
Questions?