Auto-sizing for Stream Processing Applications at LinkedIn Rayman Preet Singh, Bharath Kumarasubramanian, Prateek Maheshwari, and Samarth Shetty Stream Processing @ LinkedIn
Stream Processing Skills Top App Skills Jobs Streaming input, nearline processing 2
Stream Processing App Example John Doe Samza: Stateful Scalable Stream Processing at LinkedIn, Proc. VLDB ’17 3
Stream Processing App Example. [Diagram: service call graph including Front-end, Feed, Profile, Mini-profile, Graph, DB, Notification, and URN Resolution services, …] Real-time distributed tracing for web performance and efficiency optimizations, LinkedIn Engineering Blog 4
Stream Processing App Example LinkedIn Sales and EMEA Blog 5
Stream Processing at LinkedIn Skills Top App Skills Jobs At LinkedIn, thousands of apps Notification, monitoring, recommendation, fraud-detection, search, … Millions of messages/s, 100s of GBs/s, … 6
Stream Processing at LinkedIn App-2 App-1 App-3 Stream Processing as a Service App developers APIs Data scientists Capacity provisioning … Security & privacy Operational ease Scalability Fault-tolerance Efficiency Performance … 7
Problem Throughput, Latency Parallelism CPU-cores, #threads, … Memory Heap, native, … Specialized hardware GPUs, RDMA, … Over-provisioning 50% of users by approx. 50%, Google-Autopilot [EuroSys’20], … Under-provisioning OOMs, stalls, failures, under-performing, ... 8
Solution Sizing parameters App Controller Throughput, Latency goals Input load App internals Environmental conditions Dependency-service, network latencies, … Hardware, software evolution … 9
Existing Solutions. Apps are DAGs of cataloged operators (Filter, Map, Join) [SoCC ’17, VLDB ’17, ToN ’17, OSDI ’18, ICDE ’15, ICDE ’20, IC2E ’16, …]: tune parallelism to optimize throughput, latency, utilization, time-taken. Arrival rates and service times follow specific distributions (Poisson, exponential, …) [ToN ’17, ICDE ’15, …]: tune parallelism using queuing theory, hill-climbing, … 10
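The queuing-style reasoning that such solutions rely on can be sketched as follows. This is a minimal illustration, not taken from any of the cited systems: the function name `min_parallelism` and the `target_utilization` parameter are hypothetical, and the calculation assumes a steady arrival rate and a known mean service time.

```python
import math


def min_parallelism(arrival_rate, service_time_s, target_utilization=0.7):
    """Smallest worker count that keeps per-worker utilization at or below the target.

    With c workers, utilization is rho = arrival_rate * service_time_s / c,
    so we need c >= arrival_rate * service_time_s / target_utilization.
    All names and the default target are illustrative assumptions.
    """
    if arrival_rate < 0 or service_time_s <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_time_s > 0")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    offered_load = arrival_rate * service_time_s  # average number of busy workers
    return max(1, math.ceil(offered_load / target_utilization))


if __name__ == "__main__":
    # 100 messages/s at 0.25 s per message, aiming for 50% utilization.
    print(min_parallelism(100, 0.25, target_utilization=0.5))
```

A calculation like this is only as good as its inputs: it presumes a stable mean service time and arrival rate.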
Apps use remote services. [Diagram: operators Op1 to Op3 with a UDF calling a web service, blob storage, and a KV store.] [Figure: CDF of service time (ms) for Apps 1 to 4; x-axis: service time (in ms), 1 to 1000.] Service time depends on remote services’ latencies, error rates and retries, network latencies, … No specific distribution of service times. 11
Apps use remote services. [Diagram: operators Op1 to Op3 with a UDF calling a web service, blob storage, and a KV store.] [Figure: time series of input load (messages per sec) for sample apps.] Throughput depends on input load variation and remote services’ throughput. 12
[Figure: CDF of arrival rate (messages/sec) for Apps 1 to 4; x-axis: arrival rate (messages/sec), 10^5 to 10^6.] No specific distribution of arrival rates. 13
Apps go beyond a DAG of operators. [Diagram: operators Op1 to Op3 with UDFs, a periodic UDF, a client cache, and state, calling a web service, blob storage, a KV store, and external frameworks.] Additional functionalities: external frameworks (TensorFlow, DL4j, …), out-of-order processing, input priorities, user-defined functions (UDFs), customized input checkpointing, … 14
Apps go beyond a DAG of operators. Heterogeneous combinations of functionalities. DAG-only models are insufficient. 15
Apps exhibit correlations in resource use. CPU bottleneck → input buffering → memory use → lowered throughput → … Java-based apps (Apache Flink, Samza, …): memory bottleneck → GC overhead → low throughput, high latency and CPU utilization. 16
Long-tail distributions of app characteristics. [Figure: CDF of application p50 service time (ms); x-axis: application p50 service time (ms), 0.01 to 100000; y-axis: fraction of applications.] 17
Long-tail distributions of app characteristics. [Figure: CDF of number of input streams (per application); x-axis: number of input streams, 1 to 1000; y-axis: fraction of applications.] 18
Long-tail distributions of app characteristics. [Figure: CDF of application state size (MB); x-axis: application state (in MB), 0.1 to 1x10^7; y-axis: fraction of applications.] 19
Requirements Sizing parameters Right size vs. optimal size Operational ease Interpretable App Controller Safe-trajectory Minimize time-taken Scalable, fault-tolerant, efficient, … 20
Approach Black-box approaches Azure-VMSS, AWS-EC2 autoscale, Dhalion VLDB ’17, .. Interpretable Right sizing Time-taken, oscillations [DS2 OSDI’18, Turbine ICDE ’20] Undo, redo, refine, … 21
Approach Optimization approaches Bilal et al. SoCC ‘17, Gencer et al. Middleware ‘15, … Training data (trial runs), parameter & criteria tuning, assumptions, … Optimal sizing, minimize time-taken Operability (interpretable actions), service dependencies, network, .. 22
Sage Design. Feedback control system. Policies encapsulate strategies for sizing a single resource. Policies are applied in priority order, periodically on all apps, and only if there is no in-flight action on the app. 23
Sage Design. Policy priority order. Deterministic: interpretable, modifiable, … Programmability for policies. Tailored to continuous-operator systems like Apache Samza, Flink, … P1: memory scale-up; P2: CPU scale-up; P3: parallelism tuning; … 24
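A priority-ordered policy loop of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not Sage's implementation: the `App` and `Policy` types, the metric names (`heap_used_fraction`, `cpu_utilization`, `backlog_growth_per_s`), and every threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

Metrics = Dict[str, float]


@dataclass
class App:
    name: str
    metrics: Metrics
    inflight_action: Optional[str] = None  # set while a sizing action is being applied


@dataclass(frozen=True)
class Policy:
    name: str
    propose: Callable[[Metrics], Optional[str]]  # returns an action, or None


# Hypothetical single-resource policies; thresholds are made up for illustration.
def memory_scale_up(m: Metrics) -> Optional[str]:
    return "increase memory" if m.get("heap_used_fraction", 0.0) > 0.9 else None


def cpu_scale_up(m: Metrics) -> Optional[str]:
    return "increase cpu" if m.get("cpu_utilization", 0.0) > 0.85 else None


def parallelism_tuning(m: Metrics) -> Optional[str]:
    return "increase parallelism" if m.get("backlog_growth_per_s", 0.0) > 0 else None


# Priority order: P1 before P2 before P3.
POLICIES = (
    Policy("P1: memory scale-up", memory_scale_up),
    Policy("P2: CPU scale-up", cpu_scale_up),
    Policy("P3: parallelism tuning", parallelism_tuning),
)


def control_step(apps: Sequence[App],
                 policies: Sequence[Policy] = POLICIES) -> Dict[str, Tuple[str, str]]:
    """One periodic pass over all apps; at most one action per app."""
    actions = {}
    for app in apps:
        if app.inflight_action is not None:
            continue  # only act if no action is in flight for this app
        for policy in policies:
            action = policy.propose(app.metrics)
            if action is not None:
                app.inflight_action = action
                actions[app.name] = (policy.name, action)
                break  # first matching policy wins, so the choice is deterministic
    return actions
```

Because the first matching policy wins, each decision can be traced to one named rule, which is one way to read the "deterministic, interpretable" property above.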
Sage Design. Straggling app: increase memory? CPU? Parallelism? [Diagram: app with operators Op1 to Op3, UDFs, a periodic UDF, a client cache, and state, calling a web service, blob storage, a KV store, and external frameworks.] Bounded buffers. Tuning memory before CPU. Tuning parallelism: triggered by backlog increases (after P1, P2); correlation with remote service metrics? TLCC (time-lagged cross-correlation). P1: memory scale-up; P2: CPU scale-up; P3: parallelism tuning; … 25
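Time-lagged cross-correlation can be sketched as below. The slide does not say how Sage computes or applies TLCC, so this is only a generic illustration: the function names, the sample series, and the choice of Pearson correlation are assumptions. It correlates a remote-service metric at time t with app backlog at time t + lag, for each candidate lag.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def tlcc(leader, follower, max_lag):
    """Map each lag in 0..max_lag to corr(leader[t], follower[t + lag])."""
    if len(leader) != len(follower):
        raise ValueError("series must have equal length")
    if max_lag < 0 or max_lag > len(leader) - 2:
        raise ValueError("max_lag must leave at least 2 overlapping samples")
    scores = {}
    for lag in range(max_lag + 1):
        xs = leader[:len(leader) - lag]  # leader values at time t
        ys = follower[lag:]              # follower values at time t + lag
        scores[lag] = pearson(xs, ys)
    return scores


def best_lag(leader, follower, max_lag):
    """Lag with the highest correlation (smallest lag wins ties)."""
    scores = tlcc(leader, follower, max_lag)
    return max(scores, key=lambda lag: scores[lag])


if __name__ == "__main__":
    # Made-up series: backlog repeats the remote latency pattern two samples later.
    remote_latency = [1, 1, 5, 1, 1, 1, 8, 1, 1, 1]
    backlog = [0, 0, 1, 1, 5, 1, 1, 1, 8, 1]
    print(best_lag(remote_latency, backlog, max_lag=4))
```

A strong correlation at some positive lag would suggest that backlog growth follows a remote-service slowdown, a case in which adding parallelism may not help.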
Implementation (work in progress). Implemented as a stream processing app. Used for a mix of hundreds of production apps. Avg. approx. 30 mins for new apps. 14% larger size vs. hand-tuned optimal (selected apps). At most one scale-down for each resource. 26
Conclusion. Resource sizing is crucial for any service’s performance, usability, operability, ... Streaming apps go beyond a DAG of operators: they use remote services, customize functionalities (heterogeneous), and have widely varying workloads. Multiple resource-use, performance, cost, operability trade-offs. Sage: a rule-based solution to navigate them in production. 27
Backup slides 28
Long-tail distributions of app characteristics 29
Apps use remote services. [Diagram: operators Op1 to Op3 with a UDF calling a web service, blob storage, and a KV store.] [Figure: CDF of service time (ms) for Apps 1 to 4; x-axis: service time (in ms), 1 to 1000.] [Figure: CDF of arrival rate (messages/sec) for Apps 1 to 4; x-axis: arrival rate (messages/sec), 10^5 to 10^6.] Service time depends on remote services’ latencies, error rates and retries, network latencies, … No specific distribution. Throughput depends on input load variation and remote services’ throughput. No specific distribution. 30
Apps go beyond a DAG of operators Web Service External UDF Frameworks Additional functionalities External frameworks Blob Op3 TensorFlow, DL4j, … Storage Client Periodic Out-of-order processing Cache UDF Input priorities KV Op2 Op1 State Store State User-defined functions (UDFs) . Customized input checkpointing . . … Apps combine operators and functionalities in different ways Heterogenous mix 31
Work in Progress. Used for a mix of hundreds of production apps. Avg. approx. 30 mins for new apps. 14% larger size vs. hand-tuned optimal (selected apps). At most one scale-down for each resource. 35
Summary. Resource sizing is crucial for any service’s performance, usability, operability, ... Streaming apps go beyond a DAG of operators: they use remote services, customize functionalities (heterogeneous), and have widely varying workloads. Multiple resource-use, performance, cost, operability trade-offs. Sage: a rule-based solution to navigate them in production. 36
Long-tail distributions of app characteristics. [Figure: CDF of application p50 service time (ms); x-axis: application p50 service time (ms), 0.01 to 100000; y-axis: fraction of applications.] [Figure: CDF of number of input streams (per application); x-axis: number of input streams, 1 to 1000; y-axis: fraction of applications.] [Figure: CDF of application state size (MB); x-axis: application state (in MB), 0.1 to 1x10^7; y-axis: fraction of applications.] 37