HotCloud’17
Lube: Mitigating Bottlenecks in Wide Area Data Analytics
Hao Wang, Baochun Li — iQua, University of Toronto
Wide Area Data Analytics
[Diagram: a single datacenter (DC) with a Master node co-located with the Namenode, and Worker nodes co-located with Datanodes]
Wide Area Data Analytics
Why wide area data analytics?
• Data volume
• User distribution
• Regulation policy
[Diagram: DC #1 … DC #n, each with a Master/Namenode and Workers/Datanodes]
Problems
• Widely shared resources
  ‣ Fluctuating available provision
• Distributed runtime environment
  ‣ Heterogeneous utilizations
Fluctuating WAN Bandwidths
[Plot: pair-wise bandwidth (Mbps) over two days (Jan 1–Jan 2) to five nodes — 10.6.3.3 (VC), 10.8.3.3 (CT), 10.12.3.32 (TR), 10.4.3.5 (WT), 10.2.3.4 (TR)]
Measured by iperf on the SAVI testbed: https://www.savinetwork.ca/
Heterogeneous Memory Utilization
Nodes in different DCs may have different resource utilizations.
[Plot: memory utilization over time (s) for node_1–node_4]
Running the Berkeley Big Data Benchmark on AWS EC2, 4 nodes across 4 regions; collected by jvmtop.
Runtime Bottlenecks
Bottlenecks emerge at runtime, driven by fluctuation and heterogeneity:
• At any time
• On any node
• In any resource
Impact on data analytics performance:
• Long completion times
• Low resource utilization
• Invalidated optimizations
Optimization of Data Analytics
Existing optimization methods do not consider runtime bottlenecks:
• Clarinet [OSDI’16] considers the heterogeneity of available WAN bandwidth
• Iridium [SIGCOMM’15] trades off between completion time and WAN bandwidth usage
• Geode [NSDI’15] saves WAN usage via data placement and query plan selection
• SWAG [SoCC’15] reorders jobs across datacenters
“Much of this performance work has been motivated by three widely-accepted mantras about the performance of data analytics — network, disk and straggler.”
— Making Sense of Performance in Data Analytics Frameworks, NSDI’15, Kay Ousterhout
Mitigating Bottlenecks at Runtime
Mitigating bottlenecks raises three questions:
• How to detect bottlenecks?
• How to overcome the scheduling delay?
• How to enforce the bottleneck mitigation?
[Diagram: a resource queue and a task queue on a bottlenecked worker]
Architecture of Lube
Three major components:
• Lightweight performance monitors (network I/O, JVM, disk I/O, more metrics)
• Bottleneck detecting module: an online bottleneck detector with a training model on the Lube client, reporting to the bottleneck info cache on the Lube master
• Bottleneck-aware scheduler: the Lube scheduler draws on the available worker pool of (worker, intensity) pairs and the submitted task queue
[Diagram: Lube client (performance monitors + online bottleneck detector) and Lube master (bottleneck info cache, available worker pool, pool update, Lube scheduler, bottleneck-aware scheduling)]
Detecting Bottlenecks — ARIMA
Autoregressive (AR) + moving average (MA) over historical states, predicting the current state:
y_t = θ_0 + φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t − θ_1 ε_{t-1} − θ_2 ε_{t-2} − … − θ_q ε_{t-q}
where y_t is the current state, φ and θ are coefficients, and ε is the random error.
Input: historical states (time_1, mem_util), (time_2, mem_util), …, (time_t-1, mem_util)
Model: ARIMA(p, d, q)
Output: the current state (time_t, mem_util); a minimal forecasting sketch follows.
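A minimal sketch (not the authors' code) of how an ARIMA model can forecast the next memory-utilization sample from a monitored history, assuming a Python environment with statsmodels installed; the history values and the (p, d, q) order are illustrative, not the paper's settings.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical history of memory-utilization samples collected by a monitor.
mem_util_history = [0.41, 0.43, 0.47, 0.52, 0.50, 0.55, 0.61, 0.58, 0.63, 0.66]

# Fit ARIMA(p, d, q); the order (2, 1, 1) here is an illustrative choice.
model = ARIMA(mem_util_history, order=(2, 1, 1))
fitted = model.fit()

# One-step-ahead forecast: the predicted utilization at the next time step.
next_util = float(np.asarray(fitted.forecast(steps=1))[0])
print(f"predicted next memory utilization: {next_util:.2f}")
```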
Detecting Bottlenecks — HMM
Hidden Markov Model
• Hidden states: Q = {q_1, q_2, …, q_i, q_j, …}
• Observation states: O = {O_1, O_2, …, O_k, …, O_d}
• Transition probability: A(a_ij), from hidden state q_i to q_j
• Emission probability: B(b_j(k)), emitting observation O_k in hidden state q_j
Observations here are per-timestamp metric vectors: {time_stamp: mem, net, cpu, disk}
To make HMM online — Sliding Hidden Markov Model (SlidHMM):
• A sliding window for new observations
• A moving-average approximation for outdated observations
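A minimal sketch of sliding-window HMM inference over multi-metric observations, assuming the hmmlearn library. This is not the authors' SlidHMM: re-fitting on each window stands in for the moving-average approximation of outdated observations, and the window length and state count are hypothetical.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

WINDOW = 60  # hypothetical sliding-window length (number of recent samples)

def infer_bottleneck_state(observations):
    """observations: list of [mem, net, cpu, disk] utilization samples over time."""
    window = np.asarray(observations[-WINDOW:])   # keep only the recent window
    # The window must hold at least n_components samples for fitting to succeed.
    model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    model.fit(window)                              # re-fit on the sliding window
    states = model.predict(window)                 # decode the hidden-state sequence
    return states[-1]                              # most recent hidden state
```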
Bottleneck-Aware Scheduling
Built-in task schedulers consider:
• Data-locality
Bottleneck-aware scheduler considers:
• Data-locality
• Bottlenecks at runtime
Observation: a single worker node may stay bottlenecked continuously, while all nodes are rarely bottlenecked at the same time — so a greedy placement choice pays off (see the sketch below).
[Plots over time (s): memory utilization of executor processes, network utilization of datanode processes, CPU utilization of executor processes, disk (SSD) utilization of datanode processes]
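A minimal sketch (hypothetical names, not Spark's TaskScheduler) of one greedy bottleneck-aware placement decision: prefer data-local workers, then break ties by the lowest predicted bottleneck intensity reported by the detector.

```python
def pick_worker(task, available_workers, bottleneck_intensity):
    """
    task.preferred_locations : set of worker ids holding the task's input data
    available_workers        : iterable of worker ids with free slots
    bottleneck_intensity     : dict worker_id -> predicted intensity in [0, 1]
    """
    # Keep only data-local workers; fall back to any available worker if none.
    local = [w for w in available_workers if w in task.preferred_locations]
    candidates = local or list(available_workers)
    # Greedy choice: the candidate with the lowest predicted bottleneck intensity.
    return min(candidates, key=lambda w: bottleneck_intensity.get(w, 0.0))
```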
Implementation & Deployment Implementation • Spark-1.6.1 (scheduler) APIs: • redis database (cache) Master Node Lube Scheduler • Python scikit-learn, Keras (ML) HGET worker_id time Master Redis Server HSET worker_id {time: {metric: val_ob, val_inf}} Deployment Bottleneck Detection Module Worker Nodes • 37 EC2 m4.2xlarge instances SUBSCRIBE metric_1 metric_2 … Worker Redis Server • 9 regions PUBLISH + HSET • Berkeley Big Data Benchmark metric {time: val} … nethogs jvmtop iotop (e.g, iotop {time: I/O}) • An 1.1 TB dataset 12
Evaluation — Accuracy
hit rate = #((time, detection) ∩ (time, observation)) / #(time, detection)
(the calculation is sketched below)
[Plots: hit rates (%) of ARIMA and SlidHMM for Query-1 through Query-4]
ARIMA ignores nonlinear patterns.
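A minimal sketch of the hit-rate calculation above: the fraction of detected (time, bottleneck) pairs that also appear in the observed ground truth.

```python
def hit_rate(detections, observations):
    """detections, observations: sets of (time, bottleneck) tuples."""
    if not detections:
        return 0.0
    return len(detections & observations) / len(detections)
```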
Evaluation — Completion Times
[CDF plots of task completion times for Query-1 through Query-4: Pure Spark vs. Lube-SlidHMM vs. Lube-ARIMA]
Task completion times:
• Lube-ARIMA: average 12.454 s, 75th percentile 22.075 s
• Lube-SlidHMM: average 14.783 s, 75th percentile 27.469 s
Evaluation — Completion Times
Query completion times
• Lube-ARIMA
• Lube-SlidHMM
• Reduce median query response time by up to 33%
Control groups for overhead
• ARIMA + Spark
• SlidHMM + Spark
• Negligible overhead
[Box plots of query completion times (s) for Query-1 through Query-4: Pure Spark, ARIMA + Spark, SlidHMM + Spark, Lube-ARIMA, Lube-SlidHMM]
Conclusion
• Runtime performance bottleneck detection
  ‣ ARIMA, HMM
• A simple greedy bottleneck-aware task scheduler
  ‣ Jointly consider data-locality and bottlenecks
• Lube, a closed-loop framework mitigating bottlenecks at runtime
The End Thank You
Discussion
Bottleneck detection models
• More performance metrics could be explored
• More efficient models for time-series prediction, e.g., reinforcement learning, LSTM
Bottleneck-aware scheduling
• Fine-grained scheduling with specific resource awareness
WAN conditions
• We measure pair-wise WAN bandwidths with a cron job running iperf locally
• Try to exploit support from SDN interfaces