STAR: Self-Tuning Aggregation for Scalable Monitoring [On job market next year] Navendu Jain, Dmitry Kit, Prince Mahajan, Praveen Yalagandula†, Mike Dahlin, and Yin Zhang. University of Texas at Austin; †HP Labs
Motivating Application: Network Traffic Monitoring - Detect Heavy Hitters. Identify flows that account for a significant fraction (say 0.1%) of the network traffic. [Figure: a traffic stream feeds per-flow frequency counts; flows above the 0.1% threshold are flagged]
Global Heavy Hitters: Distributed Heavy Hitter detection. • Monitor flows that account for a significant fraction of traffic across a collection of routers. [Figure: per-flow frequencies at Node 1 ... Node N are combined by an aggregate SUM and compared against the 0.1% threshold]
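A minimal sketch of the centralized version of this computation, assuming each router reports its per-flow byte counts as a dictionary (function and variable names are illustrative, not from STAR):

```python
from collections import Counter

def detect_heavy_hitters(per_router_counts, threshold=0.001):
    """Aggregate per-flow counts from all routers and flag heavy hitters.

    per_router_counts: list of {flow_id: byte_count} dicts, one per router.
    threshold: fraction of total traffic a flow must exceed (0.001 = 0.1%).
    """
    total_per_flow = Counter()
    for counts in per_router_counts:
        total_per_flow.update(counts)  # the aggregate SUM across routers

    grand_total = sum(total_per_flow.values())
    return {flow: count for flow, count in total_per_flow.items()
            if count >= threshold * grand_total}

# Two routers; flow "A" dominates, flows "B" and "C" fall below 0.1% of the total.
routers = [{"A": 900, "B": 1}, {"A": 800, "C": 1}]
print(detect_heavy_hitters(routers))  # -> {'A': 1700}
```

STAR's goal is to compute the same aggregate answer without shipping every count to a central point, which is where the filtering and budget machinery in the rest of the talk comes in.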
Broader Goal: Scalable Distributed Monitoring. • Monitor, query, and react to changes in global state. - Examples: network monitoring, grid monitoring, job scheduling, efficient multicast, distributed quota management, sensor monitoring and control, financial apps, ...
System Model [Adaptive filters: Olston SIGMOD '03; Astrolabe: VanRenesse TOCS '03; TAG: Madden OSDI '02; TACT: Yu TOCS '02]. Arithmetic query approximation: • Exact query answers are not needed! • Trade accuracy for communication/processing cost. Key challenges: large scale (many nodes and attributes, e.g., flows), robustness to dynamic workloads, and the cost of adjustment. [Figure: a coordinator monitors query Q(S_1, ..., S_m), pushes filter adjustments down to the data streams S_1 ... S_m, and each filter pushes up only updates that fall outside its range]
Our Contribution: STAR. A scalable self-tuning algorithm to adaptively set the accuracy of aggregate query results. • Flexible precision vs. communication-cost tradeoffs. Approach: • Aggregation Hierarchy - split filters flexibly across leaves, internal nodes, and the root. • Workload-Aware Approach - use variance and update rate to compute optimal filters. • Cost-Benefit Analysis - throttle redistribution.
Talk Outline Motivation STAR Design Aggregation Hierarchy Self-Tuning Filter Budgets Estimate Optimal Budgets Cost-Benefit Throttling Evaluation and Conclusions
Background: Aggregation [PIER: Huebsch VLDB '03; SDIMS: Yalagandula SIGCOMM '04; Astrolabe: VanRenesse TOCS '03; TAG: Madden OSDI '02]. Fundamental abstraction for scalability: • Sum, count, avg, min, max, select, ... • Summary view of global state. • Detailed view of nearby state and rare events. [Figure: SUM aggregation tree over physical nodes (leaf sensors); L0 leaf values 3, 4, 2, 9, 6, 1, 9, 3 aggregate to 7, 11, 7, 12 at L1, then 18, 19 at L2, and 37 at the L3 root]
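A small sketch of the SUM aggregation tree from the figure, assuming a binary tree built bottom-up over the leaf sensors (this is generic tree aggregation, not SDIMS-specific code):

```python
def aggregate_sum_tree(values):
    """Build a binary SUM aggregation tree bottom-up, returning one list per level.

    values: leaf readings at level L0 (one per physical node).
    """
    levels = [list(values)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # pair up adjacent children; a lone last child is carried up unchanged
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels

# Leaf values from the slide's figure
print(aggregate_sum_tree([3, 4, 2, 9, 6, 1, 9, 3]))
# -> [[3, 4, 2, 9, 6, 1, 9, 3], [7, 11, 7, 12], [18, 19], [37]]
```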
Setting Filter Budgets: Guarantees. • Given an error budget δ_root, report a range [L, H] such that (1) the true aggregate value lies in [L, H] and (2) H - L ≤ δ_root. [Figure: the root budget δ_root is split into a local share δ_root(self) plus child budgets δ_c1, δ_c2; each internal node recursively splits its δ_ci into δ_ci(self) plus budgets for its own children, down to the L0 leaves]
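A compact restatement of the guarantee and of the recursive budget split suggested by the figure's labels; the decomposition equation is my reading of the δ_self / δ_ci annotations, not a formula quoted from the slide:

```latex
% Guarantee: for true aggregate value V, error budget \delta_{root}, and
% reported range [L, H]:
%   (1)  L \le V \le H
%   (2)  H - L \le \delta_{root}
% Recursive split at a node with children c_1, ..., c_k (assumed):
\[
  L \le V \le H, \qquad
  H - L \le \delta_{\mathrm{root}}, \qquad
  \delta_{\mathrm{node}} = \delta_{\mathrm{node}}^{\mathrm{self}}
    + \sum_{i=1}^{k} \delta_{c_i}.
\]
```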
Aggregation Hierarchy (example). [Figure: node A holds range [4,6], node B holds [3,4]; root R sums them to [4+3, 6+4] = [7,10] and, with budget δ_root = 5, reports the wider range [6,11]]
Aggregation Hierarchy (filtering). [Figure: node A receives update 6, which lies inside its range [4,6], so the update is filtered; node B receives update 5, outside its range [3,4], so B sends a new range [4,5]. The root's child sum becomes [4+4, 6+5] = [8,11], still inside its reported range [6,11] (δ_root = 5), so the root's report is filtered as well]
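A minimal sketch of the parent-side filtering check illustrated above, assuming each parent caches one [low, high] range per child (names are illustrative):

```python
def parent_filter(child_ranges, reported_range):
    """Decide whether a parent must send a new report after a child update.

    child_ranges: dict of child -> (low, high) currently cached at the parent.
    reported_range: (low, high) the parent last reported upward.
    Returns (needs_report, summed_range).
    """
    lo = sum(low for low, _ in child_ranges.values())
    hi = sum(high for _, high in child_ranges.values())
    rep_lo, rep_hi = reported_range
    # report only if the summed child range escapes the reported range
    needs_report = lo < rep_lo or hi > rep_hi
    return needs_report, (lo, hi)

# Slide's example: B tightens its range to [4,5]; A stays at [4,6]
ranges = {"A": (4, 6), "B": (4, 5)}
print(parent_filter(ranges, reported_range=(6, 11)))  # -> (False, (8, 11))
```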
Talk Outline Motivation STAR Design Aggregation Hierarchy Self-Tuning Error Budgets Estimate Optimal Budgets Cost-Benefit Throttling Evaluation and Conclusions
How to Set Budgets? Goal: self-tuning toward the ideal distribution. • Send budget to where filtering is needed and effective: - Larger input variance --> more budget required to filter. - Higher input update rate --> higher load to monitor.
Self-tuning Budgets: Single Node. Quantify the filtering gain: • Chebyshev's inequality bounds how often an update escapes a filter of width δ. • Expected message cost M(δ). [Figure: message load vs. error budget, with separate regimes for δ ≤ σ and δ > σ]
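A hedged reconstruction of the cost model implied by Chebyshev's inequality and the two regimes in the figure; the exact constant and the per-update vs. per-second normalization are assumptions, not quoted from the slide:

```latex
% Chebyshev: for an input with standard deviation \sigma, an update escapes a
% filter range of width \delta (centered on the mean) with probability on the
% order of \sigma^2/\delta^2. With update rate u, the expected message load is
% therefore roughly
\[
  M(\delta) \;\approx\; u \cdot \min\!\left(1,\; \frac{\sigma^2}{\delta^2}\right),
\]
% i.e. filtering buys little while \delta \le \sigma and the load falls off
% quadratically once \delta > \sigma, matching the two regimes on the slide.
```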
Self-tuning Budgets: Hierarchy. Single-level tree: • Estimate the optimal filter budgets. - Optimization problem: minimize the total expected message cost subject to a fixed total budget δ_T. - Solution: split δ_T across the children based on their expected message costs M(δ_i) and update rates u_i (a hedged closed form is sketched below). [Figure: a root with total budget δ_T over n children with filter budgets δ_c1 ... δ_cn, expected message costs M(δ_1) ... M(δ_n), and update rates u_1 ... u_n]
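Under the cost model sketched above (taking M(δ_i) ≈ u_i σ_i² / δ_i² in the δ_i > σ_i regime), the constrained minimization has a simple closed form; the derivation below is mine, not quoted from the slide:

```latex
% Minimize  \sum_i u_i \sigma_i^2 / \delta_i^2   subject to   \sum_i \delta_i = \delta_T.
% Lagrange multipliers give  u_i \sigma_i^2 / \delta_i^3 = \text{const}  for all i, hence
\[
  \delta_{c_i} \;=\; \delta_T \cdot
    \frac{(u_i\,\sigma_i^2)^{1/3}}{\sum_{j=1}^{n} (u_j\,\sigma_j^2)^{1/3}},
\]
% i.e. children with larger variance or higher update rate receive a larger
% share of the error budget, but only with cube-root sensitivity.
```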
Talk Outline Motivation STAR Design Aggregation Hierarchy Self-Tuning Filter Budgets Estimate Optimal Budgets Cost-Benefit Throttling Evaluation and Conclusions
Redistribution Cost. [Figure (system model revisited): the coordinator monitoring query Q(S_1, ..., S_m) must send Adjust messages to the filters on data streams S_1 ... S_m whenever it redistributes budgets]
When to Redistribute Budgets? [Figure: message load vs. frequency of budget redistribution; monitoring load falls and redistribution load rises as redistribution becomes more frequent, and total load is their sum] More frequent redistribution: • more closely approximates the ideal distribution for the current load, but • incurs heavier redistribution overhead.
Cost-Benefit Throttling. Rebalance only when the imbalance is both (1) large, M(δ_current) - M(δ_ideal), and (2) long-lasting, T_current - T_last_redist. Charge = (M(δ_current) - M(δ_ideal)) * (T_current - T_last_redist); rebalance if Charge > Threshold.
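A minimal sketch of the charge-based rebalance test, assuming message loads are measured in messages per second and the threshold is a tunable constant (class and parameter names are illustrative, not from the paper):

```python
import time

class RebalanceThrottle:
    """Cost-benefit throttling: rebalance only when the accumulated benefit of
    moving to the ideal budget allocation outweighs a fixed threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.t_last_redist = time.time()

    def should_rebalance(self, msg_load_current, msg_load_ideal):
        # Charge = (excess message load) * (time since last redistribution)
        elapsed = time.time() - self.t_last_redist
        charge = (msg_load_current - msg_load_ideal) * elapsed
        if charge > self.threshold:
            self.t_last_redist = time.time()  # redistribution is about to happen
            return True
        return False

# Usage: current allocation costs 12 msgs/s, the ideal would cost 9 msgs/s
throttle = RebalanceThrottle(threshold=30.0)
if throttle.should_rebalance(msg_load_current=12.0, msg_load_ideal=9.0):
    pass  # trigger budget redistribution here
```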
Talk Outline Motivation STAR Design Aggregation Hierarchy Self-Tuning Filter Budgets Estimate Optimal Budgets Cost-Benefit Throttling Evaluation and Conclusions
Experimental Evaluation. STAR prototype: • Built on top of SDIMS aggregation [Yalagandula '04]. • FreePastry as the underlying DHT [Rice Univ./MPI]. • Testbeds: CS department cluster, Emulab, and PlanetLab. Questions: • Does arithmetic approximation reduce load? • Does self-tuning yield benefits and approximate the ideal allocation?
Methodology. Simulations: • Quantify the load reduction due to self-tuning budgets under varying workload distributions. Application: Distributed Heavy Hitter detection (DHH). • Find the top-100 destination IPs receiving the most traffic. • Abilene traces for 1 hour (3 routers); 120 nodes. - NetFlow data logged every 5 minutes.
Does Throttling Redistribution Benefit? 90/10 synthetic workload. • Self-tuning: much better than uniform allocation. • Throttling: adaptive filters [Olston '03] waste messages on useless adjustments, while STAR achieves roughly a 10x load reduction. [Figure: message cost per second vs. error-budget-to-noise ratio for STAR and adaptive filters]
Does Self-Tuning Approximate the Ideal? Uniform noise workload. • Self-tuning approximates the uniform allocation (the ideal for this workload). • It avoids useless readjustments. [Figure: message cost per second vs. error-budget-to-noise ratio for uniform allocation, adaptive filters (readjustment frequency = 5, 10, 50), and STAR]
Abilene Workload. 80K flows send about 25 million updates in 1 hour. • A centralized server would need to process 7K updates/sec. • Heavy-tailed distribution: 60% of flows send < 1 KB and 99% send < 330 KB; 40% of flows send a single IP packet and 99% send < 2K packets. [Figure: CDFs (% of flows) of flow value in KB and of the number of updates per flow]
DHH: Does Self-Tuning Reduce Load? Self-tuning significantly reduces load: the plot marks 3x and 10x load reductions from a starting point of about 7 msgs/node/sec. [Figure: message cost per second vs. error budget (% of max flow value) for root budget shares of 0%, 50%, 90%, and 100%]
STAR Summary. Scalable self-tuning setting of filter budgets: • Hierarchical Aggregation - flexibly divide budgets across leaves, internal nodes, and the root. • Workload-Aware Approach - use variance and update rate to estimate optimal budgets. • Cost-Benefit Throttling - send budgets where needed.
Thank you! http://www.cs.utexas.edu/~nav/star nav@cs.utexas.edu