Sheriff: A Regional Pre-Alert Management Scheme in Data Center Networks Xiaofeng Gao, Wen Xu, Fan Wu, Guihai Chen and Ding-Zhu Du Shanghai Jiao Tong University
Background Data Center Networks
Background • Goal of a data center network management system is to be: – Stable – Effective – Robust • Several problems of existing management schemes: – Weakness of a centralized controller – Short-sighted mechanism
Introduction Centralization vs Distribution • Drawbacks of Centralization – Sharply increased response time – Upgraded or heterogeneous components • What we need – Distributed managers – Regional self-automatic control
Introduction Contingency vs Pre-Control • Drawbacks of Contingency – Acts only after errors are detected – Harmful to device protection and system maintenance • What we need – Take early warnings – React in advance to avoid congestion
Introduction • Sheriff – Distributed (at end host side) – Pre-alert – Regional self-automatic • Two phases – PREDICTION • ALERT message – MANAGEMENT • VM migration
System Design Two kinds of graphs in a DCN 1. Wired Network Graph 2. Dependency Graph
Problem Formulation Problems & Solutions • Overloaded servers – Migrate VMs – Reshuffle VMs • Congestion – Check QCN – Modify the rate at end host
Problem Formulation Pre-alert & Actions • Servers monitored by shim – Information collection – Prediction by ARIMA model and NN model – Report ALERT value once it exceeds THRESHOLD • Switches monitored by shim – Flow congestion detection – Signal congested flows
Problem Formulation Pre-alert & Actions • Alert from servers or from ToR switches – VM migration • Alert from outer switches – Flow reroute • Implement flow reroute first • VM migration – More expensive – Slower
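The dispatch rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `AlertSource` enum, the action strings, and both function names are hypothetical. It only encodes what the slide states: alerts from servers or ToR switches lead to VM migration, alerts from outer switches lead to flow reroute, and flow reroute runs first because migration is more expensive and slower.

```python
from enum import Enum
from typing import Iterable, List


class AlertSource(Enum):
    """Where an ALERT came from (hypothetical names for illustration)."""
    SERVER = "server"
    TOR_SWITCH = "tor_switch"
    OUTER_SWITCH = "outer_switch"


def choose_action(source: AlertSource) -> str:
    """Map an alert source to the action named on the slide."""
    if source is AlertSource.OUTER_SWITCH:
        return "flow_reroute"
    # Alerts from servers and ToR switches both lead to VM migration.
    return "vm_migration"


def order_actions(sources: Iterable[AlertSource]) -> List[str]:
    """Return the distinct actions needed, with flow reroute first."""
    actions = {choose_action(s) for s in sources}
    # Flow reroute is cheaper and faster, so it is scheduled before migration.
    return sorted(actions, key=lambda a: 0 if a == "flow_reroute" else 1)
```

For example, a round that sees one server alert and one outer-switch alert would schedule the reroute before the migration.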
Problem Formulation Cost of VM Migration • Six steps – Initialization – Reservation – Iterative copy – Stop&Copy – Commitment – Activation
Problem Formulation Cost of VM Migration • Cost of initialization • Cost of transmission – – Transmission time: – Utilization rate of the bandwidth: • Cost of dependency – – Unit cost per distance: – Physical distance of e:
Problem Formulation Cost of VM Migration • Total cost: • Gz: • Goal: minimize
Problem Formulation Pre-alert Mechanism • Collecting Necessary Information – Workload profile – Normalized to [0, 1]
Problem Formulation Pre-alert Mechanism • Time series prediction – Autoregressive Integrated Moving Average (ARIMA) • Modeling linear, dynamic signals – Nonlinear Autoregressive Neural Network (NARNET) • Modeling nonlinear, dynamic and chaotic signals • Dynamic Model Selection – For each method f – Choose the method f which has the minimum value
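Dynamic model selection can be illustrated with a small sketch. Two assumptions are made here and are not taken from the slides: the selection criterion is taken to be the one-step-ahead mean squared error on recent history, and two trivial predictors (`persistence` and `moving_average`) stand in for the ARIMA and NARNET models, which would need a fitting library. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A predictor maps the history seen so far to a forecast of the next value.
Predictor = Callable[[List[float]], float]


def persistence(history: List[float]) -> float:
    """Stand-in model 1: predict that the next value equals the last one."""
    return history[-1]


def moving_average(history: List[float], window: int = 3) -> float:
    """Stand-in model 2: predict the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def mean_squared_error(predictor: Predictor, series: List[float],
                       warmup: int = 3) -> float:
    """One-step-ahead MSE of `predictor` over series[warmup:]."""
    errors = [(predictor(series[:t]) - series[t]) ** 2
              for t in range(warmup, len(series))]
    return sum(errors) / len(errors)


def select_model(models: Dict[str, Predictor], series: List[float],
                 warmup: int = 3) -> Tuple[str, Dict[str, float]]:
    """Score every candidate method and return the one with minimum error."""
    scores = {name: mean_squared_error(f, series, warmup)
              for name, f in models.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

On a steadily rising workload such as `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]` the persistence stand-in scores lower error, while on an oscillating workload the moving average does, which is the behaviour the per-method selection is meant to exploit.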
Problem Formulation Alert Scheme • Seriousness of the condition • Collect ALERTs – VM Migration
Alert-Migration Algorithm Simplification of VM Migration Algorithm • Cost of migration • Simplification – First step – Second step • All-pairs shortest path problem • K-median problem – Depends only on the source and destination of the migration
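The all-pairs shortest path step can be sketched with the classic Floyd-Warshall algorithm. The slides do not say which algorithm is used, so this is only one standard choice. The graph encoding (undirected weighted edges as `(u, v, w)` tuples over nodes `0..n-1`) and the function name are assumptions for illustration.

```python
import math
from typing import List, Tuple


def all_pairs_shortest_paths(n: int,
                             edges: List[Tuple[int, int, float]]) -> List[List[float]]:
    """Floyd-Warshall on an undirected weighted graph with nodes 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        # Keep the cheapest edge if the same pair appears more than once.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Allow each node k in turn as an intermediate hop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist
```

The resulting matrix gives the distance between any migration source and any candidate destination, which is the only input the K-median step needs.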
Alert-Migration Algorithm Framework • Runs periodically, once every period T • Each round – Collect alerts – Select a group of candidate VMs (as sources) – VM Migration • Not all VMs are migrated – Parameters set the portion to migrate
Alert-Migration Algorithm Select Subroutine • Remove delay-sensitive flows • Pick as many VMs as possible with the lowest value – Dynamic Knapsack Algorithm • If the priority parameter is one – Pick the VM with the highest ALERT value
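The slide names a dynamic (programming) knapsack algorithm but the extracted text does not say which quantity is optimized. The sketch below is therefore a generic 0/1 knapsack, under the assumption that each candidate VM has an integer `cost` (for example, capacity it would need at the destination) and a `gain` from moving it, and that a round has an integer `budget`. All three names, and the function itself, are hypothetical.

```python
from typing import List, Tuple

# (vm_name, cost, gain): hypothetical fields, see the assumptions above.
Candidate = Tuple[str, int, float]


def knapsack_select(items: List[Candidate], budget: int) -> Tuple[List[str], float]:
    """0/1 knapsack by dynamic programming: maximize gain within the budget."""
    n = len(items)
    # best[i][b] = best total gain using the first i items with budget b.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, gain) in enumerate(items, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # skip item i
            if cost <= b and best[i - 1][b - cost] + gain > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + gain  # take item i
    # Walk back through the table to recover which items were taken.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        name, cost, _ = items[i - 1]
        if best[i][b] != best[i - 1][b]:
            chosen.append(name)
            b -= cost
    return sorted(chosen), best[n][budget]
```

Delay-sensitive VMs would be filtered out of `items` before the call, matching the first bullet of the slide.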
Alert-Migration Algorithm Migration • Find optimal pairs: K-median problem – Local Search Algorithm – It has an approximation ratio of 3 + 2/p with time complexity O(n^p) • ACKs from the destination’s delegation node – Enough capacity • REJECTs from the destination’s delegation node – Recalculate possible migration destinations
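A single-swap local search (the p = 1 case, for which the stated ratio 3 + 2/p equals 5) can be sketched as follows. It uses the k-median objective named on the simplification slide: pick k destinations that minimize the total distance from the migration sources. The function names, the index-based encoding of nodes, and the choice of the first k candidates as the starting solution are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def kmedian_cost(dist: Sequence[Sequence[float]], clients: Sequence[int],
                 medians: Sequence[int]) -> float:
    """Total distance from each client to its nearest chosen median."""
    return sum(min(dist[c][m] for m in medians) for c in clients)


def local_search_kmedian(dist: Sequence[Sequence[float]], clients: Sequence[int],
                         candidates: Sequence[int], k: int) -> Tuple[List[int], float]:
    """Single-swap local search: swap one median while the cost improves."""
    medians = list(candidates[:k])  # arbitrary starting solution
    best = kmedian_cost(dist, clients, medians)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in candidates:
                if cand in medians:
                    continue
                # Try replacing the i-th median with an unused candidate.
                trial = medians[:i] + [cand] + medians[i + 1:]
                cost = kmedian_cost(dist, clients, trial)
                if cost < best:
                    medians, best = trial, cost
                    improved = True
                    break
            if improved:
                break
    return sorted(medians), best
```

The loop terminates because every accepted swap strictly lowers the cost and there are finitely many median sets. In the scheme described on the slide, a REJECT from a destination's delegation node would remove that node from `candidates` before the search is rerun.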
Evaluation Network Trace Training • Data from ZopleCloud Corp. • Combined model has a smaller mean square error.
Evaluation Simulation For VM Migration • Fat-tree & Bcube • Workload balancing
Evaluation Simulation For VM Migration • Result & Time Complexity
Conclusion • Sheriff: A fast distributed pre-alert management scheme in data center networks – Monitor locally – Predict possible ALERTs – Apply Flow Reroute / VM migration • Evaluation – Accuracy of the prediction – Efficiency of the migration algorithm
• Thanks for your attention!