Load Shedding in Network Monitoring Applications

P. Barlet-Ros (1), G. Iannaccone (2), J. Sanjuàs-Cuxart (1), D. Amores-López (1), J. Solé-Pareta (1)

(1) Technical University of Catalonia (UPC), Barcelona, Spain
    {pbarlet, sanjuas, damores, pareta}@ac.upc.edu
(2) Intel Research, Berkeley, CA
    gianluca.iannaccone@intel.com

USENIX Annual Technical Conference, 2007
Outline
1. Introduction: Motivation; Case Study: Intel CoMo
2. Prediction Method: Work Hypothesis; Multiple Linear Regression
3. Load Shedding: When, Where and How Much
4. Evaluation and Operational Results: Performance Results; Accuracy Results
5. Conclusions and Future Work
Motivation
- Building robust network monitoring applications is hard
  - Unpredictable nature of network traffic: anomalous traffic, extreme data mixes, highly variable data rates
  - Processing requirements have greatly increased in recent years (e.g., intrusion and anomaly detection)
- The problem: efficiently handling extreme overload situations, given that over-provisioning is not possible
Case Study: Intel CoMo
- CoMo (Continuous Monitoring): an open-source passive monitoring system (http://como.sourceforge.net)
  - Fast implementation and deployment of monitoring applications
- Traffic queries are defined as plug-in modules written in C
  - Contain complex computations
  - Stateless filter and measurement interval
- Traffic queries are black boxes
  - Arbitrary computations and data structures
  - Load shedding cannot use knowledge about the queries
Load Shedding Approach
- Working scenario
  - Monitoring system supporting multiple arbitrary queries
  - Single resource: CPU cycles
- Approach: real-time modeling of the queries' CPU usage
  1. Find the correlation between traffic features and CPU usage (features are query agnostic, with deterministic worst-case cost)
  2. Exploit the correlation to predict CPU load
  3. Use the prediction to guide the load shedding procedure
- Novelty: no a priori knowledge of the queries is needed
  - Preserves a high degree of flexibility
  - Increases the possible applications and network scenarios
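The three steps can be summarized in two formulas. The notation below ($b$ for a batch, $x_j(b)$ for the traffic features, $\hat{C}(b)$ for the predicted cycles) is introduced here only for illustration and does not appear on the original slides; the clamp to 1 simply encodes that no shedding is done when the prediction fits within the available budget.

```latex
% Per-batch CPU prediction and load-shedding decision (illustrative notation)
\hat{C}(b) = \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j \, x_j(b),
\qquad
\mathit{srate}(b) = \min\!\left(1,\ \frac{\mathit{avail\_cycles}}{\hat{C}(b)}\right)
```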
System Overview
[Figure: Prediction and Load Shedding Subsystem]
Work Hypothesis
- Our thesis: the cost of maintaining the data structures needed to execute a query can be modeled by looking at a set of traffic features
- Empirical observation
  - Basic operations on the query state incur different overheads while processing incoming traffic (e.g., creating or updating entries, looking for a valid match)
  - The cost of a query is mostly dominated by the overhead of some of these operations
- Our method: model each query's cost by considering the right set of traffic features
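To make the observation concrete, below is a minimal C sketch of a query that keeps per-flow state in a hash table. The structures, function names and table size are illustrative (this is not CoMo code): the point is only that a packet belonging to a new flow triggers an allocation and initialization, while a packet of an existing flow costs a lookup and two increments, so the per-batch cost tracks features such as the number of new 5-tuple flows.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 4096

/* Illustrative 5-tuple key; field names are assumptions, not CoMo's. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct flow {
    struct flow_key key;
    uint64_t packets, bytes;
    struct flow *next;                 /* collision chaining */
};

static struct flow *table[TABLE_SIZE];

static uint32_t hash_key(const struct flow_key *k)
{
    /* Toy hash; a real query could use any function or structure. */
    return (k->src_ip ^ k->dst_ip ^ ((uint32_t)k->src_port << 16) ^
            k->dst_port ^ k->proto) % TABLE_SIZE;
}

static int same_key(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Per-packet work: an existing flow costs one lookup and two increments,
 * while a new flow additionally costs an allocation and initialization.
 * This asymmetry is why the number of new 5-tuple flows per batch is a
 * good predictor of CPU usage for this kind of query. */
void update_flow(const struct flow_key *key, uint32_t pkt_len)
{
    uint32_t h = hash_key(key);
    struct flow *f;

    for (f = table[h]; f != NULL; f = f->next)
        if (same_key(&f->key, key))
            break;

    if (f == NULL) {                   /* new flow: expensive path */
        f = calloc(1, sizeof(*f));
        if (f == NULL)
            return;                    /* out of memory: skip this packet */
        f->key = *key;
        f->next = table[h];
        table[h] = f;
    }

    f->packets += 1;                   /* existing flow: cheap path */
    f->bytes += pkt_len;
}
```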
Traffic Features vs CPU Usage
[Figure: CPU usage compared to the number of packets, bytes and 5-tuple flows over time]
Traffic Features vs CPU Usage
[Figure: CPU usage per batch versus the number of packets, grouped by the number of new 5-tuple flows (< 500, 500-700, 700-1000, >= 1000)]
Multiple Linear Regression (MLR)
- Linear regression model:
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i, \quad i = 1, 2, \ldots, n$
  - $Y_i$: the n observations of the response variable (measured CPU cycles)
  - $X_{ji}$: the n observations of the p predictors (traffic features)
  - $\beta_j$: the p regression coefficients (unknown parameters to estimate)
  - $\varepsilon_i$: the n residuals (OLS minimizes the sum of squared errors, SSE)
- Feature selection
  - Variant of the Fast Correlation-Based Filter (FCBF) [Yu and Liu, ICML 2003]
  - Removes irrelevant and redundant predictors
  - Significantly reduces the cost of the MLR

[Yu and Liu, ICML 2003] L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proc. of ICML, 2003.
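For completeness, the regression coefficients can be fitted with the standard ordinary least squares estimator. The matrix form below is textbook material rather than something stated on the slide, with $X$ the $n \times (p+1)$ design matrix (features plus an intercept column) and $Y$ the vector of measured cycles.

```latex
% OLS fit minimizing the sum of squared errors (SSE)
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y,
\qquad
\hat{Y} = X \hat{\beta},
\qquad
\mathrm{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2
```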
System Overview: Prediction and Load Shedding Subsystem
1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (multi-resolution bitmaps)
3. The most relevant features are selected (using FCBF) to be used by the MLR
4. The MLR predicts the CPU cycles required by the query to run
5. Load shedding is performed to discard a portion of the batch
6. CPU usage is measured (using the TSC) and fed back to the prediction system
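The loop below sketches one way these six steps could fit together in C. All of the called functions and types are placeholders standing in for the corresponding CoMo components (they are declared but not implemented here), and the feature-array sizes are arbitrary; only the rdtsc idiom for reading the time-stamp counter is standard x86/GCC code.

```c
#include <stdint.h>

/* Placeholder types and functions; names are illustrative only. */
typedef struct batch batch_t;          /* 100 ms worth of packets */
typedef struct query query_t;          /* one plug-in query       */

extern int    extract_features(const batch_t *b, double *feat);     /* bitmaps */
extern int    select_features(const double *feat, int n, int *idx); /* FCBF    */
extern double mlr_predict(const query_t *q, const double *feat,
                          const int *idx, int n_sel);               /* cycles  */
extern void   shed_load(batch_t *b, double srate);   /* pkt/flow sampling      */
extern void   run_query(query_t *q, const batch_t *b);
extern void   mlr_update(query_t *q, const double *feat,
                         const int *idx, int n_sel, uint64_t cycles);

static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* One iteration of the prediction and load-shedding loop for a batch. */
void process_batch(batch_t *b, query_t *queries, int nqueries,
                   double avail_cycles)
{
    double feat[64];
    int idx[64];

    int nfeat = extract_features(b, feat);            /* step 2 */
    int nsel  = select_features(feat, nfeat, idx);    /* step 3 */

    double pred = 0.0;
    for (int i = 0; i < nqueries; i++)                /* step 4 */
        pred += mlr_predict(&queries[i], feat, idx, nsel);

    if (pred > avail_cycles)                          /* step 5 */
        shed_load(b, avail_cycles / pred);

    for (int i = 0; i < nqueries; i++) {
        uint64_t t0 = read_tsc();
        run_query(&queries[i], b);
        uint64_t used = read_tsc() - t0;              /* step 6 */
        mlr_update(&queries[i], feat, idx, nsel, used);
    }
}
```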
Load Shedding
- When to shed load
  - When the prediction exceeds the available cycles:
    $\mathit{avail\_cycles} = (0.1 \times \text{CPU frequency}) - \mathit{overhead}$
  - Corrected according to the prediction error and buffer space
  - Overhead is measured using the time-stamp counter (TSC)
- How and where to shed load
  - Packet and flow sampling (hash based)
  - The same sampling rate is applied to all queries
- How much load to shed
  - The maximum sampling rate that keeps CPU usage below avail_cycles:
    $\mathit{srate} = \dfrac{\mathit{avail\_cycles}}{\mathit{pred\_cycles}}$
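A minimal sketch of what the hash-based flow sampling decision could look like in C. The slides only state that sampling is hash based and that a single rate is shared by all queries, so the 5-tuple layout, the FNV-1a-style hash and the keep/drop threshold below are illustrative assumptions rather than the authors' implementation.

```c
#include <stdint.h>

/* Illustrative 5-tuple; field names are assumptions. */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* FNV-1a style mixing, one byte at a time, fields mixed explicitly to
 * avoid hashing struct padding. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

static uint32_t flow_hash(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;
    h = mix(h, t->src_ip);
    h = mix(h, t->dst_ip);
    h = mix(h, ((uint32_t)t->src_port << 16) | t->dst_port);
    h = mix(h, t->proto);
    return h;
}

/* Sampling rate from the prediction, as on the slide:
 * srate = avail_cycles / pred_cycles, with no shedding when the
 * prediction already fits in the budget. */
double sampling_rate(double avail_cycles, double pred_cycles)
{
    if (pred_cycles <= avail_cycles)
        return 1.0;
    return avail_cycles / pred_cycles;
}

/* Keep a packet iff its flow hash falls below srate * 2^32. Every packet
 * of a flow hashes to the same value, so a flow is either kept whole or
 * dropped whole (flow sampling), and all queries share the same decision. */
int keep_packet(const struct five_tuple *t, double srate)
{
    uint32_t threshold = (uint32_t)(srate * 4294967295.0);
    return flow_hash(t) <= threshold;
}
```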