Space-Code Bloom Filter for Efficient Traffic Flow Measurement

Abhishek Kumar, Jun (Jim) Xu
Networking and Telecommunications Group, College of Computing, Georgia Institute of Technology
{akumar,jx}@cc.gatech.edu

Li (Erran) Li
Bell Laboratories
erranlli@bell-labs.com

Jia Wang
AT&T Labs - Research
jiawang@research.att.com

Internet Measurement Conference 2003
Problem Statement
Goal: To keep track of the total number of packets belonging to each flow on high-speed links. Applications such as traffic characterization, anomaly detection, and per-flow QoS need to know the size of all flows.
Definition of Flow: All packets with the same flow-label. The flow-label can be defined as any combination of fields from the IP header, e.g., <Source IP, Source Port, Dest. IP, Dest. Port, Protocol>.
Why is per-flow measurement hard?
• The majority of packets belong to large flows, yet the majority of flows are small.
• Maintaining per-flow data structures is costly, and the cost is difficult to amortize.
• There is no clear definition of the "end" of a flow.
Related Approaches
Sampling
Sample packets with a fixed probability p and trace/process the headers of sampled packets. This is the approach used by Cisco NetFlow.
• Flow sizes can be inferred from sampled data.
• Space-intensive.
• Inaccurate for small flows.
Keep track of elephants
• Fast algorithm to filter packets from large flows. [Estan and Varghese, 2002]
• Maintain counters for large flows only.
• Success in tracking the largest few flows (e.g., those carrying ≥ 1% of the total traffic) with limited memory.
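To make the "inaccurate for small flows" point concrete, here is a minimal sketch of fixed-probability packet sampling. It is not NetFlow's implementation; the function name `estimate_flow_sizes`, the synthetic trace, and the choice p = 0.01 are all assumptions made for illustration.

```python
import random

def estimate_flow_sizes(packets, p, seed=0):
    """Sample each packet with probability p, then scale sampled counts by 1/p."""
    rng = random.Random(seed)
    sampled = {}
    for flow_label in packets:
        if rng.random() < p:
            sampled[flow_label] = sampled.get(flow_label, 0) + 1
    # A flow that was never sampled is simply absent from the result.
    return {flow: count / p for flow, count in sampled.items()}

# Hypothetical trace: one elephant flow and 1000 single-packet mice.
packets = ["elephant"] * 100_000 + [f"mouse-{i}" for i in range(1000)]
estimates = estimate_flow_sizes(packets, p=0.01)

mice_seen = {f: s for f, s in estimates.items() if f.startswith("mouse-")}
print("elephant estimate:", estimates.get("elephant"))
print("mice observed:", len(mice_seen), "of 1000")
```

Under these assumptions the elephant is estimated reasonably well, while each single-packet flow is either missed entirely or reported as roughly 1/p packets.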
Our Solution – Space-Code Bloom Filter (SCBF)
• Tracks all flows, from mice to elephants.
• Provides an approximate estimate of flow size.
• The relative error in estimation is the same for all flow sizes.
• The approximate estimates are close to the actual flow size with high probability.
Operation of Space-Code Bloom Filter (SCBF) – Insertion Phase
• Measurement proceeds in epochs (e.g., 10 seconds).
• Maintain an aggregate synopsis data structure.
• Update the data structure on every packet arrival.
• Write-only data structure → fast updates, low hardware complexity.
• Copies of the synopsis are paged to disk periodically.
[Figure: SCBF architecture. 0. New packet arrival; 1. Process header (Header CPU); 2. Write to SCBF (SCBF module with SRAM Module 1 and SRAM Module 2); 3. Paging to disk once "full" (Persistent Storage); 4. Query; 5. Answer.]
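The write-only update can be sketched as follows. The slides do not spell out the insertion rule, so this sketch assumes the following scheme: keep several groups of hash functions, pick one group at random for each arriving packet, and set the bits that group selects for the packet's flow-label. The class name, the BLAKE2-based hashing, and all parameter values (32 groups, 5 hashes per group, 65536 bits) are illustrative choices, not the authors' implementation.

```python
import hashlib
import random

class SpaceCodeBloomFilterSketch:
    """Illustrative write-only synopsis; parameters are assumptions."""

    def __init__(self, num_bits=1 << 16, num_groups=32, hashes_per_group=5, seed=0):
        self.num_bits = num_bits
        self.num_groups = num_groups
        self.hashes_per_group = hashes_per_group
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.rng = random.Random(seed)

    def _positions(self, flow_label, group):
        # Derive the group's hash functions by salting one hash with (group, index).
        positions = []
        for j in range(self.hashes_per_group):
            key = f"{group}:{j}:{flow_label}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            positions.append(int.from_bytes(digest, "big") % self.num_bits)
        return positions

    def insert(self, flow_label):
        # Per packet: choose one group at random and set its bits. No reads needed.
        group = self.rng.randrange(self.num_groups)
        for pos in self._positions(flow_label, group):
            self.bits[pos] = 1

scbf = SpaceCodeBloomFilterSketch()
scbf.insert("10.0.0.1:1234->10.0.0.2:80/tcp")
bits_set_after_one_packet = sum(scbf.bits)
```

Because an update never reads the array, each packet costs a fixed number of hash computations and memory writes, which is what keeps the hardware simple.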
Operation of Space-Code Bloom Filter (SCBF) – Query Phase
• Queries provide a flow-label and ask for its size.
• Obtain a "count" from the data structure, then look it up in a precomputed table to return the approximate size of the flow.
• The resulting estimates have low relative error with high probability.
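One way to picture the precomputed table is sketched below. It assumes the "count" is the number of hash-function groups whose bits are all set for the queried flow-label, and it fills the table with a simple coupon-collector mean: the expected number of packets needed before that many distinct groups have been chosen. This is a stand-in for illustration, not the maximum likelihood estimator presented later in the deck, and it ignores false positives caused by other flows. The names `build_size_table` and `estimate_size` are hypothetical.

```python
def build_size_table(num_groups):
    """table[c] = expected packets needed to hit c distinct groups out of num_groups."""
    table = [0.0]
    for seen in range(num_groups):
        # With `seen` groups already hit, a new group appears with
        # probability (num_groups - seen) / num_groups per packet.
        table.append(table[-1] + num_groups / (num_groups - seen))
    return table

def estimate_size(count, table):
    """Look up the approximate flow size for an observed group count."""
    return table[count]

size_table = build_size_table(32)   # 32 groups is an assumed parameter
small_flow = estimate_size(3, size_table)
large_flow = estimate_size(30, size_table)
```

The table grows slowly at first and steeply as the count approaches the number of groups, so resolution is finest for small flows; once every group matches, the count alone can no longer distinguish larger sizes.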
Design of the aggregate data structure
• Bloom filters answer set-membership questions with high accuracy.
• A Space-Code Bloom Filter answers multiset-membership questions with high accuracy.
• Use a number of "virtual" Bloom filters, thus spreading the multiplicity information over space.
• Hash functions allow us to "isolate" flows from each other, thus spreading the multiplicity information over code.
• A Space-Code Bloom Filter represents a large number of statistical estimators in parallel.
Performance of SCBF - complexities
• Computational complexity – compute 5 hash functions and write 5 bits per packet.
• Space complexity – 4 bits of storage required for each packet.
• Can operate at OC-768 (40 Gbps) with 5 ns SRAM.
• More than 80% of responses are within ±25% of the actual value.
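The line-rate figure can be sanity-checked with a short back-of-the-envelope calculation. The average packet size of 1000 bits and the assumption that the 5 writes happen one after another are mine, not stated on the slide; the 10-second epoch is the example value from the insertion-phase slide.

```python
LINK_RATE_BPS = 40e9          # OC-768, from the slide
WRITES_PER_PACKET = 5         # from the slide
SRAM_ACCESS_NS = 5            # from the slide
STORAGE_BITS_PER_PACKET = 4   # from the slide
AVG_PACKET_BITS = 1000        # assumption, not from the slide
EPOCH_SECONDS = 10            # example epoch length from the slides

packets_per_second = LINK_RATE_BPS / AVG_PACKET_BITS
time_budget_ns = 1e9 / packets_per_second           # time available per packet
write_time_ns = WRITES_PER_PACKET * SRAM_ACCESS_NS  # sequential writes (assumed)
keeps_up = write_time_ns <= time_budget_ns

# Synopsis storage generated per epoch at full line rate, in megabytes.
storage_mb_per_epoch = (
    STORAGE_BITS_PER_PACKET * packets_per_second * EPOCH_SECONDS / 8 / 1e6
)

print(f"budget {time_budget_ns} ns, writes {write_time_ns} ns, keeps up: {keeps_up}")
print(f"storage per epoch: {storage_mb_per_epoch} MB")
```

With smaller average packets the per-packet budget shrinks proportionally, so the claim is sensitive to the assumed packet size.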
Accuracy of SCBF
[Figure (a): Original vs. estimated flow size. x-axis: Original flow length (packets); y-axis: Estimated flow length (packets). Note that both axes are on log scale.]
[Figure (b): Distribution of the original and estimated flow size. x-axis: Normalized rank of flows; y-axis: Flow length (packets); curves: original, estimated.]
Conclusions
• Space-Code Bloom Filters can track the approximate size of every flow.
• Per-flow accounting without per-flow state.
• The relative error in approximation is the same for all flow sizes.
• Very fast (up to OC-768) implementations are possible due to the "write-only" nature of updates.
• Design parameters of SCBF can be tuned to trade storage space and CPU cycles for accuracy.
Acknowledgments We thank Oliver Spatschek for providing us with the traffic traces.
Questions ???
Accuracy of SCBF using Maximum Likelihood Estimation (MLE)
[Figure (c): Theoretical accuracy of MLE using 32 groups. x-axis: F (0 to 10000); y-axis: 1-δ; curves for ε = 1, ε = 0.5, ε = 0.25, and ε = 0.2.]
Accuracy of SCBF using Maximum Likelihood Estimation
[Figure (d): CDF of relative error for flows of various sizes. x-axis: e (-1 to 1); y-axis: P[relative err < e] (CDF); curves for all flows and for flows of size >=2, >=3, >=5, >=10, and >=100.]