An Empirical Evaluation of Entropy-based Traffic Anomaly Detection - PowerPoint PPT Presentation


  1. An Empirical Evaluation of Entropy-based Traffic Anomaly Detection. George Nychis, Vyas Sekar, David Andersen, Hyong Kim, Hui Zhang (Carnegie Mellon University)

  2. Entropy-based Anomaly Detection. Goal: detect abnormal behavior (scan activity, DDoS, bandwidth floods, ...). Traditional: raw traffic volume (insufficient), e.g., total number of packets in an epoch. Modern: entropy-based traffic metrics, e.g., relative randomness in the distribution of packets across ports. Example anomaly. Entropy: detectable. Traffic volume: undetected.
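
As a rough illustration of the entropy metric named on this slide, the sketch below computes a normalized Shannon entropy over packets per destination port and compares it with the raw packet total. The slides do not give a formula, so the normalization by log2 of the number of observed ports, the name `normalized_entropy`, and the sample counts are all assumptions made for illustration.

```python
import math
from collections import Counter

def normalized_entropy(counts):
    """Shannon entropy of a count histogram, scaled to the range [0, 1].

    Assumption: normalize by log2 of the number of distinct observed values.
    """
    total = sum(counts.values())
    n = len(counts)
    if total == 0 or n < 2:
        return 0.0
    h = -sum((c / total) * math.log2(c / total)
             for c in counts.values() if c > 0)
    return h / math.log2(n)

# Hypothetical epochs: packets per destination port.
normal_epoch = Counter({80: 500, 443: 450, 53: 50})

# Same traffic plus a port scan: one probe packet to each of 100 new ports.
scan_epoch = Counter(normal_epoch)
scan_epoch.update({port: 1 for port in range(1024, 1124)})

for name, epoch in (("normal", normal_epoch), ("scan", scan_epoch)):
    print(name, sum(epoch.values()), round(normalized_entropy(epoch), 3))
```

In this made-up example the packet total grows by only 10% (1000 to 1100), while the normalized entropy of the port distribution shifts by a much larger margin, which is the kind of gap between volume and entropy the slide points at.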

  3. Motivation. NetFlow data → traffic feature timeseries → anomaly detection → Alarm!

  4. Motivation. NetFlow data → traffic feature timeseries sum(packets) → anomaly detection → A(pkts).

  5. Motivation. NetFlow data → traffic feature timeseries H(addresses) → anomaly detection → A(addr). Entropy-based features: H(addresses), the distribution of packets across addresses.

  6. Motivation. NetFlow data → traffic feature timeseries H(ports) → anomaly detection → A(port). Detector outputs so far: A(addr). Entropy-based features: H(addresses); H(ports), the distribution of packets across ports.

  7. Motivation. NetFlow data → traffic feature timeseries H(flow-size) → anomaly detection → A(FSD). Detector outputs so far: A(addr), A(port). Entropy-based features: H(addresses); H(ports); H(flow-size), the distribution of flow sizes (in packets).

  8. Motivation. NetFlow data → traffic feature timeseries H(degree) → anomaly detection → A(deg). Detector outputs so far: A(addr), A(port), A(FSD). Entropy-based features: H(addresses); H(ports); H(flow-size); H(degree), the distribution of host communication.

  9. Motivation. NetFlow data → traffic feature timeseries ???????? → anomaly detection → A(addr), A(port), A(FSD), A(deg). Entropy-based features: H(addresses), H(ports), H(flow-size), H(degree).

  10. Motivation. NetFlow data → traffic feature timeseries ???????? → anomaly detection → A(addr), A(port), A(FSD), A(deg). Entropy-based features: H(addresses), H(ports), H(flow-size), H(degree). Goal: understanding the features.

  11. Motivation. NetFlow data → traffic feature timeseries ???????? → anomaly detection → A(addr), A(port), A(FSD), A(deg). Entropy-based features: H(addresses), H(ports), H(flow-size), H(degree). Goal: understanding the features. 1. How unique are their detection capabilities? 2. How effective are they?
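
To make the four feature families on these slides concrete, here is a minimal sketch that builds the underlying histograms from flow records. The `Flow` record layout and the function name `feature_distributions` are hypothetical, and the choice to count degree as the number of distinct peers per host is an assumption; the slides only say "distribution of host communication".

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical flow record; real NetFlow exports carry more fields.
Flow = namedtuple("Flow", "src dst sport dport packets")

def feature_distributions(flows):
    """Build the histograms that the entropy features are computed over."""
    names = ("src_addr", "dst_addr", "src_port", "dst_port", "flow_size")
    dists = {name: Counter() for name in names}
    peers_out = defaultdict(set)  # hosts each source talks to
    peers_in = defaultdict(set)   # hosts that talk to each destination
    for f in flows:
        # Address and port features: packets per address / per port.
        dists["src_addr"][f.src] += f.packets
        dists["dst_addr"][f.dst] += f.packets
        dists["src_port"][f.sport] += f.packets
        dists["dst_port"][f.dport] += f.packets
        # Flow-size feature: number of flows having each size in packets.
        dists["flow_size"][f.packets] += 1
        peers_out[f.src].add(f.dst)
        peers_in[f.dst].add(f.src)
    # Degree features: number of hosts having each distinct-peer count.
    dists["out_degree"] = Counter(len(p) for p in peers_out.values())
    dists["in_degree"] = Counter(len(p) for p in peers_in.values())
    return dists

flows = [
    Flow("a", "b", 1000, 80, 10),
    Flow("a", "c", 1001, 80, 10),
    Flow("b", "c", 1002, 53, 1),
]
dists = feature_distributions(flows)
print(dists["flow_size"], dists["out_degree"])
```

Each histogram can then be fed to an entropy function once per epoch to obtain the H(addresses), H(ports), H(flow-size), and H(degree) timeseries.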

  12. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006).

  13. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006). Entropy timeseries: H(addresses), H(ports), H(flow-size), H(degree).

  14. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006). Entropy timeseries: H(addresses), H(ports), H(flow-size), H(degree). Timeseries correlation: are the distributions structurally similar?

  15. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006). Entropy timeseries: H(addresses), H(ports), H(flow-size), H(degree). Timeseries correlation: are the distributions structurally similar? Anomaly detection: A(addr), A(port), A(FSD), A(deg).

  16. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006). Entropy timeseries: H(addresses), H(ports), H(flow-size), H(degree). Timeseries correlation: are the distributions structurally similar? Anomaly detection: A(addr), A(port), A(FSD), A(deg). Anomaly correlation. Goal(1): Uniqueness.

  18. Entropy Timeseries (February 2005). [Figure: timeseries panels for in-degree, out-degree, flow-size, src. address, dst. address, src. port, dst. port, and raw traffic volume.]

  23. Analysis Method. NetFlow data: 5 one-month-long traces (CMU-2005, CMU-2008, GATech-2008, GEANT-2005, Internet2-2006). Entropy timeseries: H(addresses), H(ports), H(flow-size), H(degree). Timeseries correlation: are the distributions structurally similar? Anomaly detection: A(addr), A(port), A(FSD), A(deg). Anomaly correlation. Goal(1): Uniqueness.

  24. Correlation in Entropy Timeseries. Pairwise correlation scores for CMU-2005. All 4 other traces exhibit similar behavior!
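
A pairwise correlation table like the one this slide refers to could be produced along the following lines. The slide does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the function names and sample series are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def pairwise_correlations(series):
    """Correlation score for every unordered pair of named timeseries."""
    return {
        (a, b): pearson(series[a], series[b])
        for a, b in combinations(sorted(series), 2)
    }

# Hypothetical per-epoch entropy values, one list per feature.
series = {
    "src_port": [0.80, 0.82, 0.60, 0.81],
    "dst_port": [0.70, 0.73, 0.52, 0.71],
    "flow_size": [0.40, 0.35, 0.41, 0.38],
}
for pair, score in pairwise_correlations(series).items():
    print(pair, round(score, 2))
```

A score near 1 for a pair of features suggests the two timeseries move together and may carry overlapping information.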

  25. Why Entropy is Structurally Correlated. 1. Port / address correlation. Properties of network traffic: contribute X packets to address A, contribute X packets to port B … If hosts have few connections, and ports are uniformly random → similar distributions.

  26. Why Entropy is Structurally Correlated. 1. Port / address correlation: properties of network traffic. 2. Source / destination correlation. Flow accounting, bi-directional: Addr1(23) → Addr2(53) is recorded as Saddr(23), Daddr(53).

  27. Why Entropy is Structurally Correlated. 1. Port / address correlation: properties of network traffic. 2. Source / destination correlation. Flow accounting, uni-directional: Addr1 → Addr2 (23) and Addr2 → Addr1 (53). Bi-directional records: Saddr(23), Daddr(53). Uni-directional records: Saddr(23), Daddr(23) and Saddr(53), Daddr(53). Uni-directionality destroys 2 unique distributions.
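
The accounting example on these slides can be replayed in a few lines. This sketch reads the numbers 23 and 53 as packet counts sent by Addr1 and Addr2 respectively, which is an interpretation of the slide rather than something it states; the function names and the tuple layout are hypothetical.

```python
from collections import Counter

def bidirectional(conversations):
    """One record per conversation: the initiator is the source and
    the responder is the destination."""
    src, dst = Counter(), Counter()
    for a, b, pkts_from_a, pkts_from_b in conversations:
        src[a] += pkts_from_a
        dst[b] += pkts_from_b
    return src, dst

def unidirectional(conversations):
    """Two records per conversation, one for each direction."""
    src, dst = Counter(), Counter()
    for a, b, pkts_from_a, pkts_from_b in conversations:
        src[a] += pkts_from_a  # record a -> b
        dst[b] += pkts_from_a
        src[b] += pkts_from_b  # record b -> a
        dst[a] += pkts_from_b
    return src, dst

conversations = [("Addr1", "Addr2", 23, 53)]
bi_src, bi_dst = bidirectional(conversations)
uni_src, uni_dst = unidirectional(conversations)
print(sorted(bi_src.values()), sorted(bi_dst.values()))
print(sorted(uni_src.values()), sorted(uni_dst.values()))
```

For this single conversation, uni-directional accounting gives the source and destination histograms the same set of counts, so any entropy computed over them is identical. With many conversations the two histograms are no longer exactly equal, but each host still appears on both sides.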

  28. Why Anomalies are Correlated. Root-cause analysis approach: anomaly → analyze top-k flows → remove → recompute entropy → anomaly subsides? (no: loop back; yes: cause!). Our results: ports & addresses only detect alpha flows (correlation). FSD detects scans; degree detects SYN floods. FSD & degree are unique (no correlation).

  29. Why Anomalies are Correlated. Root-cause analysis approach: anomaly → analyze top-k flows → remove → recompute entropy → anomaly subsides? (no: loop back; yes: cause!). Callout: traffic volume. Our results: ports & addresses only detect alpha flows (correlation). FSD detects scans; degree detects SYN floods. FSD & degree are unique (no correlation).
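
The root-cause loop on these slides can be sketched as follows. This is one possible reading of the flowchart, not the authors' implementation: ranking flows by packet count, the growing-k search, the `tolerance` test against a `baseline` entropy, and both function names are assumptions.

```python
import math
from collections import Counter

def entropy(flows):
    """Shannon entropy (bits) of packets across feature values.

    Each flow is a (feature_value, packets) pair, e.g. (dst_port, packets).
    """
    counts = Counter()
    for value, packets in flows:
        counts[value] += packets
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def find_root_cause(flows, baseline, tolerance, max_k=10):
    """Remove the top-k flows for growing k until entropy returns to
    within `tolerance` of `baseline`; return the removed flows or None."""
    ranked = sorted(flows, key=lambda f: f[1], reverse=True)
    for k in range(1, min(max_k, len(ranked)) + 1):
        removed, remaining = ranked[:k], ranked[k:]
        if abs(entropy(remaining) - baseline) <= tolerance:
            return removed  # anomaly subsides: these flows are the cause
    return None  # anomaly did not subside within max_k removals

# Hypothetical epoch: four ordinary flows plus one alpha flow on port 80.
flows = [(80, 100), (443, 100), (53, 100), (22, 100), (80, 10000)]
print(find_root_cause(flows, baseline=2.0, tolerance=0.1))
```

Here the baseline of 2.0 bits corresponds to four equally loaded ports, so removing the single large flow is enough for the anomaly to subside.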

  30. Summary of Goal(1): Uniqueness. Strong correlation in ports and addresses. Flow-size and degree: unique. Structural correlation: properties of traffic. Anomaly correlation: types of anomalies seen.

  31. Understanding Effectiveness. Inject synthetic anomalies into the NetFlow data, then rerun the pipeline: NetFlow data → entropy timeseries → anomaly detection, with timeseries correlation and anomaly correlation.
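
The slides do not describe how the synthetic anomalies were generated, so the following is only a sketch of what injecting a scanner into a flow list might look like. The record layout `(src, dst, dport, packets)`, the addresses, and the function names are hypothetical.

```python
from collections import Counter

def inject_scanner(flows, scanner="192.0.2.1", n_targets=200, dport=22):
    """Return a new flow list with one single-packet probe per target.

    Targets are generated inside a made-up 10.0.x.y range.
    """
    probes = [
        (scanner, f"10.0.{i // 256}.{i % 256}", dport, 1)
        for i in range(n_targets)
    ]
    return list(flows) + probes

def flow_size_distribution(flows):
    """Number of flows having each size (in packets)."""
    return Counter(packets for _src, _dst, _dport, packets in flows)

# Hypothetical background traffic.
background = [
    ("10.1.0.1", "10.1.0.2", 80, 40),
    ("10.1.0.3", "10.1.0.2", 443, 12),
]
injected = inject_scanner(background, n_targets=300)
print(flow_size_distribution(background))
print(flow_size_distribution(injected))
```

The injected probes add few packets overall but pile up in the one-packet bin of the flow-size distribution, which is consistent with the slides' observation that FSD detects scans.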

  32. Best Distribution for an Anomaly? Anomalies: BW flood, scanner, multiple scanners, port scan, and SYN flood. Callout: FSD best. Other results: BW flood: ports & addresses, already detectable by traffic volume detector. Scans: difficult to detect … FSD and degree.

  33. Implications and Conclusions. Look beyond ports and addresses. Select complementary traffic distributions. Uni-directional accounting introduces biases in traffic distributions. Future work: can correlations be leveraged? During anomalies found in flow-size & degree, correlation drops between ports & addresses.

  34. Questions?
