A Signal Analysis of Network Traffic Anomalies
Paul Barford with Jeffery Kline, David Plonka, Amos Ron
University of Wisconsin – Madison
Fall, 2002
Overview
• Motivation: Anomaly detection remains difficult
• Objective: Improve understanding of traffic anomalies
• Approach: Multiresolution analysis of a data set that includes IP flow and SNMP measurements and an anomaly catalog
• Method: Integrated Measurement Analysis Platform for Internet Traffic (IMAPIT)
• Results: Identify anomaly characteristics using wavelets and develop a new method for exposing short-lived events
pb@cs.wisc.edu
Our Data Sets
• Consider anomalies in IP flow and SNMP data
  – Collected at the UW border router (Juniper M10)
  – Archive of ~6 months' worth of data (packets, bytes, flows)
  – Includes a catalog of anomalies (after-the-fact analysis)
• Group observed anomalies into four categories
  – Network anomalies (41)
    • Steep drop-offs in service followed by a quick return to normal behavior
  – Flash crowd anomalies (4)
    • Steep increase in service followed by a slow return to normal behavior
  – Attack anomalies (46)
    • Steep increase in flows in one direction followed by a quick return to normal behavior
  – Measurement anomalies (18)
    • Short-lived anomalies that are neither network anomalies nor attacks
Multiresolution Analysis
• Wavelets describe time-series data in terms of both time and frequency
  – A powerful means of characterizing data with sharp spikes and discontinuities
  – Applying wavelets effectively can be tricky
• We use tools developed at UW that together make up IMAPIT
  – FlowScan software
  – The IDR Framenet software
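To make the time-versus-frequency point concrete, here is a toy sketch (not part of IMAPIT) comparing a Fourier magnitude spectrum with one level of a wavelet transform. The Haar wavelet is swapped in purely because it is the simplest wavelet; `dft_magnitude`, `haar_detail`, `impulse`, and `spike_position` are hypothetical helpers written for this illustration.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform (O(n^2), fine for tiny inputs)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def haar_detail(x):
    """One level of the orthonormal Haar transform: detail (high-pass) coefficients."""
    return [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]

def impulse(n, t):
    """Length-n signal that is zero except for a unit spike at time t."""
    return [1.0 if i == t else 0.0 for i in range(n)]

def spike_position(x):
    """Index of the largest Haar detail coefficient (each index covers two samples)."""
    d = haar_detail(x)
    return max(range(len(d)), key=lambda i: abs(d[i]))

early, late = impulse(16, 3), impulse(16, 12)

# The Fourier magnitudes are identical: the spectrum does not say *when* the spike occurred.
same_spectrum = all(math.isclose(a, b)
                    for a, b in zip(dft_magnitude(early), dft_magnitude(late)))

print(same_spectrum)                                  # True
print(spike_position(early), spike_position(late))    # 1 6
```

The wavelet coefficients are nonzero only at the pair of samples containing each spike (samples 2-3 and 12-13), so they localize the event in time, which is the property that makes wavelets attractive for short-lived anomalies.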
Our Wavelet System
• After evaluating several candidates, we selected a wavelet system called Pseudo Splines (4,1) Type 2
  – A framelet system developed by Daubechies et al. (2000)
  – Very good frequency localization properties
• Three output signals are extracted
  – Low frequency (L): synthesis of all wavelet coefficients from level 9 and up
  – Mid frequency (M): synthesis of the wavelet coefficients at levels 6, 7, and 8
  – High frequency (H): synthesis of the wavelet coefficients at levels 1 through 5
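The L/M/H split can be illustrated with a minimal sketch. It substitutes the orthonormal Haar wavelet for Pseudo Splines (4,1) Type 2, which is considerably more involved, so the resulting bands will differ in detail from IMAPIT's; only the level grouping (H = levels 1 to 5, M = levels 6 to 8, L = level 9 and up) follows the slide. All function names and the synthetic test signal are hypothetical, and the input length must be divisible by 2^levels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(x, levels):
    """Multilevel orthonormal Haar DWT (a stand-in for the framelet system); returns ([d1..dJ], aJ)."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details, approx = [], list(x)
    for _ in range(levels):
        half = range(len(approx) // 2)
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in half])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in half]
    return details, approx

def haar_reconstruct(details, approx):
    """Inverse of haar_decompose, working from the coarsest level back to level 1."""
    x = list(approx)
    for d in reversed(details):
        finer = []
        for a, b in zip(x, d):
            finer += [(a + b) / SQRT2, (a - b) / SQRT2]
        x = finer
    return x

def synthesize(details, approx, keep_levels, keep_approx):
    """Rebuild a signal from only the chosen detail levels (1-based) and, optionally, the approximation."""
    kept = [d if level in keep_levels else [0.0] * len(d)
            for level, d in enumerate(details, start=1)]
    a = approx if keep_approx else [0.0] * len(approx)
    return haar_reconstruct(kept, a)

def split_bands(x, levels=9):
    """Return (L, M, H): levels 9 and up plus the coarsest approximation, levels 6-8, and levels 1-5."""
    details, approx = haar_decompose(x, levels)
    low = synthesize(details, approx, set(range(9, levels + 1)), True)
    mid = synthesize(details, approx, {6, 7, 8}, False)
    high = synthesize(details, approx, set(range(1, 6)), False)
    return low, mid, high

# Synthetic example (hypothetical): a slow periodic trend plus one sharp spike at t = 700.
signal = [100 + 20 * math.sin(2 * math.pi * t / 288) + (500 if t == 700 else 0)
          for t in range(1024)]
low, mid, high = split_bands(signal)
```

Because the transform is linear and perfectly reconstructing, the three bands add back up to the original signal, and the isolated spike dominates the high band at t = 700 while the slow trend stays in the lower bands.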
Ambient IP Flow Traffic
[Figure: one autonomous system to campus, inbound, 2001-DEC-16 through 2001-DEC-23. Panels: bytes, original signal; bytes, high-band; bytes, mid-band; bytes, low-band. Y-axis: bytes/sec; x-axis: Sat through the following Sun.]
Ambient SNMP Traffic
[Figure: one interface to campus, inbound, 2001-DEC-16 through 2001-DEC-23. Panels: bytes, original signal; bytes, high-band; bytes, mid-band; bytes, low-band. Y-axis: bytes/sec; x-axis: Sat through the following Sun.]
Byte Traffic for Flash Crowd
[Figure: Class-B network, outbound, 2001-SEP-30 through 2001-NOV-25. Panels: outbound Class-B network bytes, original signal; mid-band; low-band. Y-axis: bytes/sec; x-axis: Oct-01 through Nov-26.]
Average Packet Size for Flash Crowd
[Figure: campus HTTP, outbound, 2001-SEP-30 through 2001-NOV-25. Panels: outbound HTTP average packet size, original signal; mid-band; low-band. Y-axis: bytes; x-axis: Oct-01 through Nov-26.]
Flow Traffic During DoS Attacks
[Figure: campus TCP, inbound, 2002-FEB-03 through 2002-FEB-10. Panels: inbound TCP flows, original signal; high-band; mid-band; low-band. Y-axis: flows/sec; x-axis: Sun through Sat.]
Byte Traffic During Measurement Anomalies
[Figure: campus TCP, inbound, 2002-FEB-10 through 2002-FEB-17. Panels: inbound TCP bytes, original signal; high-band; mid-band; low-band. Y-axis: bytes/sec; x-axis: Sun through Sat.]
Anomaly Detection via Deviation Score
• Short-lived anomalies can be identified automatically from the variability of the H and M signals
  1. Compute the local variability (over a specified window) of the H and M parts of the signal
  2. Combine the local variability of the H and M signals with a weighted sum and normalize by the total variability to obtain the deviation score V
  3. Apply a threshold to V, then measure the peaks that exceed it
• Analysis shows that V peaks above 2.0 indicate short-lived anomalies with high confidence
  – We set the threshold at V = 1.25 and the window size to 3 hours
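The three steps can be sketched as follows. The slide does not specify the variability measure, the weights, or the sampling interval, so this sketch assumes: variability is the population standard deviation (local = over a centered sliding window, total = over the whole series); each band is normalized by its own total variability; the weights are 0.5 each (hypothetical); and a 3-hour window is 36 samples under an assumed 5-minute bin size. Function names and the synthetic bands are hypothetical.

```python
import math

def local_std(x, window):
    """Standard deviation of x over a centered sliding window (truncated at the edges)."""
    half = window // 2
    out = []
    for t in range(len(x)):
        seg = x[max(0, t - half): t + half + 1]
        mean = sum(seg) / len(seg)
        out.append(math.sqrt(sum((v - mean) ** 2 for v in seg) / len(seg)))
    return out

def global_std(x):
    """Standard deviation over the whole series (the assumed 'total variability')."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))

def deviation_score(high, mid, window, w_high=0.5, w_mid=0.5):
    """Steps 1-2: weighted sum of each band's local variability over its total variability."""
    lh, lm = local_std(high, window), local_std(mid, window)
    gh = global_std(high) or 1.0  # guard against a perfectly flat band
    gm = global_std(mid) or 1.0
    return [w_high * a / gh + w_mid * b / gm for a, b in zip(lh, lm)]

def flag_events(v, threshold=1.25):
    """Step 3: group consecutive samples with V above threshold; report (start, end, peak V)."""
    events, start = [], None
    for t, score in enumerate(v + [float("-inf")]):  # sentinel closes a trailing run
        if score > threshold and start is None:
            start = t
        elif score <= threshold and start is not None:
            events.append((start, t - 1, max(v[start:t])))
            start = None
    return events

# Synthetic H and M bands (hypothetical): quiet oscillations plus a burst at t = 300..319.
n = 600
high = [math.sin(t) + (10 * (-1) ** t if 300 <= t < 320 else 0) for t in range(n)]
mid = [0.5 * math.sin(t / 7) + (5 * (-1) ** (t // 2) if 300 <= t < 320 else 0)
       for t in range(n)]
v = deviation_score(high, mid, window=36)  # 36 assumed 5-minute bins = 3 hours
events = flag_events(v, threshold=1.25)
```

In this synthetic case V stays low while the window sees only quiet data and rises sharply once the burst enters it; `flag_events` returns one run around the burst together with its peak, which can then be compared against the 2.0 high-confidence level.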
Deviation Score for Three Anomalies
[Figure: campus TCP, inbound, 2002-FEB-03 through 2002-FEB-10. Inbound TCP packets (packets/sec) plotted together with the deviation score (score axis); x-axis: Sun through Sat.]
Deviation Score for Network Outage
[Figure: inbound and outbound panels for bytes/sec, packets/sec, and flows/sec, each plotted together with its deviation score; x-axis: Sun through Sat.]
Anomalies in Aggregate Signals
[Figure: two sets of inbound and outbound panels for bytes/sec, packets/sec, and flows/sec, each plotted together with its deviation score; x-axis: Sun through Sat.]
Hidden Anomalies in Low Frequency
[Figure: Class-B network, outbound, 2001-NOV-25 through 2001-DEC-23. Panels: outbound bytes, original signal; high-band; mid-band; low-band. Y-axis: bytes/sec; x-axis: Nov-27 through Dec-25.]
Deviation Score Evaluation
• How effective is the deviation score at detecting anomalies?
  – Compared against a set of 39 anomalies
    • The set is unlikely to be complete, so false positives are not evaluated
  – Compared against Holt-Winters forecasting
    • A time-series technique
    • Requires some configuration
    • Holt-Winters reported many more positives and sometimes oscillated between values
Total candidate anomalies | Detected by deviation score | Detected by Holt-Winters
39 | 38 | 37
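Because the catalog is assumed incomplete, this comparison measures only how many cataloged anomalies each method catches (a recall-style figure), not false positives. A minimal sketch of that bookkeeping, assuming each anomaly and each flagged event is a (start, end) interval in sample indices and that a detection counts when the intervals overlap within a hypothetical `slack` of samples; the slide does not state the matching rule actually used.

```python
def count_detected(catalog, events, slack=0):
    """Count cataloged (start, end) anomalies overlapped by at least one flagged event.

    Events may carry extra fields after (start, end), such as a peak score; they are ignored.
    Flagged events that match no cataloged anomaly are simply not counted, mirroring the
    decision not to score false positives against an incomplete catalog.
    """
    hits = 0
    for a_start, a_end in catalog:
        if any(e[0] <= a_end + slack and e[1] >= a_start - slack for e in events):
            hits += 1
    return hits

# Counts taken from the slide: 39 cataloged anomalies in total.
for method, found in [("deviation score", 38), ("Holt-Winters", 37)]:
    print(f"{method}: {found}/39 = {found / 39:.1%}")
# prints:
# deviation score: 38/39 = 97.4%
# Holt-Winters: 37/39 = 94.9%
```

The `slack` parameter matters in practice: a windowed score like V tends to rise slightly before and fall slightly after the cataloged interval, so a strict-overlap rule can undercount detections.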
Conclusion and Next Steps
• We present an evaluation of the signal characteristics of network traffic anomalies
  – Using IP flow and SNMP data collected at the UW border router
  – IMAPIT developed to apply wavelet analysis to the data
  – Deviation score developed to automate anomaly detection
• Results
  – Characteristics of anomalies exposed using different filters and data
  – Deviation score appears promising as a detection method (38 of 39 cataloged anomalies detected)
• Future
  – Development of anomaly classification methods
  – Application of results in (distributed) detection systems