Network Telescope Data Analysis: IBR Monitoring
telescope / darknets / darkspace
22 Mar 11, Nevil Brownlee
IBR Monitoring CAIDA, 2011 – p.1/17
Background, Earlier Work
- Moore, Shannon et al, CAIDA/UCSD, 2000..2006
  - the telescope gives a whole-world (1/250) view, with a mathematical model of that view
  - used it to track the rise of fast-spreading worms
- Pang, Yegneswaran, Barford, Paxson and Peterson, Characteristics of Internet Background Radiation, ACM SIGCOMM IMC 2004
  - passive analysis showing activity over time for different ports and telescopes (darkspaces)
  - active responders to investigate what sources were trying to do, e.g. Code Red, Agobot, Welchia, etc.
- Wustrow, Karir, Bailey, Jahanian and Huston, Internet Background Radiation Revisited, IMC 2010
  - evolution of IBR since 2004: steady increase in Mb/s each year
  - address pollution (looking at newly-allocated /8 prefixes, e.g. traffic to addresses within 1.0.0.0/8)
Telescope Monitoring – what do we want?
- A set of web pages we can look at each day that tell us “something interesting is happening”
- Would like to classify the unsolicited traffic sources into groups somehow, so that we could look for
  - changes in the level of each group
  - new groups appearing
  - old groups disappearing
- This is the same problem as that of managing a network
  - Network managers want a display that shows them “what’s happening in the network now”, and the ability to ‘drill down’ (by clicking on the display) to find more detail
Constraints, Approaches
- Problems:
  - data volume: UCSD telescope trace files are big, about 4–10 GiB every hour
  - we only do passive monitoring
  - we need to do the monitoring in near-real time, so as to see changes as they appear
  - we’d like to save ‘interesting’ trace files for later fine-detail analysis; there are many opinions about what’s ‘interesting’!
- For long-term monitoring (per-hour plots) we need to decide what we want to plot, e.g. (simple example) TCP/UDP/ICMP source/packet/byte volumes
- Two approaches:
  (1) use fixed groups
  (2) automated grouping (clustering)
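The ‘simple example’ above can be sketched as a single pass over the packets that, for each hour and protocol, counts distinct sources, packets and bytes. This is a minimal sketch under assumptions: the `Packet` record and `hourly_volumes` function are hypothetical, and a real monitor would parse the telescope’s trace files rather than take a list.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record (a real tool would parse pcap traces)."""
    ts: float      # capture time, seconds since the epoch (UTC)
    src: str       # source IP address
    proto: str     # "TCP", "UDP", "ICMP" or "other"
    length: int    # IP packet length in bytes


def hourly_volumes(packets: Iterable[Packet]) -> dict:
    """Return {hour_start: {proto: (sources, packets, bytes)}} in one pass."""
    srcs = defaultdict(set)   # (hour, proto) -> distinct source addresses
    pkts = defaultdict(int)   # (hour, proto) -> packet count
    byts = defaultdict(int)   # (hour, proto) -> byte count
    for p in packets:
        key = (int(p.ts // 3600) * 3600, p.proto)   # bucket by UTC hour
        srcs[key].add(p.src)
        pkts[key] += 1
        byts[key] += p.length
    out: dict = defaultdict(dict)
    for (hour, proto), s in srcs.items():
        out[hour][proto] = (len(s), pkts[(hour, proto)], byts[(hour, proto)])
    return dict(out)
```

Only per-hour sets and counters are held in memory, which suits the near-real-time constraint; the per-hour source sets are the part that grows with traffic.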
Approach 1: Pre-determined Groups
Nevil’s work, Mar 2010 – Feb 2011
- Define a taxonomy of ‘interesting’ source groups:
  - TCP: port probe, vertical & horizontal scans, other
  - UDP: port probe, vertical & horizontal scans, other
  - Backscatter: TCP ACK+SYN & TCP ACK+RST, ICMP TTL exceeded & destination unreachable
  - Others: Conficker C, ICMP only, . . .
- Analysis methodology:
  - build a table of sources; count the number of TCP/UDP/ICMP/other packets, and the ports used by TCP/UDP
  - at end of trace, use those counts to classify sources into the above groups
  - write a summary file for the trace; the summary has counts & distributions of various packet metrics for each group, e.g. source lifetime, number of packets sent by source, . . .
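The methodology above (build a table of sources while reading the trace, then classify at end of trace) might look like the following Python sketch. Everything here is an assumption for illustration: the `SourceStats` layout, the `update`/`classify` names, the rule order and the `SCAN_MIN` threshold are hypothetical, and Conficker C detection is left out because the slides don’t say how it was recognised.

```python
from collections import defaultdict
from dataclasses import dataclass, field

SCAN_MIN = 10  # hypothetical: distinct ports/addresses needed to call it a "scan"


@dataclass
class SourceStats:
    """Per-source counters accumulated while reading a trace (hypothetical layout)."""
    proto_pkts: dict = field(default_factory=lambda: defaultdict(int))
    backscatter_pkts: int = 0
    dst_addrs: set = field(default_factory=set)
    dst_ports: set = field(default_factory=set)   # TCP/UDP destination ports


def update(table, src, proto, dst, dport=None, tcp_flags="", icmp_type=None):
    """Add one packet to the source table. tcp_flags is a string such as 'SA'."""
    s = table[src]
    s.proto_pkts[proto] += 1
    s.dst_addrs.add(dst)
    if proto in ("TCP", "UDP") and dport is not None:
        s.dst_ports.add(dport)
    # Backscatter: TCP SYN+ACK or RST+ACK, ICMP dest unreachable (3) or TTL exceeded (11)
    if proto == "TCP" and set(tcp_flags) in ({"S", "A"}, {"R", "A"}):
        s.backscatter_pkts += 1
    if proto == "ICMP" and icmp_type in (3, 11):
        s.backscatter_pkts += 1


def classify(s: SourceStats) -> str:
    """End-of-trace classification using the accumulated counts."""
    total = sum(s.proto_pkts.values())
    if s.backscatter_pkts == total:
        return "backscatter"
    if s.proto_pkts.get("ICMP", 0) == total:
        return "ICMP only"
    for proto in ("TCP", "UDP"):
        if s.proto_pkts.get(proto, 0) == total:
            if len(s.dst_ports) >= SCAN_MIN and len(s.dst_addrs) < SCAN_MIN:
                return f"{proto} vertical scan"     # many ports, few hosts
            if len(s.dst_ports) == 1 and len(s.dst_addrs) >= SCAN_MIN:
                return f"{proto} horizontal scan"   # one port, many hosts
            if len(s.dst_ports) == 1:
                return f"{proto} port probe"        # one port, few hosts
            return f"{proto} other"
    return "unclassified"
```

Usage: create `table = defaultdict(SourceStats)`, call `update` for each packet, then `classify(table[src])` for every source once the trace ends.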
Approach 1, example plots (a)
[Figure (a), probe ports: TCP and UDP panels, each showing thousands of probe sources per hour (kS/h) by destination port number and time (UTC), 03 Apr – 10 Apr]
Number of probe sources sending to the top 100 destination ports each hour for the week 03–10 Apr 2010
- only two popular ports for TCP probe sources
- UDP probe sources used a wide range of ephemeral high-numbered ports
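The plotted quantity, distinct probe sources per destination port per hour with the top 100 ports kept, can be computed as in this sketch. It assumes probe packets have already been selected and are given as hypothetical `(timestamp, source, dest_port)` tuples; `top_ports_per_hour` is an illustrative name.

```python
from collections import Counter, defaultdict


def top_ports_per_hour(probes, n=100):
    """probes: iterable of (ts, src, dport) for probe-source packets.
    Returns {hour_start: [(port, distinct_sources), ...]} for the n busiest ports."""
    seen = defaultdict(set)              # (hour, port) -> set of sources
    for ts, src, dport in probes:
        seen[(int(ts // 3600) * 3600, dport)].add(src)
    per_hour = defaultdict(Counter)      # hour -> Counter(port -> sources)
    for (hour, port), srcs in seen.items():
        per_hour[hour][port] = len(srcs)
    return {hour: c.most_common(n) for hour, c in per_hour.items()}
```

Counting distinct sources rather than packets keeps a single noisy source from dominating a port’s bar.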
Approach 1, example plots (b)
[Figure, upper panel: kS Counts: Thousands of Conficker P2P Sources, Jan – Apr 2010 (UTC)]
- Conficker C (p2p) sources seen each hour, showing a steady decrease from early January
[Figure (b): Stacked-bar Time Series: Source % by Group, Jan – Apr 2010 (UTC); groups: Conficker P2P, UDP, TCP, Other TCP and UDP, Unclassified]
- Percentage of source groups seen each hour:
  - 30 Jan: 1.8 MS/h
  - UDP sources growth, early Feb
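A stacked-bar chart like (b) needs, for each hour, each group’s percentage of that hour’s sources and the height at which its segment starts. A sketch, assuming per-hour source counts per group are already available (for example from the per-trace summary files); the `stacked_percentages` name and input layout are hypothetical, and the group order simply follows the plot’s legend.

```python
# Legend order from the plot; groups not listed here are ignored
# (if present they still count towards the hour's total).
GROUPS = ["Conficker P2P", "UDP", "TCP", "Other TCP and UDP", "Unclassified"]


def stacked_percentages(counts):
    """counts: {hour: {group: n_sources}}.
    Returns {hour: [(group, bottom_pct, height_pct), ...]} in GROUPS order,
    i.e. where each bar segment starts and how tall it is."""
    out = {}
    for hour, groups in counts.items():
        total = sum(groups.values())
        bottom, segments = 0.0, []
        for g in GROUPS:
            height = 100.0 * groups.get(g, 0) / total if total else 0.0
            segments.append((g, bottom, height))
            bottom += height
        out[hour] = segments
    return out
```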
Approach 2: Automated grouping of traffic sources
- Classify into groups using a ‘volume’ metric (bytes/packets/flows)
- Split the groups into smaller groups using a ‘classifier’ metric
- Example analysis systems:
  - aguri: volume = bytes/s, classifier = source address / prefix length
    - simple system, no GUI (produces lists of the prefix hierarchy)
  - nethadict: volume = bytes, classifier = n-grams (p, n), where p = byte position in packet, n = value of byte(s)
    - automatically determines the n-gram used to split a group: finds some bytes common to 50% of the group
    - picks arbitrary n-grams
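The slides don’t show nethadict’s code; the following is only a hypothetical Python sketch of the idea as described: count (p, n) n-grams, i.e. byte values at fixed positions, across a group’s packets, take one shared by at least 50% of the group, and split the group on it. Unlike nethadict, which picks arbitrary n-grams, this sketch breaks ties deterministically (lowest position first), and it counts packets rather than using bytes as the volume metric. `common_ngram`, `split_group` and the `width` parameter are illustrative names.

```python
from collections import Counter


def common_ngram(payloads, width=1, threshold=0.5):
    """Find a (p, value) n-gram present in at least `threshold` of the packets.
    payloads: list of bytes, one per packet in the group.
    Returns ((p, value), fraction), or None if no n-gram is common enough."""
    counts = Counter()
    for pkt in payloads:
        # each (position, value) pair is counted at most once per packet
        counts.update({(p, pkt[p:p + width]) for p in range(len(pkt) - width + 1)})
    if not counts:
        return None
    # most packets first; ties broken by lowest position, then byte value
    gram, hits = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    fraction = hits / len(payloads)
    return (gram, fraction) if fraction >= threshold else None


def split_group(payloads, gram):
    """Split a group into packets carrying the n-gram at position p, and the rest."""
    p, value = gram
    with_gram = [pkt for pkt in payloads if pkt[p:p + len(value)] == value]
    without = [pkt for pkt in payloads if pkt[p:p + len(value)] != value]
    return with_gram, without
```

Applying `common_ngram` then `split_group` repeatedly to each resulting subgroup builds the group hierarchy.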
Clustering metrics
- Volume metrics: sources seen / s, packets seen / s, . . .
- Classifier metrics (‘/’ means ‘split on’):
  - source address / prefix length
  - source port / port number → p% in group
  - IP protocol (really only see TCP, UDP, ICMP)
  - average packet length (not useful for TCP)
  - packets/bytes (big ⇒ DOS attack, small ⇒ vulnerability probe)
  - packet inter-arrival distribution (Nevil’s current project)
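‘Split on’ a classifier metric amounts to grouping by a key and reporting each subgroup’s share of the group (the ‘p% in group’ above). A sketch for two of the listed metrics, source address / prefix length and IP protocol, using Python’s standard `ipaddress` module; `split_on` and `prefix_key` are hypothetical names, and each item is counted once, so the volume metric here is simply sources (or packets).

```python
import ipaddress
from collections import Counter


def split_on(items, key):
    """Split a group on a classifier metric given as a key function.
    Returns [(subgroup, percent_of_group), ...], largest subgroup first."""
    counts = Counter(key(item) for item in items)
    total = sum(counts.values())
    return [(k, 100.0 * n / total) for k, n in counts.most_common()]


def prefix_key(prefix_len):
    """Key function for 'source address / prefix length'."""
    def key(src):
        # strict=False masks off the host bits, e.g. 192.0.2.9/24 -> 192.0.2.0/24
        return str(ipaddress.ip_network(f"{src}/{prefix_len}", strict=False))
    return key
```

For example, `split_on(sources, prefix_key(16))` splits a group of source addresses on /16 prefixes, and `split_on(packets, lambda p: p[0])` splits `(proto, length)` records on IP protocol.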
Comments on clustering
- When we observe a group, we don’t know what application is generating its packets
  - to find that out we need to select the packets for sources in the group and examine them, so as to determine their protocol and (perhaps) generating application
  - that’s hard to do automatically!
- Groups found by automatic classifiers are not stable. If we use clustering techniques to make groups 0..n:
  - n will vary over time
  - a group with the same characteristics may change its group number with each sample
- Such variability makes automatic grouping difficult to use for long-term trend monitoring
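A small illustration of the label-instability point: plain k-means (Lloyd’s algorithm, fixed k) run twice on the same hypothetical 1-D values, for instance log10 of each source’s median IAT, differing only in the order of the initial centroids. Both runs find the same two groups but number them differently, so group numbers alone can’t be compared from sample to sample. `kmeans`, `assign` and `same_partition` are illustrative names, not any particular package’s API.

```python
def assign(points, centroids):
    """Number each point by its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            for x in points]


def kmeans(points, centroids, iters=20):
    """Lloyd's k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:                    # an empty cluster keeps its old centroid
                centroids[k] = sum(members) / len(members)
    return assign(points, centroids)


def same_partition(a, b):
    """True if labelings a and b group the points identically (up to renumbering)."""
    mapping = {}
    for x, y in zip(a, b):
        if mapping.setdefault(x, y) != y:
            return False
    return len(set(mapping.values())) == len(mapping)
```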
Approach 2: Clustering, using kmeans
Nevil’s work, 7–18 Mar 2011
- Look at packet inter-arrival time (IAT) distributions for each source. Can we use IAT statistics to identify source applications?
- Collect IAT distributions (180 bins) for every source in an hour. The hour ending 1600, 8 March had 1.5 M sources.
- Find metrics we can use to represent an IAT distribution:
  - use log-scale bins, 0.012 to 600 s
  - two metrics: median and skewness
- Tried clustering using Dan Pelleg’s kmeans program (Dan is at the Auton Lab, CMU)
  - k-means clustering finds k clusters in n-dimensional space, given that you know k
  - Dan has extended this idea (x-means) so that the system determines how many clusters it can reliably find
- This idea simply did not work well for IBR IATs
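The binning and the two summary metrics can be sketched as follows, using the 180 log-scale bins from 0.012 to 600 s given above. The slides don’t say how the median and skewness were computed, so this is one plausible reading: both come from the binned counts, and skewness is the standard third-moment coefficient of log10(IAT). Values printed on the postage-stamp sheets, such as skew=40.00, suggest the original used a different definition. Clamping out-of-range IATs into the first or last bin, and all function names, are assumptions.

```python
import math

N_BINS, LO, HI = 180, 0.012, 600.0        # bin count and range from the slide
LOG_LO, LOG_HI = math.log10(LO), math.log10(HI)


def iat_histogram(timestamps):
    """Bin one source's packet inter-arrival times into N_BINS log-spaced bins."""
    ts = sorted(timestamps)
    hist = [0] * N_BINS
    for a, b in zip(ts, ts[1:]):
        iat = max(b - a, LO)                   # tiny or zero gaps go into bin 0
        frac = (math.log10(iat) - LOG_LO) / (LOG_HI - LOG_LO)
        hist[min(int(frac * N_BINS), N_BINS - 1)] += 1   # > HI goes into last bin
    return hist


def bin_centre(i):
    """Geometric centre (seconds) of bin i."""
    return 10 ** (LOG_LO + (i + 0.5) * (LOG_HI - LOG_LO) / N_BINS)


def median_and_skewness(hist):
    """Median IAT (s) and moment skewness of log10(IAT), from the binned counts."""
    n = sum(hist)
    if n == 0:
        return None
    cum = 0
    for i, c in enumerate(hist):               # first bin reaching half the total
        cum += c
        if cum >= n / 2:
            median = bin_centre(i)
            break
    xs = [(math.log10(bin_centre(i)), c) for i, c in enumerate(hist) if c]
    mean = sum(x * c for x, c in xs) / n
    m2 = sum(c * (x - mean) ** 2 for x, c in xs) / n
    m3 = sum(c * (x - mean) ** 3 for x, c in xs) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return median, skew
```

The resulting `(median, skew)` pair per source is the 2-D point a k-means style clusterer would be given.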
IATs using pre-determined groups
Nevil’s work from 19 Mar 2011 (!)
- Simpler, pragmatic approach:
  - make ‘postage-stamp’ sheets showing individual IAT distributions
  - find metrics for the distributions, print them on the sheets
  - look for recurring patterns, i.e. source groups; find metric ranges that could be used to determine each distribution’s group
  - print new postage-stamp sheets, one for each group
  - iterate as more groups become apparent
- IAT metrics:
  - bin-zero %: > 95% → DOS source
  - mode IAT: 2.5..3.5 s → Windows XP’s TCP retry
  - skewness: → left, right or evenly balanced
  - maximum IAT: high values → ‘stealth’ probe sources
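The IAT-metric rules above translate directly into a rule-based classifier. A minimal sketch: the 95% bin-zero and 2.5..3.5 s mode rules come from the slide, but the order in which rules are tried and the `SKEW_EVEN` and `STEALTH_MAX` cut-offs are placeholders (the slides give no values), to be tuned by the same postage-stamp iteration. Group names follow the ‘DOS’ and ‘XP_even’ style of the group sheets; `IatMetrics` and `iat_group` are hypothetical names.

```python
from dataclasses import dataclass

SKEW_EVEN = 1.0       # placeholder: |skew| below this counts as "even"
STEALTH_MAX = 300.0   # placeholder: max IAT (s) above this suggests a 'stealth' probe


@dataclass
class IatMetrics:
    b0pc: float      # % of IATs in bin zero
    mode: float      # modal IAT (s)
    skew: float      # skewness of the distribution
    max_iat: float   # largest IAT (s)


def skew_label(skew: float) -> str:
    """Left, right or evenly balanced (negative skew = long left tail)."""
    if abs(skew) < SKEW_EVEN:
        return "even"
    return "left" if skew < 0 else "right"


def iat_group(m: IatMetrics) -> str:
    if m.b0pc > 95.0:                  # almost all IATs in bin zero
        return "DOS"
    if 2.5 <= m.mode <= 3.5:           # Windows XP TCP retry interval
        return f"XP_{skew_label(m.skew)}"
    if m.max_iat > STEALTH_MAX:
        return "stealth"
    return "other"
```

For example, the Group 0 sheet’s distribution 7 (b0pc=99.85, mode=0.01, skew=-0.09, max=0.99) lands in "DOS".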
Group 0: DOS sources
[Figure: four IAT distributions, SAN, 1600 Tue 8 Mar 2011 (UTC): 09.10000-0-50; each panel plots % against packet inter-arrival time (s), log scale 0.01–300]
- distribution 7: b0pc=99.85 mode=0.01 skew=-0.09 max=0.99
- distribution 280: b0pc=33.33 mode=1.05 skew=0.00 max=1.05
- distribution 168: b0pc=33.33 mode=500.49 skew=0.00 max=637.39
- distribution 424: b0pc=40.00 mode=6.07 skew=40.00 max=179.11
Group 1: XP_even sources
[Figure: four IAT distributions, SAN, 1600 Tue 8 Mar 2011 (UTC): 09.10000-0-50; each panel plots % against packet inter-arrival time (s), log scale 0.01–300]
- distribution 3: b0pc=0.16 mode=3.12 skew=-7.44 max=60.34
- distribution 140: b0pc=0.62 mode=2.94 skew=3.50 max=68.09
- distribution 326: b0pc=0.52 mode=3.12 skew=-6.64 max=76.84
- distribution 188: b0pc=4.23 mode=3.12 skew=14.08 max=242.32