Understanding Network Traffic Approach Network Traffic Clustering: ADHIC Evaluation
Discovering Packet Structure through Lightweight Hierarchical Clustering
Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji
The Problem
Network traffic is complex:
- Many users and uses
- Numerous applications and protocols
- A huge number of operating systems and connected devices
- New applications and protocols appear quickly
Existing tools are limited and require a priori knowledge.
Example
Surge in port 80 traffic. Why?
- Flash crowd
- Web service configuration error
- Worm
- P2P
Two Common Approaches
What tools are available to understand traffic?
1. Header-based classifiers (IP addresses and port numbers): fail when protocols use misleading or ambiguous ports
2. Protocol dissectors (e.g., Wireshark): fail on unknown protocols and are knowledge-intensive
Our Target
Devise a technique that complements existing tools in detecting and understanding novel traffic patterns. The technique:
- works at wire/router speeds
- does not require signatures or built-in knowledge
- groups network traffic into semantically equivalent clusters
- automatically adapts to ever-changing network traffic
Approach
An unsupervised clustering algorithm that creates semantically equivalent classes without manually labeled training data.
The algorithm looks for patterns anywhere within the packet, across protocols, and uses them to cluster network traffic.
Packets are clustered, rather than classified, in order to capture the commonalities of novel, unknown network protocols and usage patterns.
(p, n)-grams
We define:
- n-gram: n consecutive bytes within a packet
- (p, n)-gram: an n-gram at position p within the packet
In our experiments, network packets generally contained a significant number of high- and moderate-frequency (p, n)-grams, whose frequencies appear to follow a power law analogous to Zipf's law.
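A minimal sketch of how (p, n)-grams could be extracted and counted, assuming packets are available as raw `bytes` and n = 2 (the trees shown later test 2-byte grams such as 0x00 0x00). The function names are hypothetical. ADHIC is described as approximate; this sketch counts exactly and in memory.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

PNGram = Tuple[int, bytes]  # (position p, n consecutive bytes)

def pn_grams(packet: bytes, n: int = 2) -> Iterator[PNGram]:
    """Yield every (p, n)-gram of a packet: the n bytes starting at position p."""
    for p in range(len(packet) - n + 1):
        yield p, packet[p:p + n]

def count_pn_grams(packets: Iterable[bytes], n: int = 2) -> Counter:
    """Count, for each (p, n)-gram, how many packets contain it.

    Because the position p is part of the key, each packet contributes
    at most once to any given (p, n)-gram.
    """
    counts: Counter = Counter()
    for packet in packets:
        counts.update(pn_grams(packet, n))
    return counts
```

Calling `count_pn_grams(packets).most_common(1000)` yields rank-ordered counts of the kind plotted in the frequency-distribution figure.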
(p, n)-gram Frequency Distribution
[Figure: the 1000 most frequent (p, n)-grams in a 3-hour-long dataset]
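One rough way to check the Zipf-like shape is to fit a straight line to log frequency against log rank; for f(r) ∝ 1/r^s the negated slope estimates s. This is a sketch with a hypothetical function name. Least squares on a log-log plot is a quick sanity check, not a rigorous power-law test; maximum-likelihood methods are preferred for that.

```python
import math
from typing import Sequence

def zipf_exponent(frequencies: Sequence[float]) -> float:
    """Estimate s in f(r) ~ C / r**s from a list of frequencies.

    Frequencies are sorted in descending order, ranked 1..k, and a
    least-squares line is fitted to (log rank, log frequency).
    """
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope is -s, so negate it
```

Applied to the 1000 highest (p, n)-gram counts, an estimate near 1 would match the classic Zipf shape.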
ADHIC
ADHIC (Approximate Divisive HIerarchical Clustering) produces a hierarchical decomposition of network traffic in the form of a cluster-identifying decision tree.
ADHIC starts with a single cluster and then develops and shapes the decision tree over time through splitting and deletion.
We supplement ADHIC with a port-based classifier in order to label the leaf nodes.
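The port-based classifier only names leaves; the tree itself is built from packet content. A minimal sketch of such labeling, assuming each leaf keeps the (source port, destination port) pairs of its packets and is named after the majority service, or "Mix." when no service has a strict majority (the example tree has a "Mix." leaf). `PORT_NAMES`, `port_label`, `label_leaf` and the majority rule are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical, deliberately tiny port-to-service table (IANA-assigned ports).
PORT_NAMES = {80: "HTTP", 110: "POP", 631: "IPP", 993: "IMAPS"}

def port_label(src_port: Optional[int], dst_port: Optional[int]) -> str:
    """Name a packet by whichever of its ports is well known, preferring dst."""
    for port in (dst_port, src_port):
        if port is not None and port in PORT_NAMES:
            return PORT_NAMES[port]
    return "other"

def label_leaf(ports: Iterable[Tuple[Optional[int], Optional[int]]],
               majority: float = 0.5) -> str:
    """Label a leaf with its dominant service, or "Mix." if none dominates."""
    labels = Counter(port_label(src, dst) for src, dst in ports)
    if not labels:
        return "empty"
    name, count = labels.most_common(1)[0]
    total = sum(labels.values())
    return name if count / total > majority else "Mix."
```

Because labels are attached after clustering, a leaf full of non-HTTP traffic on port 80 would still be separated from real HTTP by the tree, even if its port-based label is misleading.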
ADHIC Decision Tree: How ADHIC Trees Start and Develop
[Figure: the ADHIC tree starts with a single node, N1, that accepts all traffic]
[Figure: ADHIC chooses a (p, n)-gram that matches 40% to 60% of the traffic and splits on it, sending matching packets left and non-matching packets right. Here the root, N2, tests the 2-gram 0x00 0x00 at position 43 and splits its traffic into N3 (matching) and N4 (non-matching).]
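A minimal sketch of the split step, assuming an in-memory list of packets and n = 2. The slides only say the chosen gram matches 40% to 60% of the traffic; preferring the candidate closest to 50% and the tie-breaking rule are assumptions, as are the function names.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

PNGram = Tuple[int, bytes]  # (position p, n consecutive bytes)

def matches(packet: bytes, gram: PNGram) -> bool:
    """True if the packet carries the gram's bytes at the gram's position."""
    p, value = gram
    return packet[p:p + len(value)] == value

def choose_split(packets: Sequence[bytes], n: int = 2,
                 low: float = 0.4, high: float = 0.6) -> Optional[PNGram]:
    """Pick a (p, n)-gram matching between `low` and `high` of the packets.

    Among candidates, prefer the one closest to an even 50/50 split; ties
    are broken by (position, bytes). Returns None if no gram qualifies.
    """
    if not packets:
        return None
    counts: Counter = Counter()
    for pkt in packets:
        for p in range(len(pkt) - n + 1):
            counts[(p, pkt[p:p + n])] += 1
    total = len(packets)
    candidates = [(abs(c / total - 0.5), gram)
                  for gram, c in counts.items() if low <= c / total <= high]
    return min(candidates)[1] if candidates else None

def split(packets: Sequence[bytes], gram: PNGram) -> Tuple[List[bytes], List[bytes]]:
    """Partition packets into (matching, non-matching), left and right in the figure."""
    matching = [pkt for pkt in packets if matches(pkt, gram)]
    rest = [pkt for pkt in packets if not matches(pkt, gram)]
    return matching, rest
```

The 40% to 60% window keeps each split roughly balanced, so the tree depth, and with it the per-packet matching cost, grows only logarithmically with the number of clusters.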
[Figure: further splitting. The children of N2 are split again, N5 on 0x00 0x00 at position 51 and N8 on 0x75 0x15 at position 31, producing nodes N6, N7, N9, and N10.]
ADHIC
ADHIC (Approximate Divisive HIerarchical Clustering) recursively subdivides traffic into binary classes until:
- the volume falls below a threshold, or
- the group is too similar or too dissimilar.
It produces a binary decision tree:
- internal nodes match against (p, n)-grams
- leaf nodes constitute terminal clusters
- the path from the root to a leaf constitutes a boolean expression describing that cluster
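A minimal batch sketch of the recursive subdivision and of reading each root-to-leaf path as a boolean expression. It assumes n = 2, a hypothetical volume threshold `min_volume`, and, as a stand-in for "too similar or too dissimilar", stops when no (p, n)-gram matches 40% to 60% of the group (identical packets give only 100% grams; unrelated packets give only rare ones). ADHIC itself builds its tree incrementally over live traffic; this version works on a fixed list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

PNGram = Tuple[int, bytes]  # (position p, n consecutive bytes)

@dataclass
class Node:
    size: int                          # number of packets the node was built from
    gram: Optional[PNGram] = None      # split test; None on leaves
    match: Optional["Node"] = None     # packets containing the gram
    nomatch: Optional["Node"] = None   # all other packets

def has(pkt: bytes, gram: PNGram) -> bool:
    p, value = gram
    return pkt[p:p + len(value)] == value

def pick_gram(packets: List[bytes], n: int, low: float, high: float) -> Optional[PNGram]:
    """Gram whose match fraction lies in [low, high] and is closest to 50%."""
    counts = Counter((p, pkt[p:p + n]) for pkt in packets for p in range(len(pkt) - n + 1))
    total = len(packets)
    candidates = [(abs(c / total - 0.5), g) for g, c in counts.items() if low <= c / total <= high]
    return min(candidates)[1] if candidates else None

def build(packets: List[bytes], min_volume: int = 100, n: int = 2,
          low: float = 0.4, high: float = 0.6) -> Node:
    node = Node(size=len(packets))
    # Stop rule 1: volume below threshold.
    if not packets or len(packets) < min_volume:
        return node
    # Stop rule 2 (assumed proxy): no gram splits the group 40/60 or better.
    gram = pick_gram(packets, n, low, high)
    if gram is None:
        return node
    node.gram = gram
    node.match = build([pkt for pkt in packets if has(pkt, gram)], min_volume, n, low, high)
    node.nomatch = build([pkt for pkt in packets if not has(pkt, gram)], min_volume, n, low, high)
    return node

def leaf_expressions(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Node]]:
    """Yield (boolean expression, leaf) for every root-to-leaf path."""
    if node.gram is None:
        yield (" AND ".join(path) or "TRUE"), node
        return
    p, value = node.gram
    term = f"pkt[{p}:{p + len(value)}] == 0x{value.hex()}"
    yield from leaf_expressions(node.match, path + (term,))
    yield from leaf_expressions(node.nomatch, path + (f"NOT ({term})",))
```

Every split sends between 40% and 60% of a group to each side, so both children are strictly smaller than their parent and the recursion always terminates.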
Example Tree
[Figure: a larger ADHIC tree. Internal nodes test (p, n)-grams such as 0x00 0x00 at position 43 and 0x75 0x15 at position 31; leaves carry port-based labels including ARP, HSRP, STP, DTP, EIGRP, IGMP, HTTP, HTTP + TCP (control), POP, IMAPS, IMAPS + TCP (control), CUPS, Ganglia, IPP + TCP, NBSS + TCP, NBDGM, TCP (control), and Mix.]
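Once the tree exists, assigning a packet to a cluster is a walk from the root, with one short byte comparison per level. The tiny tree below is modelled on the upper nodes of the splitting figures (N2 testing 0x00 0x00 at position 43, N5 and N8 below it); placing N5 on the matching branch and attaching leaf names N6, N7, N9, N10 to particular branches are assumptions for illustration, as are the type and function names.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str            # cluster identifier (or a port-based label)

@dataclass
class Test:
    pos: int             # position p of the (p, n)-gram
    value: bytes         # the n bytes expected at that position
    match: "Tree"        # taken when the packet carries `value` at `pos`
    nomatch: "Tree"      # taken otherwise (including packets that are too short)

Tree = Union[Leaf, Test]

def classify(tree: Tree, pkt: bytes) -> str:
    """Follow (p, n)-gram tests from the root down to a leaf."""
    node = tree
    while isinstance(node, Test):
        hit = pkt[node.pos:node.pos + len(node.value)] == node.value
        node = node.match if hit else node.nomatch
    return node.name

# Hypothetical tree loosely based on nodes N2, N5 and N8.
TREE = Test(43, b"\x00\x00",
            match=Test(51, b"\x00\x00", Leaf("N6"), Leaf("N7")),
            nomatch=Test(31, b"\x75\x15", Leaf("N9"), Leaf("N10")))
```

A packet that reaches N9, for example, satisfies the boolean expression NOT (bytes 43..44 == 0x0000) AND (bytes 31..32 == 0x7515).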
Case Study: Exploring P2P
- Ports are not needed to classify traffic
- New traffic is first segregated into its own part of the tree; afterwards, the tree returns to its original structure
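The slides say the tree is shaped by splitting and deletion and that the P2P subtree disappears once the traffic stops, but not how deletion is triggered. One plausible mechanism, assumed here, is to age leaf counts with a decay factor and, when one branch of a split falls below a hypothetical `min_volume`, replace the split with its surviving branch, which restores the tree that existed before the split. All names and parameters are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    count: float = 0.0                 # decayed packet count, used on leaves only
    match: Optional["Node"] = None     # branch for packets containing the node's gram
    nomatch: Optional["Node"] = None   # branch for the remaining packets

    def is_leaf(self) -> bool:
        return self.match is None or self.nomatch is None

def volume(node: Node) -> float:
    """Decayed traffic volume of a subtree: the sum of its leaf counts."""
    if node.is_leaf():
        return node.count
    return volume(node.match) + volume(node.nomatch)

def decay(node: Node, factor: float) -> None:
    """Age every leaf count so that traffic which has stopped fades away."""
    if node.is_leaf():
        node.count *= factor
    else:
        decay(node.match, factor)
        decay(node.nomatch, factor)

def prune(node: Node, min_volume: float) -> Node:
    """Delete dead branches; a split with a dead branch collapses to its live branch."""
    if node.is_leaf():
        return node
    node.match = prune(node.match, min_volume)
    node.nomatch = prune(node.nomatch, min_volume)
    if volume(node.match) < min_volume:
        return node.nomatch
    if volume(node.nomatch) < min_volume:
        return node.match
    return node
```

For example, if a P2P burst caused a new split at the root, with the P2P cluster on one branch and the previous tree on the other, repeated decay and pruning after the burst ends would leave only the previous tree.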
[Figure: the decision tree just before the P2P traffic]
[Figure: the decision tree with P2P traffic. Dark blue: P2P UDP tracking packets; green and yellow: P2P TCP data packets running on port 80.]
[Figure: the decision tree just after the P2P traffic]