  1. Towards Adaptive Big Data Cyber-attack Detection via Semantic Link Networks
     George Karabatis (1), Jianwu Wang (1), Ahmed AlEroud (2)
     {georgek, jianwu, ahmed21}@umbc.edu
     (1) Department of Information Systems, University of Maryland, Baltimore County
     (2) Department of Computer Information Systems, Yarmouk University, Irbid, Jordan
     Mission Critical Big Data Analytics (MCBDA), Prairie View, TX, May 2016
     UMBC

  2. Why are Cyber Attacks an issue?
     • Sabotage of operations
     • Data security (database & communication)
     • Communication interference
     • Financial fraud
     • Grid security

  3. Intrusion Detection Systems
     • Packet-based IDSs: analyze the content of network packets to predict attacks
       – A hard task on today's high-speed gigabit networks, which carry vast volumes of packets
     • Flow-based IDSs: detect cyber attacks by analyzing network flows (net-flows)
       – Packet content is not available
       – Only traffic-based features are used

  4. Packet-based Intrusion Detection
     Advantages
     • Full access to the payload
     • More information is available
     • More accurate intrusion detection
     Disadvantages
     • Growing network bandwidth generates huge amounts of data
     • Analyzing the data is computationally expensive
     Result: a classic big data problem

  5. Network Flows (flows)
     Think of it like phone call metadata: who called whom and when, but not the conversation itself
     • Source/destination IP address
     • Input/output router interface
     • Protocol
     • Type of Service
     • Packet count
     • Octet count
     • Start/end time
     • TCP flags
     • Source/destination network mask
     • Input/output interface encapsulation size
     • IP address of the next hop within the peer
     • Router IP of cache shortcut in supervisor
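A flow record can be sketched as a small immutable data structure. The class below is a hypothetical Python rendering of a subset of the fields listed above (the field names are ours, loosely following the NetFlow v5 record); note that it has no payload field, which is exactly what the phone-call analogy describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One flow record: traffic metadata only, no packet payload (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int        # IP protocol number, e.g. 6 = TCP, 17 = UDP
    tos: int             # Type of Service byte
    packet_count: int
    octet_count: int     # total bytes carried by the flow
    start_time: float    # seconds
    end_time: float
    tcp_flags: int       # cumulative OR of the TCP flags seen in the flow
    input_if: int        # router input interface index
    output_if: int       # router output interface index
    next_hop: str        # IP address of the next hop

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    @property
    def mean_packet_size(self) -> float:
        return self.octet_count / self.packet_count if self.packet_count else 0.0


flow = FlowRecord("10.0.0.5", "192.168.1.20", 51514, 22, 6, 0,
                  packet_count=12, octet_count=1800,
                  start_time=100.0, end_time=104.5,
                  tcp_flags=0x1B,  # FIN, SYN, PSH, ACK
                  input_if=1, output_if=2, next_hop="192.168.1.1")
print(flow.duration, flow.mean_packet_size)  # 4.5 150.0
```

Derived statistics such as duration or mean packet size are all that a flow-based detector can compute; anything about the content of the "conversation" is out of reach.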

  6. NetFlow
     (Figure: a packet stream grouped into flow 1, flow 2, flow 3 and flow 4)
     • A flow is a set of packets that “belong together”
       – Same source/destination IP addresses and port numbers
       – Same protocol, …
       – Same input/output interfaces at a router (if known)
     • Packets that are “close” together in time
       – Maximum spacing between packets (e.g., 30 sec)
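The grouping rule above can be sketched as follows. This is a minimal, hypothetical illustration, assuming the flow key is the 5-tuple (source/destination IP, source/destination port, protocol) and that a flow ends when the gap between consecutive packets exceeds 30 seconds; real exporters also apply an active timeout and other criteria that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, int, int, int]  # (src_ip, dst_ip, src_port, dst_port, protocol)


@dataclass
class Packet:
    ts: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    size: int


@dataclass
class Flow:
    key: FlowKey
    start: float
    end: float
    packets: int = 0
    octets: int = 0


def packets_to_flows(packets: List[Packet], max_gap: float = 30.0) -> List[Flow]:
    """Group packets with the same 5-tuple; a gap longer than max_gap seconds starts a new flow."""
    active: Dict[FlowKey, Flow] = {}
    finished: List[Flow] = []
    for p in sorted(packets, key=lambda p: p.ts):
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flow = active.get(key)
        if flow is not None and p.ts - flow.end > max_gap:
            finished.append(flow)  # inactivity timeout: close the old flow
            flow = None
        if flow is None:
            flow = Flow(key, start=p.ts, end=p.ts)
            active[key] = flow
        flow.end = p.ts
        flow.packets += 1
        flow.octets += p.size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f.start)


pkts = [
    Packet(0.0, "10.0.0.5", "10.0.0.9", 40000, 80, 6, 60),
    Packet(1.0, "10.0.0.5", "10.0.0.9", 40000, 80, 6, 1500),
    Packet(2.0, "10.0.0.5", "10.0.0.9", 40000, 80, 6, 1500),
    Packet(5.0, "10.0.0.7", "10.0.0.9", 40001, 53, 17, 80),
    Packet(50.0, "10.0.0.5", "10.0.0.9", 40000, 80, 6, 60),  # 48 s gap -> new flow
]
flows = packets_to_flows(pkts)
for f in flows:
    print(f.key, f.packets, f.octets)
```

The 48-second pause splits the first connection into two flows, while the UDP packet to port 53 forms a flow of its own.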

  7. Flow-based Intrusion Detection
     Advantages
     • Much less data to analyze (flow records only, no payload)
     • Faster detection, because there is less data
     Disadvantages
     • No access to the payload
     • Only a subset of attacks can be detected
     • Accuracy is not as good as packet-based detection

  8. Semantic Link Networks (SLN)
     • An SLN is a graph with nodes and edges
     • Nodes: represent alerts or benign activity
     • Edges: weighted links representing the similarity of the nodes
       – Measured in terms of context: time, location, numerical, and descriptive features
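One way to hold an SLN in memory is an adjacency map plus a label per node. The class below is a hypothetical sketch, not the authors' data structure; the node ids, labels and weights in the example are invented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SemanticLinkNetwork:
    """Undirected weighted graph: nodes are alerts or benign activity, edge weights are similarities."""
    labels: Dict[str, str] = field(default_factory=dict)             # node id -> "benign" or an attack name
    adj: Dict[str, Dict[str, float]] = field(default_factory=dict)  # node id -> {neighbor id: edge weight}

    def add_node(self, node: str, label: str) -> None:
        self.labels[node] = label
        self.adj.setdefault(node, {})

    def link(self, a: str, b: str, weight: float) -> None:
        """Store the weight on both endpoints, since similarity is symmetric."""
        for n in (a, b):
            if n not in self.labels:
                raise KeyError(f"unknown node {n!r}")
        self.adj[a][b] = weight
        self.adj[b][a] = weight

    def neighbors(self, node: str) -> Dict[str, float]:
        return dict(self.adj.get(node, {}))


sln = SemanticLinkNetwork()
sln.add_node("n1", "port_scan")
sln.add_node("n2", "port_scan")
sln.add_node("n3", "benign")
sln.link("n1", "n2", 0.8)
sln.link("n2", "n3", 0.1)
print(sln.neighbors("n2"))  # {'n1': 0.8, 'n3': 0.1}
```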

  9. Contextual features
     • Time
       – Start and end time of flows
     • Location
       – Source and destination IP addresses, port numbers
     • Numerical
       – Traffic statistics, e.g. number of packets and octets
     • Descriptive
       – Other characteristics, e.g. flags, protocol
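To compare nodes context by context, each flow's features can first be split into the four groups above, so that each group can be compared with its own similarity measure. The field names and the exact assignment of fields to contexts in this sketch are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumed mapping of (hypothetical) flow field names to the four contexts on the slide
CONTEXTS = {
    "time": ("start_time", "end_time"),
    "location": ("src_ip", "dst_ip", "src_port", "dst_port"),
    "numerical": ("packets", "octets"),
    "descriptive": ("tcp_flags", "protocol"),
}


def split_by_context(flow: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group a flow's fields by context; fields missing from the flow are skipped."""
    return {ctx: {f: flow[f] for f in fields if f in flow} for ctx, fields in CONTEXTS.items()}


flow = {"start_time": 100.0, "end_time": 104.5, "src_ip": "10.0.0.5", "dst_ip": "10.0.0.9",
        "src_port": 51514, "dst_port": 22, "packets": 12, "octets": 1800,
        "tcp_flags": "SA", "protocol": "TCP"}
groups = split_by_context(flow)
print(groups["numerical"])  # {'packets': 12, 'octets': 1800}
```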

  10. Constructing SLNs
     • Nodes represent either alerts or benign activities
       – Each node is initially represented as a feature vector
     • Edges: weighted links (calculated using Anderberg and Pearson similarity)
       – Weight q of the edge between nodes n1 and n2:
         $q(o_1, o_2) = \frac{\mathrm{Sim}(o_1, o_2)}{\sum_{n=1} \mathrm{Sim}_n}$
       – Binary feature vectors (e.g. TCP flags), compared with a binary similarity measure (Anderberg):
           feature   n1   n2
           f1         1    1
           f2         0    1
           f3         1    0
           f4         0    0
       – Feature vectors using numerical weights, compared with Pearson correlation:
           feature   n1     n2
           f1        0.7    0.8
           f2        0.02   0.5
           f3        0.9    0.03
           f4        0.01   0.01
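The similarity computations on this slide can be sketched as follows, using the values from the two tables. Two assumptions are ours: the slide does not spell out which Anderberg formula is used, so the sketch takes the common form a / (a + 2(b + c)), where a counts features set in both vectors and b, c count features set in only one of them; and it reads the denominator of q as the sum of o1's raw similarities to all linked nodes, which presumes non-negative similarities (a negative Pearson value would need clipping or rescaling first).

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def anderberg(x: List[int], y: List[int]) -> float:
    """Binary similarity a / (a + 2(b + c)) (assumed Anderberg variant)."""
    a = sum(1 for u, v in zip(x, y) if u and v)      # set in both
    b = sum(1 for u, v in zip(x, y) if u and not v)  # set only in x
    c = sum(1 for u, v in zip(x, y) if v and not u)  # set only in y
    return a / (a + 2 * (b + c)) if a else 0.0


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy) if sx and sy else 0.0


def normalize(raw: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """q(o1, o2) = Sim(o1, o2) / sum of o1's raw similarities (assumed reading; needs Sim >= 0)."""
    totals: Dict[str, float] = defaultdict(float)
    for (src, _), s in raw.items():
        totals[src] += s
    return {(s1, s2): s / totals[s1] for (s1, s2), s in raw.items() if totals[s1] > 0}


# Values from the two tables on the slide
bin_n1, bin_n2 = [1, 0, 1, 0], [1, 1, 0, 0]
num_n1, num_n2 = [0.7, 0.02, 0.9, 0.01], [0.8, 0.5, 0.03, 0.01]
print(anderberg(bin_n1, bin_n2))          # 0.2
print(round(pearson(num_n1, num_n2), 4))  # 0.0963

q = normalize({("n1", "n2"): 0.2, ("n1", "n3"): 0.6, ("n2", "n1"): 0.2})
print(q)
```

On these tables the two nodes share only one of three set flags (Anderberg 0.2) and their numerical profiles are only weakly correlated (about 0.096).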

  11. Intrusion Detection with SLNs
     After the SLN is complete, at run time:
     • Investigate the features of an incoming flow
     • Find a start node in the SLN with features similar to the incoming flow
       – Individual flows are classified by a rule-based classifier that works on flow features (J48)
     • Expand the set of nodes with additional ones based on:
       – Connectivity in the graph
       – A threshold value (controls the scope of the expansion)
     • Recall increases, but false positives may increase as well
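The threshold-controlled expansion can be sketched as a breadth-first search that only follows edges whose weight is at least the threshold. This is our assumed reading of "connectivity plus threshold"; the slides do not give the exact expansion rule. The start node would come from the J48-based classification, which is not modeled here, and the graph is a made-up example.

```python
from collections import deque
from typing import Dict, Set


def expand(graph: Dict[str, Dict[str, float]], start: str, threshold: float) -> Set[str]:
    """Collect nodes reachable from start through edges whose weight >= threshold."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr, weight in graph.get(node, {}).items():
            if weight >= threshold and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


graph = {
    "n1": {"n2": 0.8, "n4": 0.2},
    "n2": {"n1": 0.8, "n3": 0.6},
    "n3": {"n2": 0.6},
    "n4": {"n1": 0.2},
}
print(sorted(expand(graph, "n1", 0.5)))  # ['n1', 'n2', 'n3']
print(sorted(expand(graph, "n1", 0.1)))  # ['n1', 'n2', 'n3', 'n4']
```

Lowering the threshold enlarges the result set, which mirrors the trade-off on the slide: higher recall, at the price of more false positives.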

  12. Intrusion Detection with SLNs
     • Apply context filters
       – Limit the expanded result set
       – Reduce false positives/negatives
     • Precision increases
     • The SLN must be updated when new attacks (nodes) are discovered
       – Regenerating the whole graph is expensive
       – A dynamic (incremental) update is more promising
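A context filter can be sketched as a predicate applied to every expanded node. The particular rules here (same destination IP or port, same protocol, start times within 300 seconds) and the `NodeContext` type are hypothetical choices made for the example, not the authors' filters.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeContext:
    dst_ip: str
    dst_port: int
    protocol: str
    start: float  # seconds


def context_filter(candidates: List[str], ctx: Dict[str, NodeContext], flow: NodeContext,
                   max_gap: float = 300.0) -> List[str]:
    """Keep only expanded nodes whose context agrees with the incoming flow (hypothetical rules)."""
    kept = []
    for node in candidates:
        c = ctx[node]
        same_location = c.dst_ip == flow.dst_ip or c.dst_port == flow.dst_port
        same_protocol = c.protocol == flow.protocol
        close_in_time = abs(c.start - flow.start) <= max_gap
        if same_location and same_protocol and close_in_time:
            kept.append(node)
    return kept


incoming = NodeContext("10.0.0.1", 22, "TCP", 1000.0)
ctx = {
    "n1": NodeContext("10.0.0.1", 22, "TCP", 1100.0),  # agrees on every context
    "n2": NodeContext("10.0.0.7", 80, "TCP", 1010.0),  # different location
    "n3": NodeContext("10.0.0.8", 22, "UDP", 990.0),   # different protocol
    "n4": NodeContext("10.0.0.1", 22, "TCP", 2000.0),  # too far apart in time
}
print(context_filter(["n1", "n2", "n3", "n4"], ctx, incoming))  # ['n1']
```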

  13. Attack Prediction Process
     Incoming flow → classification rules (R1, R2, …, Rn) → initial prediction → filtering of false positives (FPs) → final predictions
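The process can be read as a chain of small functions. In this sketch the rules R1 and R2 are invented stand-ins for rules learned by J48, the feature names are hypothetical, and the SLN expansion (slide 11) and false-positive filter (slide 12) are passed in as callables so that any implementation of those steps can be plugged in.

```python
from typing import Callable, Dict, List, Tuple

Flow = Dict[str, float]
Rule = Tuple[str, Callable[[Flow], bool], str]  # (rule id, condition on flow features, predicted label)

# Hypothetical rules R1..Rn; in the slides these come from a J48 classifier trained on flow features
RULES: List[Rule] = [
    ("R1", lambda f: f["syn_only"] == 1 and f["packets"] <= 2, "port_scan"),
    ("R2", lambda f: f["packets"] > 10_000 and f["duration"] < 10, "dos"),
]


def initial_prediction(flow: Flow, rules: List[Rule] = RULES) -> str:
    """Return the label of the first rule that fires, or 'benign' if none does."""
    for _rule_id, condition, label in rules:
        if condition(flow):
            return label
    return "benign"


def predict(flow: Flow,
            expand: Callable[[str], List[str]],
            filter_fps: Callable[[List[str], Flow], List[str]]) -> List[str]:
    """Rules -> initial prediction -> SLN expansion -> false-positive filtering -> final predictions."""
    return filter_fps(expand(initial_prediction(flow)), flow)


toy_sln = {"port_scan": ["port_scan", "os_fingerprinting"], "dos": ["dos"], "benign": ["benign"]}


def keep_all(candidates: List[str], flow: Flow) -> List[str]:
    return candidates


scan = {"syn_only": 1, "packets": 1, "duration": 0.01}
print(predict(scan, toy_sln.__getitem__, keep_all))  # ['port_scan', 'os_fingerprinting']
```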

  14. Hybrid intrusion detection
     • Combines flow-based and packet-based detection
     • Takes advantage of both approaches
     • Requires a big data platform
     • Expected to increase prediction accuracy

  15. Hybrid intrusion detection
     Layer one
     • The flow-based approach is applied
     • If the prediction is benign, allow the flow to pass
     • If the prediction is suspicious, analyze further
       – Flow marked suspicious with high probability: enforce an appropriate policy
         • Deny entry
         • Divert to another system (e.g. a honeypot)
       – Flow marked suspicious with medium probability: proceed to layer two
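Layer one's decision logic can be sketched as a small function. The cut-off value `HIGH = 0.9` is hypothetical, and because the slide does not say what happens to suspicious flows below the medium level, this sketch sends every suspicious flow under `HIGH` to layer two.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    DENY = "deny"
    DIVERT = "divert"        # e.g. to a honeypot
    LAYER_TWO = "layer_two"  # hand the flow's packets to packet-level analysis


HIGH = 0.9  # hypothetical cut-off for "suspicious with high probability"


def layer_one(label: str, p_suspicious: float, divert: bool = False) -> Action:
    """Decide what to do with a flow after flow-based (SLN) prediction."""
    if label == "benign":
        return Action.PASS
    if p_suspicious >= HIGH:
        return Action.DIVERT if divert else Action.DENY
    return Action.LAYER_TWO


print(layer_one("dos", 0.95))       # Action.DENY
print(layer_one("port_scan", 0.6))  # Action.LAYER_TWO
```

Only the `LAYER_TWO` branch triggers the expensive packet analysis, which is where the resource savings claimed for the hybrid approach come from.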

  16. Hybrid intrusion detection
     Layer two
     • More information is needed to decide
     • The corresponding packets are passed to a Spark-based platform
     • Spark Streaming DStreams are applied
     • A map function runs in parallel for both individual and multi-stage packet analysis

  17. Hybrid Big-Data IDS
     (Figure: architecture of the hybrid big-data IDS, flow-based layer)

  18. Hybrid intrusion detection
     • Multistage attacks
       – Require both current and past packets (historical packets with the same IP address)
       – A NoSQL database (Cassandra) stores suspicious packets and is queried for matching patterns
       – Newly discovered attacks are used to dynamically update the SLN
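Multistage detection can be sketched as an ordered-subsequence match over one IP address's history of suspicious packets. Here a plain dictionary stands in for the Cassandra table, and the stage names, pattern and time window are hypothetical. In Cassandra itself one would likely partition such a table by IP address and cluster it by timestamp, but that schema is our assumption, not the authors'.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, stage label assigned by packet analysis)


class SuspiciousPacketStore:
    """In-memory stand-in for a NoSQL table of suspicious packets, keyed by IP address."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, List[Event]] = defaultdict(list)

    def add(self, ip: str, ts: float, stage: str) -> None:
        self._by_ip[ip].append((ts, stage))

    def history(self, ip: str) -> List[Event]:
        """All stored events for one IP address, oldest first."""
        return sorted(self._by_ip.get(ip, []))


def matches_pattern(history: List[Event], pattern: List[str], window: float) -> bool:
    """True if the stages of `pattern` occur in order within `window` seconds of the first stage."""
    for i, (t0, stage) in enumerate(history):
        if stage != pattern[0]:
            continue
        k = 1  # index of the next stage still to be matched
        for t, s in history[i + 1:]:
            if k == len(pattern) or t - t0 > window:
                break
            if s == pattern[k]:
                k += 1
        if k == len(pattern):
            return True
    return False


store = SuspiciousPacketStore()
store.add("10.0.0.9", 1.0, "scan")
store.add("10.0.0.9", 50.0, "exploit")
store.add("10.0.0.9", 80.0, "exfiltration")
pattern = ["scan", "exploit", "exfiltration"]  # hypothetical multistage attack
print(matches_pattern(store.history("10.0.0.9"), pattern, window=100.0))  # True
print(matches_pattern(store.history("10.0.0.9"), pattern, window=60.0))   # False
```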

  19. Advantages of the Hybrid Approach
     • Flows predicted as benign, or as suspicious with high probability, never reach the second layer (packet examination), which saves computational resources
     • Only questionable flows are examined further at the packet level
     • Prediction accuracy is expected to rise, since more information (the payload) is available
     • More attacks may be recognized, since the payload is accessible in addition to flow data
     • Compared to packet-based approaches, our approach requires fewer computational resources

  20. Packet Analysis on Spark
     • Create a Spark streaming context with a batch interval of n seconds
     • Create a DStream from an incoming network socket; each batch of the DStream contains all packets received within the batch interval
     • Apply the full packet analysis function to each packet in parallel through the DStream's map function; output each suspicious packet and its attackType as a key-value pair
     • Report new types of attacks to update the SLN
     • Apply the multistage packet analysis function to each DStream element in parallel through the DStream's map function; output each suspicious multistage packet and its attackType as a key-value pair
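The steps above correspond to PySpark's DStream API (`StreamingContext(sc, n)`, `ssc.socketTextStream(host, port)`, `DStream.map` and `DStream.filter`). So that it runs without a cluster, the sketch below mimics those steps in plain Python: it cuts a time-ordered packet stream into n-second micro-batches and maps a packet analysis function over each batch, emitting `(attackType, packet)` pairs. The signature-based `analyze_packet` and its `SIGNATURES` table are hypothetical placeholders for the authors' full packet analysis.

```python
from itertools import groupby
from typing import Iterable, List, Optional, Tuple

Packet = Tuple[float, bytes]  # (arrival time in seconds, raw packet bytes)

# Hypothetical payload signatures standing in for the full packet analysis
SIGNATURES = {b"' OR 1=1": "sql_injection", b"/etc/passwd": "path_traversal"}


def micro_batches(packets: Iterable[Packet], interval: float) -> List[List[Packet]]:
    """Cut a packet stream into windows of `interval` seconds, like DStream batches."""
    ordered = sorted(packets, key=lambda p: p[0])
    return [list(group) for _, group in groupby(ordered, key=lambda p: int(p[0] // interval))]


def analyze_packet(pkt: Packet) -> Optional[Tuple[str, Packet]]:
    """Return an (attackType, packet) pair for a suspicious packet, else None."""
    for signature, attack_type in SIGNATURES.items():
        if signature in pkt[1]:
            return attack_type, pkt
    return None


def process_batch(batch: List[Packet]) -> List[Tuple[str, Packet]]:
    """Plain-Python analogue of dstream.map(analyze_packet).filter(lambda kv: kv is not None)."""
    return [kv for kv in map(analyze_packet, batch) if kv is not None]


stream = [(0.5, b"GET /index.html"), (1.2, b"GET /../../etc/passwd"), (2.7, b"user=' OR 1=1 --")]
known_types = {"sql_injection"}
for batch in micro_batches(stream, interval=2.0):
    for attack_type, _pkt in process_batch(batch):
        if attack_type not in known_types:
            known_types.add(attack_type)  # a new attack type would be reported to update the SLN
            print("new attack type:", attack_type)
```

Unlike Spark, which produces a (possibly empty) batch for every interval, `micro_batches` only returns windows that actually contain packets.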

  21. Conclusions
     • A promising technique for huge amounts of network data
     • Takes advantage of both flow-based and packet-based approaches
     • Builds on previous success in packet-based and flow-based intrusion detection
     • Work in progress on the hybrid approach for big data
       – Implementation on the Spark platform
       – Evaluation with datasets

  22. Questions?
