
Pattern Detection in Computer Networks Using Robust Principal Component Analysis. Randy Paffenroth, Associate Professor of Mathematical Sciences and Associate Professor of Computer Science, Data Science Program, Worcester Polytechnic Institute.


  1. Pattern Detection in Computer Networks Using Robust Principal Component Analysis. Randy Paffenroth, Associate Professor of Mathematical Sciences and Associate Professor of Computer Science, Data Science Program, Worcester Polytechnic Institute. CS525 Urban Networks: Methods and Analysis, 4-19-2017.

  2. Urban networks vs. computer networks. Image credits: Howchou (own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons; Hibernia Networks [Public domain], via Wikimedia Commons.

  3. What am I going to talk about? ● Robust Principal Component Analysis as applied to analysis of computer networks. ● In effect, I am interested in “semi-supervised” learning where much of the data is unlabeled and has to “speak for itself” ● Attempt to justify why I think this is an interesting way to think about network analysis. ● Show some examples in this area. ● Beware! I am a mathematician and, morally, I can’t give a talk without any equations :-)

  4. Theory and Practice

  5. Where we find our inspiration for practice... Stuxnet, Flame, Target Inc., Neiman Marcus, Affinity Gaming, Dairy Queen... ● Et tu, Dairy Queen!? This is when things got personal... ● "Axiom" 1: Unless some sensor, or collection of sensors, is affected by an attack, then you can't detect it. I.e., either the marginal or joint probability density function of the sensors must be different in a statistically meaningful way, conditioned on the absence or presence of an attack. ● "Axiom" 2: The most dangerous attacks are those for which you don't have a signature. Virus detection and intrusion detection systems (IDS) do a good job of detecting attacks for which a signature is known, but have nothing to say if the attack has no signature. ● "Theorem": Therefore the most dangerous attacks can only be detected by sensors which were not designed to detect that threat. You have to get lucky and have a sensor that detects the new attack even though it was not designed to do so. ● "Corollary": You want lots of sensors! But how do you fuse them? Even once you have a way of fusing the data, how do you avoid being overwhelmed with false alarms!

  6. Advanced Persistent Threats. (Diagram of attack stages: reconnaissance, point of entry, command and control, pivoting, botnet, exfiltration.)

  7. What do we mean by a sensor? (Figure: sensor response over time, attack vs. no attack.)

  8. What do we mean by a sensor? (Figure: sensor response over time, attack vs. no attack.)

  9. What do we mean by a sensor? (Figure: sensor response over time, attack vs. no attack.)
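To make the sensor picture concrete, here is a minimal sketch (not from the slides) of the simplest kind of sensor: a packet-rate time series flagged whenever it deviates from a rolling baseline. The window size, threshold, and simulated "attack" are illustrative assumptions.

```python
import numpy as np

def threshold_sensor(packet_rate, window=30, k=3.0):
    """Flag time steps where the packet rate deviates from a rolling baseline.

    packet_rate: 1-D array of packets/sec per time step (illustrative input).
    window:      number of past steps used to estimate the baseline (assumed).
    k:           alarm threshold in standard deviations (assumed).
    Returns a boolean array: True where the sensor fires.
    """
    alarms = np.zeros_like(packet_rate, dtype=bool)
    for t in range(window, len(packet_rate)):
        baseline = packet_rate[t - window:t]
        mu, sigma = baseline.mean(), baseline.std() + 1e-9
        alarms[t] = abs(packet_rate[t] - mu) > k * sigma
    return alarms

# Toy usage: a quiet link with a burst ("attack") in the middle.
rng = np.random.default_rng(0)
rate = rng.poisson(100, size=300).astype(float)
rate[150:170] += 400          # simulated attack window
print(np.where(threshold_sensor(rate))[0])
```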

  10. What kinds of sensors? ● Already talked about packet rates. ● Port, CPU, memory activity, etc. ● Intrusion Detection Systems: Bro, Snort, Suricata, etc. ● More "complicated" sensors such as those inspired by information theory, e.g. packet payload entropy. Butun, Ismail, Salvatore D. Morgera, and Ravi Sankar. "A survey of intrusion detection systems in wireless sensor networks." IEEE Communications Surveys & Tutorials 16.1 (2014): 266-282. Moosavi, M. R., et al. "Entropy based fuzzy rule weighting for hierarchical intrusion detection." Iranian Journal of Fuzzy Systems 11.3 (2014): 77-94.
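As a concrete illustration of the "packet payload entropy" sensor mentioned above, a minimal sketch of the Shannon entropy of a payload's byte distribution: encrypted or compressed payloads sit near 8 bits/byte while plain text is much lower. This is a generic illustration, not the exact feature used in the cited papers.

```python
import math
from collections import Counter

def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(payload_entropy(b"GET /index.html HTTP/1.1"))   # low entropy: text
print(payload_entropy(bytes(range(256)) * 4))          # high entropy: ~8 bits/byte
```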

  11. Best with an example!

  12. Data matrix
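A minimal sketch of one common way to lay out such a data matrix (an assumption, not necessarily the slide's exact construction): rows are time windows, columns are sensors, and entry (t, s) is sensor s's reading during window t. The sensor names and streams here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sensor_streams = {
    "packet_rate":     rng.poisson(100, 500).astype(float),
    "cpu_load":        rng.uniform(0.1, 0.9, 500),
    "payload_entropy": rng.uniform(3.0, 7.5, 500),
}
M = np.column_stack(list(sensor_streams.values()))   # shape: (time windows, sensors)
print(M.shape)
```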

  13. First order anomaly

  14. Sparse correlations? Latent signal model... (Diagram of a latent signal model relating the quantities U, V, A, B, and N to the observations Y.)

  15. A simple second order anomaly

  16. Second order theory! In our work we focused on analyzing the second order statistics of the data by way of its covariance or normalized cross correlation matrix. Interesting questions: ● Correlation versus covariance? (Correlation is well defined for missing data and different data types, e.g. point-biserial correlation.) ● More refined calculations, such as maximum likelihood covariance estimation (e.g. using convex optimization).
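A minimal sketch of the two second-order quantities named on the slide: the sample covariance and the normalized cross-correlation of the columns of a data matrix M (rows = observations, columns = sensors). The NaN-as-missing convention and the masked-array handling are illustrative assumptions, not the slide's method.

```python
import numpy as np

def covariance_and_correlation(M):
    """Sample covariance and normalized cross-correlation of the columns of M.

    M: (observations x sensors) array; NaNs mark missing entries (assumed
    convention). Masked statistics are used for simplicity.
    """
    Mm = np.ma.masked_invalid(M)
    cov = np.ma.cov(Mm, rowvar=False).filled(np.nan)
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)           # normalized cross-correlation
    return cov, corr

M = np.random.randn(200, 5)
M[np.random.rand(200, 5) < 0.05] = np.nan     # a few missing readings
cov, corr = covariance_and_correlation(M)
print(np.round(corr, 2))
```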

  17. Second order anomaly

  18. Standing on the shoulders of giants ● Over the past 4-5 years there has been a flurry of activity on this problem, much of which we suspect the current audience is aware of. ● Ideas such as matrix completion (the Netflix problem!), robust principal component analysis, and robust matrix completion have generated a lot of interest, including among us! References: Eckart, C. and Young, G. (1936), "The approximation of one matrix by another of lower rank," Psychometrika 1 (3): 211-218. E. Candes and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, pp. 717-772, December 2009. E. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," J. ACM, vol. 58, pp. 11:1-11:37, June 2011. Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma, "Stable Principal Component Pursuit," ISIT 2010: Proceedings of the IEEE International Symposium on Information Theory, 2010. E. Candes and Y. Plan, "Matrix Completion With Noise," Proceedings of the IEEE, vol. 98, no. 6, 2010. What I am interested in :-) R. Paffenroth, P. Du Toit, R. Nong, L. Scharf, A. Jayasumana, and V. Bandara, "Space-time signal processing for distributed pattern detection in sensor networks," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 1, February 2013. P. Du Toit, R. Paffenroth, and R. Nong, "Stability of Principal Component Pursuit with Point-wise Error Constraints," in preparation, 2012.

  19. M = L + S. Singular values.
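A minimal numerical illustration (not from the slides) of the decomposition M = L + S: a low-rank matrix L plus a sparse matrix S, and the singular value profile that RPCA exploits, where L has only a few large singular values. The rank, corruption level, and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 100, 2                       # matrix size and true rank (illustrative)
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # low rank
S = np.zeros((n, n))
idx = rng.random((n, n)) < 0.02      # ~2% sparse corruptions
S[idx] = 10 * rng.standard_normal(idx.sum())
M = L + S

print(np.round(np.linalg.svd(L, compute_uv=False)[:5], 1))  # sharp drop after rank 2
print(np.round(np.linalg.svd(M, compute_uv=False)[:5], 1))  # corruption spreads energy
```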

  20. The appropriate structures appear all over the place in real data! (Elisa Rosales: singular values of matrices from insurance satisfaction surveys.)

  21. The appropriate structures appear all over the place in real data! (Rakesh Biradar: singular values of matrices from Amazon product communities and SKAION Internet attack (e.g. DDoS) simulations.)

  22. Abilene Internet2 Backbone

  23. Abilene Internet2 Backbone

  24. Abilene Internet2 Backbone

  25. Abilene Internet2 Backbone

  26. Abilene Internet2 Backbone

  27. Enough math for the moment, let's try a really practical example ● DARPA Lincoln Lab Intrusion Detection Evaluation Data Set ➢ IP sweep of the AFB from a remote site ➢ Probe of live IPs to look for the sadmind daemon running on Solaris hosts ➢ Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts ➢ Installation of the trojan mstream DDoS software on three hosts at the AFB ➢ Launching the DDoS https://www.ll.mit.edu/ideval/data/2000/LLS_DDOS_1.0.html

  28. Feature generation: from raw PCAP files to derived features.
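A hedged sketch of the "raw PCAP files to derived features" step, using scapy's rdpcap to bin packets into time windows and compute simple per-window features (packet count, mean packet size, distinct source IPs). The window length, feature choices, and file name are illustrative assumptions, not the talk's actual pipeline.

```python
from collections import defaultdict
import numpy as np
from scapy.all import rdpcap, IP   # assumes scapy is installed

def pcap_to_features(path, window_sec=60.0):
    """Derive a (time windows x features) matrix from a PCAP file.

    Features per window (illustrative): packet count, mean packet size,
    number of distinct source IPs.
    """
    packets = rdpcap(path)
    bins = defaultdict(list)
    t0 = float(packets[0].time)
    for pkt in packets:
        bins[int((float(pkt.time) - t0) // window_sec)].append(pkt)

    rows = []
    for w in sorted(bins):
        pkts = bins[w]
        sizes = [len(p) for p in pkts]
        srcs = {p[IP].src for p in pkts if IP in p}
        rows.append([len(pkts), np.mean(sizes), len(srcs)])
    return np.array(rows)

# features = pcap_to_features("capture.pcap")   # hypothetical file name
```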

  29. Important idea... don't blindly follow theory. (L₀, S₀) = arg min over L, S of ‖L‖_* + λ‖S‖₁, subject to P_Ω(L + S) = P_Ω(M).
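For reference, a minimal sketch of one standard way to solve the principal component pursuit problem above: an inexact augmented Lagrangian / ADMM-style iteration alternating singular value thresholding on L with soft thresholding on S, written here for the fully observed case. This is a textbook solver sketch, not the modified algorithm developed in the talk; the default λ = 1/√max(m, n), the choice of μ, and the stopping tolerance are conventional choices, not the talk's settings.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entry-wise soft thresholding (prox of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal Component Pursuit via a simple ADMM-style iteration.

    Solves  min ||L||_* + lam * ||S||_1  s.t.  L + S = M  (fully observed case).
    """
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)           # dual variable
    norm_M = np.linalg.norm(M, "fro") + 1e-12

    for _ in range(max_iter):
        # L-update: singular value thresholding of (M - S + Y/mu)
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: soft thresholding of (M - L + Y/mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual update and convergence check on the constraint residual
        R = M - L - S
        Y = Y + mu * R
        if np.linalg.norm(R, "fro") / norm_M < tol:
            break
    return L, S
```

On a toy matrix like the low-rank-plus-sparse example a few slides back, this kind of iteration typically recovers L and S to small numerical error with the default λ.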

  30. Lincoln Labs DARPA Intrusion Detection Data Set - PCA ● IP sweep from a remote site, ● a probe of live IP addresses looking for a running Sadmind daemon, ● and then an exploitation of a Sadmind vulnerability.

  31. Lincoln Labs DARPA Intrusion Detection Data Set - Comparison. RPCA: too many false positives. PCA: too many false negatives.

  32. Lincoln Labs DARPA Intrusion Detection Data Set - Comparison. Too "thick", too "thin", just right: tuning λ.

  33. Key idea ● Semi-supervised learning ● PCA and RPCA have many parameters ● Far too many to train on reasonably sized collections of attacks ● Only train a few important parameters, like λ, on supervised training data ● Gives better generalization and less over-fitting
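One way to read the "only train a few important parameters" idea, sketched below as an assumption about the workflow rather than the talk's exact procedure: sweep λ over a small grid, run an RPCA solver (such as the rpca_pcp sketch above) on the mostly unlabeled data, score each λ against the few labeled attack windows, and keep the best. The anomaly-flagging rule and F1 score are illustrative choices.

```python
import numpy as np

def choose_lambda(M, labels, lambdas, rpca, thresh=1e-3):
    """Pick lambda by a grid search scored on a small labeled set.

    M:       (time windows x sensors) data matrix, mostly unlabeled.
    labels:  boolean array marking the few windows known to contain attacks.
    lambdas: candidate values of the RPCA regularization parameter.
    rpca:    solver returning (L, S), e.g. the rpca_pcp sketch above.
    Score (illustrative): F1 of "row of S has large entries" vs. the labels.
    """
    best_lam, best_f1 = None, -1.0
    for lam in lambdas:
        _, S = rpca(M, lam=lam)
        flagged = np.abs(S).max(axis=1) > thresh
        tp = np.sum(flagged & labels)
        fp = np.sum(flagged & ~labels)
        fn = np.sum(~flagged & labels)
        f1 = 2 * tp / (2 * tp + fp + fn + 1e-12)
        if f1 > best_f1:
            best_lam, best_f1 = lam, f1
    return best_lam, best_f1
```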

  34. Key idea ● Semi-supervised learning. Training data for λ. Algorithm not trained on this attack vector!

  35. Other fun problems: LANDER. The LANDER project measures the number of "active" hosts (i.e. hosts that respond to pings) on subnets across the Internet. Same structure appears! Subnets in anomaly: [1, 210, 44, 0], [1, 210, 173, 0], [1, 219, 34, 0], [1, 210, 206, 0], [1, 218, 60, 0], [1, 218, 121, 0], [1, 218, 173, 0]. Can be used to pick out all LG DACOM subnets in Europe. (Figure axes: number of responding hosts vs. test round number.)

  36. Other fun problems: CAIDA. Here is a small section of the 1.1 petabyte (and growing) CAIDA data set. It contains measurements of the worldwide Internet connectivity and latency (traceroute). Same structure appears! (Figure axes: normalized latency vs. time.)

  37. Big Data Computer Science Math By Holger Motzkau 2010, Wikipedia/Wikimedia Commons (cc-by-sa-3.0), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11115505

  38. Equivalent formulation

  39. Big Data. Original algorithm and new algorithm! Rank = 2, probability of corruption = 2%, observations = 10M. ● R. Paffenroth, R. Nong, and P. Du Toit, "On covariance structure in noisy, big data," Proceedings Vol. 8857, Signal and Data Processing of Small Targets, October 2013, Oliver E. Drummond and Richard D. Teichgraeber, Editors.

  40. Big Data Hey, wait a minute...

  41. How can this be? Math helps...

  42. How can this be? Implementation helps... Think about the data as distributed databases.
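One concrete reading of "think about the data as distributed databases" (an illustrative assumption, not necessarily the implementation in the talk): each site keeps its own rows of the data matrix and ships only small per-site summaries, from which a global second-order quantity such as the sample covariance can be assembled exactly, without moving the raw measurements.

```python
import numpy as np

def site_summary(X):
    """Per-site summary: row count, column sums, and sum of outer products."""
    return X.shape[0], X.sum(axis=0), X.T @ X

def merge_covariance(summaries):
    """Assemble the global sample covariance from per-site summaries only."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    gram = sum(s[2] for s in summaries)
    mean = total / n
    return (gram - n * np.outer(mean, mean)) / (n - 1)

# Toy check: three "sites", each holding a slice of the rows of one matrix.
X = np.random.randn(300, 4)
parts = np.array_split(X, 3)
print(np.allclose(merge_covariance([site_summary(p) for p in parts]),
                  np.cov(X, rowvar=False)))   # True
```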

  43. Distributed databases. Ali Benamara
