Computational Forensics: Machine Learning and Predictive Analytics

Fundamentals of Computational Forensics: Machine Learning and Predictive Analytics. Carl Stuart Leichter, PhD (carl.leichter@ntnu.no), NTNU Testimon Digital Forensics Group. Cyber Threat Intelligence and …


  1. (Internal) Model Complexity

  2. 0th Order Polynomial Regression (estimated model)

  3. 1st Order Polynomial

  4. 3rd Order Polynomial

  5. 9th Order Polynomial: What Happened?!

  6. Model Complexity • Curse of Dimensionality (Too Much Complexity) • Overfitting

  7. Training Performance Evaluation

  8. The Machine Learning Process [Pipeline diagram. Training path: Training Data → Preprocessing → Feature Extraction/Selection → Learning/Adaptation → Internal Model, with Output Evaluation; testing path: Testing Data → Preprocessing → Feature Extraction/Selection → Classification/Regression → Application]

  9. Training Data, Testing Data & Overfitting

  10. A Central Principle in ML • Model complexity drives the training data requirements!

  11. More Data Can Fix the Overfitting Problem • N = 10 data points • N = 15 data points • N = 100 data points
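
The progression on slides 2 through 11 is easy to reproduce. Below is a minimal sketch (not code from the deck) that fits a 9th-order polynomial to noisy samples of sin(2πx), the classic textbook example these slides appear to follow; the target function, noise level, and sample sizes are assumptions. With N = 10 the model has as many coefficients as data points and interpolates the noise; with N = 100 the same 9th-order model generalizes much better, which is the point of this slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_and_eval(n_train, degree, n_test=200, noise=0.2):
    """Fit a polynomial to noisy samples of sin(2*pi*x) and
    return (training RMS error, test RMS error)."""
    x_train = rng.uniform(0, 1, n_train)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, noise, n_train)
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit

    x_test = np.linspace(0, 1, n_test)
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, noise, n_test)

    rms = lambda x, y: np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return rms(x_train, y_train), rms(x_test, y_test)

# same 9th-order model, increasing amounts of training data
for n in (10, 15, 100):
    train_err, test_err = fit_and_eval(n_train=n, degree=9)
    print(f"N={n:3d}  train RMS={train_err:.3f}  test RMS={test_err:.3f}")
```

With N = 10 the training error is near zero while the test error explodes: the overfitting behind the "What Happened?!" slide above.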

  12. Curse of Dimensionality (Model Complexity)

  13. • More complex problems require more complex models • More complex models require more complex feature spaces – need higher dimensionality to get good class separation • A wood classifier with only a 1D feature space? [Plot axes: Grain Prominence vs. Wood Brightness]

  14. Distance Metrics

  15. The Distance Metric • How the similarity of two elements in a set is determined, e.g. – Euclidean Distance – Inner Product (Vector Spaces) – Manhattan Distance – Maximum Norm – Mahalanobis Distance – Hamming Distance – Or any metric you define over the space …
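
A hedged sketch of several of these metrics in plain NumPy (the example vectors are arbitrary). Note that the inner product is really a similarity score rather than a distance, and the Hamming distance compares equal-length sequences position by position.

```python
import numpy as np

def euclidean(a, b):   # L2 norm of the difference
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):   # L1 norm: sum of per-coordinate "city block" moves
    return np.sum(np.abs(a - b))

def max_norm(a, b):    # L-infinity norm: largest single-coordinate difference
    return np.max(np.abs(a - b))

def hamming(a, b):     # number of positions where two sequences differ
    return sum(x != y for x, y in zip(a, b))

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(euclidean(a, b), manhattan(a, b), max_norm(a, b))  # 3.606 5.0 3.0
print(np.dot(a, b))                  # inner product: a similarity, not a distance
print(hamming("murder", "merder"))   # 1
```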

  16. Manhattan Distance https://www.quora.com/What-is-the-difference-between-Manhattan-and-Euclidean-distance-measures

  17. Far From Normal? [Scatter plot in the x–y plane; center = mean, spread = variance]

  18. Mahalanobis Distance http://www.jennessent.com/arcview/mahalanobis_description.htm

  19. Mahalanobis Distance http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance
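
A minimal sketch of the idea behind both linked explanations: the Mahalanobis distance rescales by the covariance of the data, so "far from normal" is measured in units of the data's own spread instead of raw Euclidean units. The correlated synthetic cloud below is an assumption for illustration.

```python
import numpy as np

def mahalanobis(x, data):
    """d = sqrt((x - mu)^T Sigma^-1 (x - mu)) for the sample mean/covariance."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)    # sample covariance matrix
    diff = x - mu
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

rng = np.random.default_rng(1)
# a correlated 2-D cloud: large spread along one direction, small along another
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.3]])

p = np.array([2.0, 1.5])
print("Euclidean from mean:", np.linalg.norm(p - data.mean(axis=0)))
print("Mahalanobis:        ", mahalanobis(p, data))
```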

  20. Unsupervised Learning

  21. Clustering • Partitional • Hierarchical

  22. Anomaly Detection with Unlabelled Data [Scatter plot: Packet Size vs. Packet Data Size]
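
A hedged sketch of the idea on this slide, using the mean and variance of slide 17 as the notion of "normal"; the packet-size numbers are invented for illustration. Any point that sits many standard deviations from the center of the unlabelled cloud gets flagged as an anomaly, with no labels required.

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical unlabelled traffic: (packet size, payload size) in bytes
normal_traffic = rng.normal(loc=[500, 400], scale=[50, 40], size=(200, 2))
traffic = np.vstack([normal_traffic, [[1500, 60]]])   # one odd packet appended

mu, sigma = traffic.mean(axis=0), traffic.std(axis=0)
z = np.abs((traffic - mu) / sigma)                    # per-feature z-scores

# flag packets more than 3 standard deviations out on either feature
anomalies = np.where((z > 3).any(axis=1))[0]
print("anomalous packet indices:", anomalies)         # expected: [200]
```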

  23. Recap of Wood Classification – 2 optical attributes or features • Brightness • Grain prominence – Yielded a 2-dimensional feature space – We had SUPERVISED learning: • We started with known pieces of wood • Gave each plotted training example its class LABEL – Because we chose our features well, we saw good clustering/separation of the different classes in the feature space.

  24. Unlabelled Data [Scatter plot: Grain Prominence (0–1) vs. Brightness (0–10), points without class labels]

  25. Partitional Clustering

  26. Hierarchical Clustering: Corpus browsing [Tree diagram: www.yahoo.com/Science … (30) → agriculture (dairy, crops, agronomy, forestry), biology (botany, cell, evolution), physics (magnetism, relativity), CS (AI, HCI, courses), space (craft, missions)]

  27. Essentials of Clustering • Similarities – Natural Associations – Proximate* • Differences – Distant* (*implies a distance metric)

  28. Essentials of Clustering • What is a “Good” Cluster? – Members are very “similar” to each other • Within-Cluster Divergence Metric σ_i – variance also works • Relative cluster sizes versus data spread

  29. Partitional Clustering Methods • K-Means Clustering • Gaussian Mixture Models • Canopy Clustering • Vector Quantization
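
K-means is the easiest of these to sketch. The toy implementation below (an illustration, not code from the deck) alternates the two K-means steps, assignment and update, and then reports the within-cluster variance from slide 28 as a "good cluster" check; the two synthetic blobs are assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal K-means: repeat {assign points to nearest center,
    move each center to the mean of its members} until stable."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                 # assignment step
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):                 # converged
            break
        centers = new                                 # update step
    return centers, labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),      # two synthetic blobs
               rng.normal([3, 3], 0.3, (50, 2))])
centers, labels = kmeans(X, k=2)
for j in range(2):   # within-cluster variance, the "good cluster" criterion
    print(f"cluster {j}: center={centers[j].round(2)}, "
          f"variance={X[labels == j].var():.3f}")
```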

  30. Unsupervised Learning/Clustering: Self-Organizing Maps (SOM)

  31. SOMs: Topology-Preserving Projections http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif

  32. [Figure: http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif]

  33. Topology-Preserving Projections http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif

  34. Topology-Preserving Projections • How will the distance metric handle polymorphous (mixed-unit) data? – Units of time (different units of time?) • Sprint performance data: years of age and seconds to finish – Units of space • (meters, lightyears) • Surface area • Volumetric – Units of mass (grams, kilograms, tonnes) – Units of currency • NOK • USD
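
The usual answer to this mixed-units question is to standardize each feature before computing any distance, so that years, seconds, and kroner all contribute comparably. A small sketch with invented sprint data, matching the slide's example:

```python
import numpy as np

# hypothetical sprint data: column 0 = age in years, column 1 = time in seconds;
# raw Euclidean distance is dominated by whichever column has the larger spread
athletes = np.array([[18.0, 11.2],
                     [24.0, 10.1],
                     [31.0, 10.9],
                     [42.0, 12.4]])

def standardize(X):
    """z-score each column so every unit contributes comparably to distances."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

Z = standardize(athletes)
d_raw = np.linalg.norm(athletes[0] - athletes[3])   # dominated by the age gap
d_std = np.linalg.norm(Z[0] - Z[3])                 # both features weighted fairly
print(f"raw: {d_raw:.2f}   standardized: {d_std:.2f}")
```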

  35. Proximity by Colour and Location: Poverty Map of the World (1997) http://www.cis.hut.fi/research/som-research/worldmap.html

  36. Map of Labels in Titles from the comp.ai.neural-nets newsgroup www.cs.hmc.edu/courses/2003/fall/cs152/slides/som.pdf

  37. Learning As Search

  38. • Exhaustive search – DFS – BFS • Gradient search – can get stuck in a local optimum • Simulated annealing – avoids local optima • Genetic algorithms
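
A compact sketch of the contrast drawn in the last two bullets: simulated annealing sometimes accepts uphill moves, with a probability that shrinks as the "temperature" cools, which is what lets it escape local optima that trap pure gradient search. The objective function, cooling schedule, and step size below are arbitrary choices for illustration.

```python
import math, random

def simulated_annealing(f, x0, steps=10_000, t0=2.0, seed=0):
    """Minimize f: always accept downhill moves; accept uphill moves with
    probability exp(-delta / T), where T cools toward zero."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9        # simple linear cooling schedule
        x_new = x + rng.gauss(0, 0.5)          # random local move
        f_new = f(x_new)
        if f_new < fx or rng.random() < math.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# a multimodal test function: many local minima, global minimum near x = -0.5
f = lambda x: x * x + 10 * math.sin(3 * x)
print(simulated_annealing(f, x0=4.0))  # gradient descent from 4.0 would likely
                                       # stall in a nearby local minimum
```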

  39. Exact vs Approximate Search • Exact: – Hashing techniques – String matching (“Murder”) • Approximate: – Approximate hashing – Partial strings – Elastic Search • “murder” • “merder”
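
The standard way to make the "murder" / "merder" example precise is an edit (Levenshtein) distance: exact matching demands distance 0, while approximate search tolerates small distances. A self-contained sketch of one possible realization (fuzzy engines such as Elastic Search use their own optimized machinery):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("murder", "murder"))  # 0: an exact match
print(levenshtein("murder", "merder"))  # 1: close enough for approximate search
```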

  40. Artificial Neural Networks (ANN)

  41. Inspired by Natural Neural Nets

  42. Perceptron (1950s)

  43. Perceptron Can Learn Simple Boolean Logic: a single decision boundary suffices when the classes are linearly separable

  44. Perceptron Cannot Learn XOR
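
This slide's claim is easy to verify numerically. The sketch below (assumed learning rate and epoch count) trains a single perceptron with the classic error-correction rule: it converges on AND, which needs only one linear boundary, but it can never reproduce XOR, because no single line separates XOR's classes.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Perceptron rule: nudge the weights whenever a point is misclassified.
    Converges only if the two classes are linearly separable."""
    w = np.zeros(X.shape[1] + 1)                    # weights plus bias
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append constant bias input
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (yi - pred) * xi
    return (Xb @ w > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print("AND:", train_perceptron(X, np.array([0, 0, 0, 1])))  # learns [0 0 0 1]
print("XOR:", train_perceptron(X, np.array([0, 1, 1, 0])))  # never [0 1 1 0]
```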

  45. Multi-Layer Perceptron: Error Back-Propagation Network (MLP-BP)

  46. MLP-BP Internal Model Building Block: 5 MLP-BP Neurons

  47. MLP-BP “Universal Voxel”
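
A minimal MLP-BP sketch showing why the hidden layer matters: with one hidden layer of sigmoid neurons trained by plain error back-propagation, the XOR mapping that defeats the single perceptron becomes learnable. The layer size, learning rate, and iteration count are arbitrary choices, and with an unlucky initialization backprop can stall in a local minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# the XOR problem the single-layer perceptron could not learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# one hidden layer of 3 sigmoid neurons, one sigmoid output neuron
W1, b1 = rng.normal(0, 1, (2, 3)), np.zeros(3)
W2, b2 = rng.normal(0, 1, (3, 1)), np.zeros(1)

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                 # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)               # forward pass, output layer
    d_out = (out - y) * out * (1 - out)      # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # error back-propagated to hidden
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # should approach [0 1 1 0]
```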

  48. Neuro-Fuzzy Methods

  49. Neuro-Fuzzy Overview • Neuro-Fuzzy (NF) is a hybrid intelligence / soft computing approach – (*soft?) • A combination of Artificial Neural Networks (ANN) and Fuzzy Logic (FL) • The opposite of fuzzy logic is – Crisp – Sharp • ANNs are black-box statistical models built to simulate the activity of biological neurons • FL extracts human-explainable linguistic fuzzy rules • Applications in Decision Support Systems and Expert Systems

  50. Fuzzy Basics • FL uses linguistic variables that can contain several linguistic terms • Temperature (linguistic variable) – Hot (linguistic term) – Warm – Cold • Consistency (linguistic variable) – Watery (linguistic term) – Gooey – Soft – Firm – Hard – Crunchy – Crispy

  51. Triangular Fuzzy Membership Functions http://sci2s.ugr.es/keel/links.php

  52. Fuzzy Inference ● Sharp antecedent: “If the tomato is red, then it is sweet” ● Fuzzy antecedent: “If the piece of wood is more or less dark (μ_dark = 0.7)” ● Fuzzy consequent(s): “The piece of wood is more or less pine (μ_pine = 0.64)” ● “The piece of wood is more or less birch (μ_birch = 0.36)” http://ispac.diet.uniroma1.it/scarpiniti/files/NNs/Less9.pdf
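
A small sketch of where such μ values come from. The triangular membership function and its breakpoints below are assumptions (the deck does not give them), and the exact consequent values of 0.64 and 0.36 would depend on the unstated rule base and inference scheme, so the sketch only reproduces the antecedent side, μ_dark = 0.7.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], rising to 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# hypothetical "dark" term for wood brightness on a 0-10 scale:
# fully dark at brightness 0, not dark at all from brightness 5 upward
dark = lambda x: triangular(x, -1.0, 0.0, 5.0)

brightness = 1.5
mu_dark = dark(brightness)            # degree of truth of the fuzzy antecedent
print(f"mu_dark = {mu_dark:.2f}")     # 0.70, as on the slide

# a rule "IF dark THEN pine" passes this partial truth to its consequent;
# turning it into mu_pine = 0.64 requires the (unstated) full rule base
```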

  53. Combining ANN/FL ● The ANN black-box approach requires sufficient data to find the structure (generalization learning) ● NO PRIORS required ● But linguistically meaningful rules cannot be extracted from a trained ANN ● Fuzzy rules require prior knowledge ● Based on linguistically meaningful rules http://www.scholarpedia.org/article/Fuzzy_neural_network

  54. Combining ANN/FL ● Combining the two gives us a higher level of system intelligence (?) ● Can handle the usual ML tasks (regression, classification, etc.) http://www.scholarpedia.org/article/Fuzzy_neural_network
