The Unspoken Problems With Machine Learning in Security - Noa Weiss


  1. The Unspoken Problems With Machine Learning in Security Noa Weiss

  2. Hi!
     ● AI & Machine Learning Consultant
     ● Playing with data for over a decade
     ● Risk and Security
     ● PayPal, Armis

  3. Hi!
     ● Deep Voice foundation
     ● Leader of Women in Data Science Israel
     ● Mentor junior data scientists

  4. Agenda
     ● Is the grass really greener?
       ○ ML - other domains
       ○ ML - security
     ● The things that hold us back
     ● Possible solutions

  5. Agenda
     ● ARE we lagging behind?
     ● WHY is that the case?
     ● WHAT can we do?

  6. Agenda
     ● ARE we lagging behind?
     ● WHY is that the case?
     ● WHAT can we do?

  7. ML IN OTHER DOMAINS: COMPUTER VISION

  8. Computer Vision Today
     ● Autonomous vehicles
     ● Facial recognition
     ● Generative AI

  9. COMPUTER VISION: EXAMPLES

  10. Image Completion
      Algorithm: Image-GPT

  12. Sketches → Photorealism
      Algorithm: GauGAN

  16. Sketches → Photorealism
      Algorithm: GauGAN
      Developed by Katherine Nicholls, PhD

  17. Fictional People
      www.thispersondoesnotexist.com

  18. Fictional People / Cats
      www.thiscatdoesnotexist.com

  19. Fictional Everything
      www.thispersondoesnotexist.com
      www.thiscatdoesnotexist.com
      www.thishorsedoesnotexist.com/
      www.thisartworkdoesnotexist.com/
      www.thischemicaldoesnotexist.com/

  20. ML IN OTHER DOMAINS: NATURAL LANGUAGE PROCESSING (NLP)

  21. NLP Today
      ● Pretty good automatic translation
      ● Long-form question answering
      ● GPT-3

  22. NLP: EXAMPLES

  23. GPT-3
      ● Language model (multi-purpose NLP model)
      ● Mostly generative
      ● Astonishing performance

  24. GPT-3: Generative Code
      ● Free description of layout → JSX code
      ● (No task-specific training)

  25. GPT-3: Generative Code
      ● Free description of ML model → model code!

  26. GPT-3: Coding Interview

  28. Google Duplex
      ● “Personal assistant” for phone reservations

  29. Google Duplex

  30. Security

  31. ML in Security Today
      The good stuff:
      ● Some significant improvements in malware detection
        ○ Next-Generation Antivirus (NGAV)
      ● Some promise for network intrusion detection
        ○ Not yet prominent in practice

  32. ML in Security Today
      ● All in all:
        ○ ML models with so-so performance
        ○ ML makes up only a small part of the core product
        ○ Data and ML technology under-utilized
      ● Lagging behind other domains

  33. Agenda
      ● ARE we lagging behind?
      ● WHY is that the case?
      ● WHAT can we do?

  34. WHY?

  35. Anomaly Detection Algorithms
      Algorithms aimed at identifying data points, events, or observations that deviate from a dataset's normal behavior
      ● Very common in Security
        ○ Algorithm task fits business needs
        ○ Unsupervised (no labels needed)
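
To make the definition concrete, here is a minimal sketch of unsupervised anomaly detection. It assumes a single numeric feature and a mean/standard-deviation baseline, which is only one of many possible approaches. The feature (logins per hour), the numbers, and the threshold of 3 are illustrative assumptions, not taken from the talk.

```python
from statistics import mean, stdev

def fit_baseline(values):
    """Learn the 'normal' from unlabeled history: mean and sample standard deviation."""
    return mean(values), stdev(values)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A constant history: anything different from it counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: logins per hour for one account (made-up numbers).
history = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
baseline = fit_baseline(history)
flags = [is_anomalous(v, baseline) for v in (5, 7, 40)]
print(flags)
```

No labels are used anywhere: the detector only knows what the history looked like, which is why this family of algorithms is attractive when tagged data is scarce.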

  36. Anomaly Detection Algorithms
      Yet, not ideal for Security:
      ● High false positive rate (FPR)
        ○ Legitimate user activity is often anomalous
        ○ Higher cost of errors than other domains
          ■ (Block legit activity? Wait for manual review?)
      ● Human-designed features are our “Ground Truth”
        ○ Very prone to human bias
        ○ Model only spots MOs we already know
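
The cost of a high FPR is easiest to see with some base-rate arithmetic. The sketch below applies Bayes' rule to compute what share of alerts are real attacks. The prevalence, detection rate, FPR, and daily event volume are hypothetical numbers chosen for illustration, not figures from the talk.

```python
def alert_precision(prevalence, tpr, fpr):
    """Fraction of alerts that are real attacks, by Bayes' rule."""
    true_alerts = prevalence * tpr
    false_alerts = (1 - prevalence) * fpr
    return true_alerts / (true_alerts + false_alerts)

# Hypothetical numbers: 1 in 10,000 events is malicious, the detector
# catches 90% of attacks and flags 1% of legitimate events.
prevalence, tpr, fpr = 1e-4, 0.90, 0.01
precision = alert_precision(prevalence, tpr, fpr)

# Hypothetical volume: one million events per day.
events_per_day = 1_000_000
false_alerts_per_day = events_per_day * (1 - prevalence) * fpr

print(f"precision: {precision:.2%}")
print(f"false alerts per day: {false_alerts_per_day:,.0f}")
```

Under these assumptions fewer than 1 in 100 alerts is a real attack, even though a 1% FPR sounds low. Every one of the remaining alerts either blocks legitimate activity or waits for manual review.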

  37. Changing Environment
      ● Most ML domains: mostly unchanging environment
        ○ E.g.: CV, NLP
      ● Environment in Security:
        ○ New devices
        ○ New apps
        ○ New protocols
        ○ Etc.
      ● This is a problem for a learning model
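
A small sketch of why a changing environment hurts a learned model: a baseline fitted before a change flags perfectly legitimate traffic afterwards. The scenario (a new app rollout raising outbound traffic), the numbers, and the mean/standard-deviation detector are all assumptions made for illustration.

```python
from statistics import mean, stdev

def flagged_fraction(train, live, threshold=3.0):
    """Fit a mean/std baseline on `train`; return the share of `live` beyond the threshold."""
    mu, sigma = mean(train), stdev(train)
    flagged = [x for x in live if abs(x - mu) / sigma > threshold]
    return len(flagged) / len(live)

# Hypothetical daily outbound MB per device (made-up numbers).
before_rollout = [100, 104, 98, 101, 97, 103, 99, 102]
after_rollout = [131, 128, 135, 129, 133, 130, 127, 134]  # legitimate: a new app was deployed
next_week = [132, 129, 136, 128]                          # also legitimate

stale = flagged_fraction(before_rollout, next_week)  # model trained before the change
refit = flagged_fraction(after_rollout, next_week)   # model retrained after the change
print(stale, refit)
```

With these numbers the stale baseline flags all of next week's legitimate traffic, while the refitted one flags none. Retraining helps here, but it presumes someone knows the change was benign, which is itself a tagging problem.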

  38. An Adapting Adversary
      ● As we become better at securing our devices and networks, attackers become better at outsmarting our defences
      ● This is a problem uncommon in most fields
        ○ E.g.: CV, NLP

  39. Tagging
      ● How CV and NLP get tagged datasets
      ● Why we can’t do that in security
        ○ Expertise
        ○ Context
        ○ Confidentiality
        ○ Scale
      ● Bigger datasets = bigger tagging problems
        ○ Sampling?

  40. Imbalanced Classes
      Different classes are extremely over- or under-represented in the data
      ● Results in poor predictive performance (especially for minority class)

  41. Imbalanced Classes
      ● A major problem when aiming to identify fraud/attacks
      ● While common solutions exist, they are limited, and do not fully solve this problem
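
A short sketch of how class imbalance hides poor minority-class performance: a "model" that never predicts fraud still scores very high accuracy. The dataset size and fraud rate are hypothetical, chosen only to illustrate the effect.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall on the positive class)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    positives = sum(t == positive for t in y_true)
    caught = sum(t == positive and p == positive for t, p in pairs)
    recall = caught / positives if positives else 0.0
    return correct / len(y_true), recall

# Hypothetical dataset: 10 fraud cases (label 1) among 10,000 transactions.
y_true = [1] * 10 + [0] * 9990
always_benign = [0] * len(y_true)  # a "model" that never raises an alert

accuracy, recall = accuracy_and_recall(y_true, always_benign)
print(f"accuracy: {accuracy:.1%}, fraud recall: {recall:.0%}")
```

The accuracy looks excellent while not a single fraud case is caught, which is why metrics that focus on the minority class (recall, precision) are the ones to watch in this setting.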

  42. Need for Explainability
      ● CV / NLP: mostly based on deep learning techniques
      ● Deep learning models are considered “black boxes”
      ● Security decision-making requires explainability (more so than other domains)
      ● DL could still be used with added-on explainability models, but those are imperfect and complex
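
For contrast with a black box, here is a sketch of what an explainable decision can look like. It uses a plain linear risk score, where each feature's contribution to the total can be read off directly. This is an illustration added for clarity, not a method from the talk; the feature names and weights are hypothetical.

```python
# Hypothetical weights of a linear risk score; names and values are made up.
WEIGHTS = {"new_device": 2.0, "failed_logins": 0.5, "foreign_ip": 1.5}

def explain(features, weights=WEIGHTS):
    """Return the total score and each feature's contribution, largest first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return sum(contributions.values()), ranked

score, reasons = explain({"new_device": 1, "failed_logins": 6, "foreign_ip": 0})
for name, contribution in reasons:
    print(f"{name}: {contribution:+.1f}")
print(f"total score: {score:.1f}")
```

An analyst reviewing this alert can see which signals drove the score. A deep model offers no such breakdown natively, which is where the add-on explainability methods mentioned above come in.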

  43. Confidentiality
      ● Other domains:
        ○ Public datasets
        ○ Public baselines
        ○ Publicly released trained models
      ● All of those enable not only direct collaboration, but also a way to compare new methods and algorithms

  44. Confidentiality
      Security:
      ● Companies bound by confidentiality
      ● No natively public data available
      ● Few publicly available datasets - small / outdated

  45. “Many researchers are struggling to find comprehensive and valid datasets to test and evaluate their proposed techniques and having a suitable dataset is a significant challenge in itself.”
      Ferrag, M. A., Maglaras, L., Moschoyiannis, S., & Janicke, H. (2020)

  46. “In order to test the efficiency of such mechanisms, reliable datasets are needed that (i) contain both benign and several attacks, (ii) meet real world criteria, and (iii) are publicly available.”
      Ferrag, M. A., Maglaras, L., Moschoyiannis, S., & Janicke, H. (2020)

  47. Agenda
      ● ARE we lagging behind?
      ● WHY is that the case?
      ● WHAT can we do?

  48. WHAT CAN WE DO TO CHANGE THIS?

  49. 1. Public Datasets

  50. 2. Benchmarks

  51. 3. Direct Collaboration

  52. ● Public Datasets
      ● Benchmarks
      ● Direct Collaboration
      Encourage an active discussion & indirect collaboration, in the public domain, resulting in faster, better progress for the security domain as a whole.

  53. Wrap Up
      ● ARE we lagging behind?
      ● WHY is that the case?
      ● WHAT can we do?

  54. Thank you
      hi@weissnoa.com
      @NWeiss
      linkedin.com/in/noa-weiss
      www.weissnoa.com
      Presentation template by SlidesCarnival
