
NEEDLES IN A HAYSTACK: MINING INFORMATION FROM PUBLIC DYNAMIC SANDBOXES FOR MALWARE INTELLIGENCE


  1. NEEDLES IN A HAYSTACK: MINING INFORMATION FROM PUBLIC DYNAMIC SANDBOXES FOR MALWARE INTELLIGENCE
     Mariano Graziano, Davide Canali, Leyla Bilge, Andrea Lanzi and Davide Balzarotti
     Eurecom, Symantec Research Labs, Università degli Studi di Milano
     USENIX Security ’15 - Washington DC, USA

  2. A PILE OF MALWARE SAMPLES

  3. CAMPAIGN           TIME BEFORE PUBLIC DISCLOSURE   SUBMITTED BY
     Operation Aurora   4 months                        US
     Red October        8 months                        Romania
     APT1               43 months                       US
     Stuxnet            1 month                         US
     Beebus             22 months                       Germany
     LuckyCat           3 months                        US
     BrutePOS           5 months                        France
     NetTraveller       14 months                       US
     Pacific PlugX      12 months                       US
     Pitty Tiger        42 months                       US
     Regin              44 months                       UK
     Equation           23 months                       US

  4. Constant interaction: criminals vs. sandbox

  5. GOAL
     ‣ Observation: Malware authors use public sandboxes to test their developments
     ‣ Design data mining techniques to automatically discover malware developments

  6. SYSTEM OVERVIEW

  7. SYSTEM OVERVIEW

  8. DATA REDUCTION
     ‣ 32M: Initial dataset

  9. DATA REDUCTION
     ‣ 6M: Submitted by regular users

  10. DATA REDUCTION
      ‣ 522K: Not already part of large submissions

  11. DATA REDUCTION
      ‣ 214K: Previously unknown by Symantec & VirusTotal

  12. DATA REDUCTION
      ‣ 121K: Final (not packed binary)
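
The reduction steps above can be read as a chain of filters applied in order, each one shrinking the set handed to the next. The following is a minimal Python sketch of that idea only. The `Submission` record and all of its field names (`by_regular_user`, `in_bulk_submission`, `known_to_vendors`, `packed`) are hypothetical and do not come from the talk; how each property is actually decided is not described on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical record; every field name here is an assumption.
    sha256: str
    by_regular_user: bool
    in_bulk_submission: bool
    known_to_vendors: bool
    packed: bool


# One (description, keep-predicate) pair per reduction slide.
FILTERS = [
    ("submitted by regular users", lambda s: s.by_regular_user),
    ("not part of large submissions", lambda s: not s.in_bulk_submission),
    ("previously unknown", lambda s: not s.known_to_vendors),
    ("not packed", lambda s: not s.packed),
]


def reduce_dataset(samples):
    """Apply the filters in order and record how many samples survive each."""
    remaining = list(samples)
    funnel = [("initial dataset", len(remaining))]
    for name, keep in FILTERS:
        remaining = [s for s in remaining if keep(s)]
        funnel.append((name, len(remaining)))
    return remaining, funnel


# Toy data: each sample after "a" is dropped by a different filter.
demo = [
    Submission("a", True, False, False, False),
    Submission("b", False, False, False, False),
    Submission("c", True, True, False, False),
    Submission("d", True, False, True, False),
    Submission("e", True, False, False, True),
]
kept, funnel = reduce_dataset(demo)
for name, count in funnel:
    print(f"{count:>3}  {name}")
```

Keeping the per-step counts alongside the survivors mirrors the way the slides report the dataset size after every step.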

  13. SYSTEM OVERVIEW

  14. CLUSTERING
      ‣ Agglomerative clustering (similarity threshold: 70%):
        ‣ Binary similarity (ssdeep)
        ‣ Submission metadata
      ‣ Sliding window of seven days:
        ‣ Reduce comparisons
        ‣ Ensure binary similarity
      ‣ 5972 clusters, 4.5 elements each on average
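
To make the combination of a similarity threshold and a seven-day sliding window concrete, here is a small Python sketch under several loudly stated assumptions. It is not the authors' implementation: `difflib.SequenceMatcher` from the standard library stands in for ssdeep so the example stays self-contained, submission metadata is ignored, and merging is done single-linkage with a union-find structure. The sample names and contents are invented.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(days=7)   # sliding window from the slide
THRESHOLD = 70               # similarity threshold (percent) from the slide


def similarity(a: str, b: str) -> float:
    """Stand-in for an ssdeep score: 0..100, higher means more similar."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def cluster(samples):
    """samples: list of (name, submitted_at, content) tuples."""
    ordered = sorted(samples, key=lambda s: s[1])
    parent = list(range(len(ordered)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (_, t_i, c_i) in enumerate(ordered):
        j = i - 1
        # Only look back at samples submitted within the window; because
        # the list is time-sorted, we can stop at the first one outside it.
        while j >= 0 and t_i - ordered[j][1] <= WINDOW:
            if similarity(c_i, ordered[j][2]) >= THRESHOLD:
                parent[find(i)] = find(j)
            j -= 1

    groups = {}
    for i, (name, _, _) in enumerate(ordered):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


demo = [
    ("v1", datetime(2015, 1, 1), "connect(host); sleep(10); exfil(data)"),
    ("v2", datetime(2015, 1, 3), "connect(host); sleep(30); exfil(data)"),
    ("other", datetime(2015, 1, 4), "UNRELATED BENIGN PROGRAM"),
    ("late", datetime(2015, 2, 1), "connect(host); sleep(10); exfil(data)"),
]
clusters = cluster(demo)
print(clusters)  # [['late'], ['other'], ['v1', 'v2']]
```

In this toy run `late` has the same content as `v1` but is never compared with it, because it falls outside the window. That is the trade-off the window makes: fewer comparisons in exchange for missing matches that are far apart in time.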

  15. SYSTEM OVERVIEW

  16. FINE-GRAINED ANALYSIS
      ‣ Binary code normalisation
      ‣ Call graph comparison [Flake04,Gao08]
      ‣ Control flow graph comparison [Flake04,Kruegel06,Jang13]
      [Figure values: 92%, 74%, 87%]

  17. SYSTEM OVERVIEW

  18. FEATURE EXTRACTION
      ‣ Comprises two phases:
        ‣ Per sample (25 features in 6 groups)
        ‣ Per cluster (48 features in 5 groups)

  19. SAMPLE FEATURES

  20. CLUSTER FEATURES

  21. CLUSTER FEATURES

  22. CLUSTER FEATURES
      [Figure labels: UNKNOWN, UNKNOWN, MALICIOUS]

  23. CLUSTER FEATURES
      [Figure labels: COMPLEX BEHAVIOR, COMPLEX BEHAVIOR, NO BEHAVIOR]
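
The two-phase idea (per-sample features first, per-cluster features derived from them) can be illustrated with a short Python sketch. The slides do not list the actual 25 sample features or 48 cluster features, so everything below is hypothetical: the dictionary keys `av_label` and `behavior`, and the derived features, are invented to show how a sequence of per-sample values might be summarised for a whole cluster. The label values are taken from the figure labels on the slides above, and treating them as belonging to the same three samples in the same order is itself an assumption.

```python
def transitions(values):
    """Count positions where two consecutive values differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def cluster_features(samples):
    """Summarise per-sample features for one cluster.

    samples: list of per-sample dicts, assumed to be in submission order.
    """
    labels = [s["av_label"] for s in samples]
    behaviors = [s["behavior"] for s in samples]
    return {
        "size": len(samples),
        "malicious_ratio": labels.count("MALICIOUS") / len(samples),
        "label_changes": transitions(labels),
        "behavior_changes": transitions(behaviors),
        "distinct_behaviors": len(set(behaviors)),
    }


demo_cluster = [
    {"av_label": "UNKNOWN", "behavior": "COMPLEX BEHAVIOR"},
    {"av_label": "UNKNOWN", "behavior": "COMPLEX BEHAVIOR"},
    {"av_label": "MALICIOUS", "behavior": "NO BEHAVIOR"},
]
features = cluster_features(demo_cluster)
print(features)
```

Features of this kind describe how a cluster evolves across submissions rather than what any single sample looks like.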

  24. SYSTEM OVERVIEW

  25. MACHINE LEARNING
      ‣ Logistic Model Tree (LMT)
      ‣ Training Set (157 clusters):
        ‣ Non development: 91 clusters
        ‣ Development: 66 clusters

  26. RESULTS
      ‣ 3038 potential development clusters
      ‣ 1474 malicious clusters
      ‣ 135 days on average for the detection
      ‣ Thousands of computers infected in 13 countries

      CLUSTERS   TYPE
      1082       Trojans
      83         Backdoors
      65         Worms
      45         Botnets
      21         Tools
      4          Keyloggers

  27. EXAMPLES

  28. ANTI-SANDBOX
      [Figure: timeline of Sample 1, Sample 2 and Sample 3, labelled Malicious, Unknown, Unknown]
      Compile times:    16:59:13, 17:05:21, 17:13:26
      Submission times: 16:59:33, 17:06:06, 17:14:16

  29. ANTI-SANDBOX
      [Figure: timeline of Sample 1, Sample 2 and Sample 3, labelled Malicious, Unknown, Unknown]
      Compile times:    16:59:13, 17:05:21, 17:13:26
      Submission times: 16:59:33, 17:06:06, 17:14:16
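
The timestamps on the anti-sandbox slide suggest how short the gap between compiling a binary and submitting it can be. The Python sketch below computes those gaps. It rests on assumptions the slide does not state explicitly: that all times fall on the same day and in the same time zone, and that the k-th compiled binary is the k-th one submitted (both lists are sorted before pairing).

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps as they appear on the slide.
compile_times = ["16:59:13", "17:05:21", "17:13:26"]
submission_times = ["17:06:06", "17:14:16", "16:59:33"]


def gaps_in_seconds(compiled, submitted):
    """Pair compile and submission times chronologically and return the gaps.

    Assumption: same day, same time zone, k-th compile matches k-th submission.
    """
    c = sorted(datetime.strptime(t, FMT) for t in compiled)
    s = sorted(datetime.strptime(t, FMT) for t in submitted)
    return [int((sub - comp).total_seconds()) for comp, sub in zip(c, s)]


gaps = gaps_in_seconds(compile_times, submission_times)
print(gaps)  # [20, 45, 50]
```

Under these assumptions every sample reaches the sandbox less than a minute after it was compiled, which is consistent with the talk's observation that authors use public sandboxes to test their developments.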

  30. TROJAN DROPPER
      [Figure labels: DELPHI, VB]
      SUBMISSION TIME   COMPILE TIME
      22:35:08          1992-06-20
      00:44:06          1992-06-20
      01:18:48          1992-06-20
      01:25:16          1992-06-20
      13:07:26          2008-10-04

  31. TROJAN DROPPER
      [Figure labels: DELPHI, VB]
      SUBMISSION TIME   COMPILE TIME
      22:35:08          1992-06-20
      00:44:06          1992-06-20
      01:18:48          1992-06-20
      01:25:16          1992-06-20
      13:07:26          2008-10-04
      ‣ VirusTotal: 37/50 (trojan dropper)
      ‣ Two IP addresses:
        ‣ Dynamic DNS service (no-ip)
      ‣ Connect-back behavior
      ‣ Overall: 1817 clusters

  32. LIMITATIONS
      ‣ No packed binaries
      ‣ Evasions:
        ‣ Sandbox interaction is still required to develop evasion techniques
        ‣ More sophisticated analysis techniques are required to link a probe to the final malware

  33. CONCLUSION

  34. THE END
      THANK YOU
      graziano@eurecom.fr
      magrazia@cisco.com
      @emd3l
