
Detecting Botnets with Temporal Persistence Jaideep Chandrashekar - PowerPoint PPT Presentation



  1. Detecting Botnets with Temporal Persistence. Jaideep Chandrashekar, Frederic Giroire, Nina Taft, Eve Schooler. Intel Labs; Mascotte project, INRIA Sophia Antipolis / I3S (CNRS, Univ. of Nice)

  2. Botnets: Why care?

  3. Botnet Life-Cycle [diagram]: exploit (dirty webpage, drive-by download, trojans) → call home (IRC, HTTP, P2P, hybrid C&C channels) → actuation (spam, DoS, clickfraud, proxies, theft, espionage)

  4. Botnet Life-Cycle [diagram, continued]: adds countermeasures (signature matching, software patching)

  5. Botnet Life-Cycle [diagram, continued]: adds the annotation "see RAID'09 proceedings"

  6. Botnet Life-Cycle [diagram, continued]: adds traffic anomaly detectors (NBAD)

  7. Botnet Life-Cycle [diagram, continued]: adds traffic correlation, port inspection, payload analysis; noted drawbacks: NBAD is noisy, prone to false positives

  8. Botnet Life-Cycle [diagram, continued]: further drawbacks: hard to adapt, a-priori knowledge required

  9. Botnet C&C invariants • Botmasters seldom try to connect to the drones: drones initiate the (rendezvous) connections • Drones need to call home often: (if not) the drone falls off the radar

  10. Botnet C&C invariants • Botmasters seldom try to connect to the drones: drones initiate the (rendezvous) connections ➡ watch outgoing traffic • Drones need to call home often: (if not) the drone falls off the radar ➡ use a frequency-based metric

  11. Our Solution: Canary. A general-purpose, non-specific, learning-based behavioral detector to uncover botnet C&C destinations at the end-host, without a-priori assumptions about traffic types, destinations or protocols

  13. High Level Method [timeline diagram]: Day1, Day2, Day3, Day4, Day5, ..., DayN, with an initial Training period

  14. High Level Method [continued]: during Training, watch destinations and whitelist frequent destinations

  15. High Level Method [continued]: Training produces a whitelist and detection params that feed the Detection phase

  16. High Level Method [continued]: during Detection, ignore whitelisted destinations, track frequency for non-whitelisted destinations, and raise an alarm for new high-frequency destinations

  17. High Level Method [continued]: botnet C&Cs are likely to be frequently visited

  18. High Level Method [continued]: botnet C&Cs are likely to be frequently visited; adding to the whitelist is a very rare event
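
A minimal sketch of this train-then-detect split (slides 13-18) in Python. It is not the authors' code: the slot/day data layout and function names are mine, the 0.6 threshold echoes slide 31, and a single timescale is used for brevity.

    # Minimal sketch of the Canary training/detection split (illustrative, not the
    # authors' code).  A "day" is modelled as a list of time slots; each slot is the
    # set of destination atoms contacted in that slot.
    P_THRESHOLD = 0.6   # persistence threshold; the 0.6 value echoes slide 31

    def persistence(atom, days):
        """Fraction of time slots (across all days) in which the atom was contacted."""
        slots = [slot for day in days for slot in day]
        return sum(atom in slot for slot in slots) / len(slots) if slots else 0.0

    def train(training_days):
        """Whitelist every atom that is persistent during the training period."""
        atoms = {a for day in training_days for slot in day for a in slot}
        return {a for a in atoms if persistence(a, training_days) > P_THRESHOLD}

    def detect(detection_days, whitelist):
        """Alarm on non-whitelisted atoms that become persistent after training."""
        atoms = {a for day in detection_days for slot in day for a in slot}
        return {a for a in atoms
                if a not in whitelist and persistence(a, detection_days) > P_THRESHOLD}

    # Toy example: a benign mail server gets whitelisted; a new, chatty destination
    # that appears only after training raises an alarm.
    training  = [[{"mail.intel.com"}, {"mail.intel.com", "news.example.org"}]]
    detecting = [[{"mail.intel.com", "badhost.example"}, {"badhost.example"}]]
    print(detect(detecting, train(training)))   # -> {'badhost.example'}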

  19. Remaining Detail • Destination granularity: tracking by IP address = large whitelists! ➡ destination atoms • (Frequency) metric needs to capture loosely periodic behavior at unknown timescales [plot] ➡ persistence • We track persistence of destination atoms & build whitelists of destination atoms

  20. Destination Atoms: mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com, circuit.intel.com, cps.circuit.intel.com, xyz.google.com, abs.google.com

  21. Destination Atoms: {mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com} → mail.intel.com; {circuit.intel.com, cps.circuit.intel.com} → circuit.intel.com; {xyz.google.com, abs.google.com} → google.com
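
A heuristic sketch of the hostname-to-atom grouping illustrated on slides 20-21. The slides do not spell out the exact rule (the full destination-atom definition also considers ports), so the SERVICE_LABELS set and the "last two labels = registered domain" shortcut below are illustrative assumptions chosen only to reproduce the slide's examples; real code would use something like the Public Suffix List.

    import re

    # Heuristic hostname -> destination-atom grouping (illustrative assumptions only).
    SERVICE_LABELS = {"mail", "smtp", "imap", "www", "circuit"}   # assumed keyword set

    def destination_atom(hostname: str) -> str:
        labels = hostname.lower().split(".")
        registered = ".".join(labels[-2:])             # e.g. intel.com, google.com
        for label in labels[:-2]:
            service = re.sub(r"\d+$", "", label)       # mail1 -> mail, mail3 -> mail
            if service in SERVICE_LABELS:
                return f"{service}.{registered}"       # e.g. mail.intel.com
        return registered

    for h in ["mail1.sc.intel.com", "mail3.jf.intel.com", "circuit.intel.com",
              "cps.circuit.intel.com", "xyz.google.com", "abs.google.com"]:
        print(h, "->", destination_atom(h))
    # mail*.intel.com collapse to mail.intel.com, *.circuit.intel.com to
    # circuit.intel.com, and *.google.com to google.com, matching slide 21.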

  22. Persistence [diagram]: a sliding window over a per-slot observation bitmap 1 1 0 0 0 1 0 1 1 0 1 1 1 1

  23. Persistence [diagram, continued]: each bit covers a slot of width w; the window W spans several slots

  24. Persistence [diagram, continued]: first window 1 1 0 0 0 1 0 → persistence 3/7

  25. Persistence [diagram, continued]: second window 1 1 0 1 1 1 1 → persistence 6/7
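
A small sketch of the persistence computation on slides 22-25: the window W is split into slots of width w, each bit records whether the atom was contacted at least once in that slot, and persistence is the fraction of occupied slots. The bit patterns are the ones on the slides; the function names are mine.

    def persistence(bits):
        """Fraction of w-sized slots in the window W in which the atom was seen."""
        return sum(bits) / len(bits)

    def to_bits(timestamps, window_start, w, n_slots):
        """Turn raw contact timestamps into the per-slot bitmap for one window."""
        bits = [0] * n_slots
        for t in timestamps:
            i = int((t - window_start) // w)
            if 0 <= i < n_slots:
                bits[i] = 1
        return bits

    window1 = [1, 1, 0, 0, 0, 1, 0]       # first window on slide 24
    window2 = [1, 1, 0, 1, 1, 1, 1]       # second window on slide 25
    print(persistence(window1))            # 0.428... = 3/7
    print(persistence(window2))            # 0.857... = 6/7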

  26. Picking Timescale: suppose w = 1 hr, W = 24 hr • Botnet X connects to C&C once a day: p-value 1/24 = 0.042 • Botnet Y connects to C&C hourly: p-value 24/24 = 1 • Botnet Z connects to C&C every 5-6 hours: p-value 4/24 = 0.17

  27. Picking Timescale [as on slide 26, plus]: Cannot assume a single, fixed timescale!
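
The slides never write the metric out explicitly; in LaTeX, the p-value computed on slides 24-27 for an atom a at a single timescale (w, W) is the fraction of slots in which a is seen:

    p(a;\, w, W) \;=\; \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{\text{$a$ contacted in slot } j\}, \qquad n = W / w

With w = 1 hr and W = 24 hr, n = 24, so one call per day occupies 1 slot (p ≈ 0.042), hourly calls occupy all 24 (p = 1), and a call every 5-6 hours occupies about 4 (p ≈ 0.17), matching the numbers on slide 26.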

  28. Selecting Timescale(s) • Select n overlapping timescales TS_1 = (w_1, W_1), TS_2 = (w_2, W_2), TS_3 = (w_3, W_3), ..., TS_n = (w_n, W_n) • p_i := persistence of atom for TS_i = (w_i, W_i) • track p_i concurrently for all the timescales • p(atom) := max_i p_i(atom)

  29. Selecting Timescale(s) [as on slide 28, plus]: Can become very expensive! Trick: select W_i = k·w_i; then use a single bitmap of size k·w_max
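
A sketch of the multi-timescale tracking on slides 28-29, including one reading of the shared-bitmap trick: if every window is k slots wide (W_i = k·w_i) and every slot width is a multiple of the finest one, a single fine-grained bitmap covering the largest window k·w_max can serve all timescales, with coarser slots obtained by OR-ing groups of fine bits. The constants K and SLOT_WIDTHS below are assumed values, not the ones used in the paper.

    K = 10                                # slots per window (k); assumed value
    SLOT_WIDTHS = [1, 4, 24]              # w_i in hours; assumed; W_i = K * w_i
    W_MIN = SLOT_WIDTHS[0]

    def fine_bitmap(timestamps, now):
        """One bit per w_min-sized slot, covering the largest window K * w_max."""
        n_fine = K * SLOT_WIDTHS[-1] // W_MIN
        start = now - n_fine * W_MIN
        bits = [0] * n_fine
        for t in timestamps:
            i = int((t - start) // W_MIN)
            if 0 <= i < n_fine:
                bits[i] = 1
        return bits

    def persistence_at(bits, w):
        """Persistence for timescale (w, K*w), derived from the shared fine bitmap."""
        per_slot = w // W_MIN                  # fine bits per coarse slot
        recent = bits[-K * per_slot:]          # only the last K*w of time matters
        coarse = [any(recent[j * per_slot:(j + 1) * per_slot]) for j in range(K)]
        return sum(coarse) / K

    def persistence(timestamps, now):
        """p(atom) = max over timescales, as on slide 28."""
        bits = fine_bitmap(timestamps, now)
        return max(persistence_at(bits, w) for w in SLOT_WIDTHS)

    # An atom contacted once a day looks rare at the 1-hour timescale but fully
    # persistent at the 24-hour timescale, so the max picks it up.
    hits = [i * 24 + 0.5 for i in range(10)]   # one contact per day for 10 days
    print(persistence(hits, now=240))          # -> 1.0 (driven by the 24 h timescale)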

  30. Dataset: Training • Normal user traces collected from 157 end-hosts for 4 weeks • Data collected on end-hosts • winpcap + wrapper code • traces assumed clean (some suspicious traffic observed: ground truth not available) • Initial 2 weeks of data used for training • pick threshold for persistence • construct per user whitelists

  31. Picking threshold(s) [CDF of persistence over destination atoms]: 80% of destinations have a p-value < 0.2; 20% of destinations have a p-value > 0.2; a threshold of 0.6 seems reasonable → if p(atom) > 0.6, add to whitelist

  32. Whitelist Sizes [histogram: number of users (0-25) vs. whitelist size (roughly 40-145 atoms)]

  33. Validation • Started with 55 distinct malware binaries • 27 had traffic; 12 had traffic for longer than 1 day • Ran each malware for 1 week; all traffic logged • Packet traces ➜ flow traces [Bro] • Flow traces manually analyzed to isolate C&C traffic

  34. ClamAV Signature | C&C type | # of C&C atoms | C&C volume (min-max)
      Trojan.Aimbot-25       | port 22                         | 1     | 0-5.7
      Trojan.Wootbot-247     | IRC port 12347                  | 4     | 0-6.8
      Trojan.Gobot.T         | IRC port 66659                  | 1     | 0.2-2.1
      Trojan.Codbot-14       | IRC port 6667                   | 2     | 0-9.2
      Trojan.Aimbot-5        | IRC via http proxy              | 3     | 0-10
      Trojan.IRCBot-776*     | HTTP                            | 16    | 0-1.8
      Trojan.VB-666*         | IRC port 6667                   | 1     | 0-1.3
      Trojan.IRC-Script-50   | IRC ports 6662-6669, 9999, 7000 | 8     | 0-2.1
      Trojan.Spybot-248      | port 9305                       | 4     | 3.8-4.6
      Trojan.MyBot-8926      | IRC port 7007                   | 1     | 0-0.1
      Trojan.IRC.Zapchast-11 | IRC ports 6666, 6667            | 9     | 0-1
      Trojan.Peed-69 [Storm] | P2P/Overnet                     | 19672 | 0-30
      Converted packet traces to flow traces and hand-analyzed each trace individually to identify/isolate C&C traffic and attack traffic

  35. 3 Detailed Examples
      • SDBot: 2 atoms in covert channel, identified by IRC server names; attack traffic: scans on ports 135, 139, 445 & 2097*
      • Zapchast: 9 atoms in covert channel, popular IRC ports; attack traffic: netbios(?)
      • Storm/Peacomm: ~82,000 atoms (almost all atoms are singletons); no well-known port/address for C&C destinations; attack traffic is SMTP (overwhelmingly), and possibly some http & ssh

  36. Connection Rates (per min) [bar chart: C&C vs. attack connection rates, 0-150 per minute, for SDBot, ZapChast and Storm]

  37. C&C Detection [chart: persistence (0.0-1.0) of C&C atoms for SDBot, Storm and ZapChast at timescales of 1, 6, 12, 18 and 24 hr]
