Detecting Botnets with Temporal Persistence
Jaideep Chandrashekar, Frederic Giroire, Nina Taft, Eve Schooler
Intel Labs; Mascotte project, I3S (CNRS, Univ. of Nice), INRIA Sophia Antipolis
Botnets: Why care?
Botnet Life-Cycle
• Stages: exploit, call home, actuation
• Exploit: dirty webpage, drive-by download, trojans
• Call home: IRC, HTTP, P2P, hybrid
• Actuation: spam, DoS, click fraud, proxies, theft, espionage
• Existing defenses: signature matching, software patching, traffic anomaly detectors (NBAD), traffic correlation, port inspection, payload analysis (see RAID'09 proceedings)
• Drawbacks: noisy, prone to false positives, hard to adapt, a-priori knowledge required
Botnets C&C invariants
• Botmasters seldom try to connect to the drones:
  • drones initiate the (rendezvous) connections
  ➡ watch outgoing traffic
• Drones need to call home often:
  • (if not) drone falls off the radar
  ➡ use a frequency-based metric
Our Solution: Canary
A general-purpose, non-specific, learning-based behavioral detector to uncover botnet C&C destinations at the end-host, without a-priori assumptions about traffic types, destinations or protocols
High Level Method
Timeline: Day1 Day2 Day3 Day4 Day5 .. DayN ..
• Training (produces: whitelist, params)
  • watch destinations
  • whitelist frequent destinations
• Detection
  • ignore whitelisted destinations
  • track frequency for non-whitelisted destinations
  • raise alarm for new high frequency destinations
• Botnet C&C's are likely to be frequently visited
• Adding to the whitelist is a very rare event
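The two phases above can be sketched in a few lines. This is only an illustration, not the Canary implementation: the function names, the sample atoms and their persistence values are hypothetical, and reusing the whitelist threshold of 0.6 (from the "Picking threshold(s)" slide) as the alarm threshold is an assumption.

```python
# Sketch of the training / detection split. All names and data are hypothetical.

def build_whitelist(training_persistence, threshold=0.6):
    """Training: whitelist atoms that were persistent in the clean traces."""
    return {atom for atom, p in training_persistence.items() if p > threshold}


def detect(observed_persistence, whitelist, threshold=0.6):
    """Detection: flag non-whitelisted atoms that become persistent."""
    return sorted(
        atom
        for atom, p in observed_persistence.items()
        if atom not in whitelist and p > threshold
    )


# Persistence per atom measured during training (made-up values).
training = {"mail.intel.com": 0.95, "google.com": 0.80, "example.org": 0.10}
whitelist = build_whitelist(training)

# Persistence per atom measured later, during detection (made-up values).
today = {"mail.intel.com": 0.90, "example.org": 0.05, "irc.badhost.example": 0.75}
alarms = detect(today, whitelist)
print(alarms)
```

Whitelisted atoms are skipped outright, so an alarm can only come from a destination that was not persistent during training.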
Remaining Detail
• Destination granularity: tracking IP addresses = large whitelists!
  ➡ destination atoms
• (Frequency) metric needs to capture: loosely periodic behavior at unknown timescales
  ➡ persistence
[Figure: plot, both axes 0 to 10]
We track persistence of destination atoms & build whitelists of destination atoms
Destination Atoms
• mail1.sc.intel.com, mail3.sc.intel.com, mail3.jf.intel.com ➡ mail.intel.com
• circuit.intel.com, cps.circuit.intel.com ➡ circuit.intel.com
• xyz.google.com, abs.google.com ➡ google.com
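Persistence
• sliding window of width W, divided into slots of width w
• one bit per slot: 1 if the destination atom was contacted in that slot, 0 otherwise
  1 1 0 0 0 1 0 | 1 1 0 1 1 1 1
• persistence = fraction of slots in the window with a 1
  • first window position: 3/7
  • later window position: 6/7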
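As a check on the 3/7 and 6/7 values, persistence can be computed directly from the slot bits. This is a minimal sketch: the function name and the `end` argument are illustrative, and the window is assumed to hold W = 7 slots as in the example.

```python
from fractions import Fraction


def persistence(bits, end, W):
    """Fraction of the W slots ending at index `end` (exclusive) that are set."""
    window = bits[end - W:end]
    return Fraction(sum(window), W)


# Slot bits from the slide: 1 means the atom was contacted in that slot.
bits = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]

first = persistence(bits, 7, 7)    # window covers slots 0..6
second = persistence(bits, 14, 7)  # window covers slots 7..13
print(first, second)
```

Only the presence of a connection in a slot counts, not how many connections fall in it, so a chatty but short-lived destination scores low.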
Picking Timescale
suppose: w = 1hr, W = 24hr
• Botnet X: connects to C&C once a day; p-value: 1/24 = 0.042
• Botnet Y: connects to C&C hourly; p-value: 24/24 = 1
• Botnet Z: connects to C&C every 5-6 hours; p-value: 4/24 = 0.17
Cannot assume a single, fixed timescale!
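The arithmetic above can be reproduced with a short sketch. The helper names are hypothetical, Botnet Z is modeled with a fixed 6-hour period (the slide says 5-6 hours), and the second timescale (w = 24hr, W = 168hr) is an assumed example added only to show how the picture changes for Botnet X.

```python
def periodic(period_hours, horizon_hours):
    """Connection times (in hours) for a bot that calls home every `period_hours`."""
    return list(range(0, horizon_hours, period_hours))


def persistence(connection_hours, w, W):
    """Fraction of the W // w slots in [0, W) containing at least one connection."""
    n_slots = W // w
    active = {t // w for t in connection_hours if 0 <= t < W}
    return len(active) / n_slots


WEEK = 168  # hours
botnets = {
    "X": periodic(24, WEEK),  # once a day
    "Y": periodic(1, WEEK),   # hourly
    "Z": periodic(6, WEEK),   # every 6 hours (assumed fixed period)
}

# Timescale from the slide: w = 1hr, W = 24hr.
hourly_over_day = {
    name: round(persistence(conns, 1, 24), 3) for name, conns in botnets.items()
}
print(hourly_over_day)

# Assumed coarser timescale: w = 24hr, W = 168hr. Botnet X is now fully persistent.
x_daily_over_week = persistence(botnets["X"], 24, WEEK)
print(x_daily_over_week)
```

The same once-a-day bot scores about 0.04 at the hourly timescale and 1.0 at the daily one, which is why no single (w, W) pair is sufficient.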
Selecting Timescale(s)
• Select n overlapping timescales $TS_1 = (w_1, W_1),\ TS_2 = (w_2, W_2),\ TS_3 = (w_3, W_3),\ \ldots,\ TS_n = (w_n, W_n)$
• $p_i$ := persistence of atom for $TS_i = (w_i, W_i)$
• track $p_i$ concurrently for all the timescales
• $p(\text{atom}) := \max_i p_i(\text{atom})$
Can become very expensive!
Trick: select $W_i = k \cdot w_i$. Then, use a single bitmap of size $k \cdot w_{\max}$
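One possible reading of the single-bitmap trick is sketched below. It is an assumption about how the bitmap could be used, not the authors' code: the class name, the choice of k = 4, and the slot widths of 1, 6 and 24 base slots are all hypothetical, and every w_i is assumed to be a whole number of base slots.

```python
from collections import deque


class MultiScalePersistence:
    """Tracks one atom with a single bitmap shared by all timescales (sketch)."""

    def __init__(self, slot_widths, k):
        self.slot_widths = slot_widths  # each w_i, measured in base slots
        self.k = k                      # W_i = k * w_i for every timescale
        size = k * max(slot_widths)     # bitmap covers k * w_max base slots
        self.bits = deque([0] * size, maxlen=size)

    def push(self, seen):
        """Record the newest base slot; the oldest one falls off the bitmap."""
        self.bits.append(1 if seen else 0)

    def persistence_at(self, w):
        """p_i for slot width w: fraction of the last k slots of width w that are set."""
        recent = list(self.bits)[-self.k * w:]
        groups = [recent[i:i + w] for i in range(0, len(recent), w)]
        return sum(1 for g in groups if any(g)) / self.k

    def persistence(self):
        """p(atom) = max over all timescales."""
        return max(self.persistence_at(w) for w in self.slot_widths)


tracker = MultiScalePersistence(slot_widths=[1, 6, 24], k=4)
for hour in range(96):
    tracker.push(hour % 6 == 0)  # atom contacted once every 6 base slots

per_scale = {w: tracker.persistence_at(w) for w in tracker.slot_widths}
print(per_scale, tracker.persistence())
```

Each coarser timescale is derived by OR-ing groups of base-slot bits, so only one bitmap per atom has to be updated as traffic arrives.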
Dataset: Training
• Normal user traces collected from 157 end-hosts for 4 weeks
• Data collected on end-hosts
  • winpcap + wrapper code
  • traces assumed clean (some suspicious traffic observed; ground truth not available)
• Initial 2 weeks of data used for training
  • pick threshold for persistence
  • construct per-user whitelists
Picking threshold(s)
[Figure: distribution of persistence across atoms; x-axis: persistence (0 to 1), y-axis: # of atoms (0 to 1)]
• 80% of destinations have a p-value < 0.2
• 20% of destinations have a p-value > 0.2
• if p(atom) > 0.6, add to whitelist → seems reasonable
Whitelist Sizes
[Figure: histogram of whitelist sizes; x-axis: whitelist size (40 to 145), y-axis: # users (0 to 25)]
Validation
• Started with 55 distinct malware binaries
• 27 had traffic; 12 had traffic for longer than 1 day
• Ran each malware for 1 week; all traffic logged
• Packet traces ➜ flow traces [Bro]
• Flow traces manually analyzed to isolate C&C traffic
[Figure labels: windows auto update, malware]
ClamAV Signature | C&C type | # of C&C atoms | C&C Volume (min - max)
Trojan.Aimbot-25 | port 22 | 1 | 0-5.7
Trojan.Wootbot-247 | IRC port 12347 | 4 | 0-6.8
Trojan.Gobot.T | IRC port 66659 | 1 | 0.2-2.1
Trojan.Codbot-14 | IRC port 6667 | 2 | 0-9.2
Trojan.Aimbot-5 | IRC via http proxy | 3 | 0-10
Trojan.IRCBot-776* | HTTP | 16 | 0-1.
Trojan.VB-666* | IRC port 6667 | 1 | 0-1.3
Trojan.IRC-Script-50 | IRC ports 6662-6669,9999,7000 | 8 | 0-2.1 8
Trojan.Spybot-248 | port 9305 | 4 | 3.8-4.6
Trojan.MyBot-8926 | IRC port 7007 | 1 | 0-0.1
Trojan.IRC.Zapchast-11 | IRC ports 6666, 6667 | 9 | 0-1
Trojan.Peed-69 [Storm] | P2P/Overnet | 19672 | 0-30
Converted packet traces to flow traces and hand analyzed each trace individually
• to identify/isolate C&C traffic
• to identify/isolate attack traffic
3 Detailed Examples
• SDBot
  • 2 atoms in covert channel, identified by IRC server names
  • attack traffic: scans on ports 135, 139, 445 & 2097*
• Zapchast
  • 9 atoms in covert channel, popular IRC ports
  • attack traffic: netbios(?)
• Storm/Peacomm
  • ~82,000 atoms (almost all atoms are singletons)
  • no well known port/address for C&C destinations
  • attack traffic is SMTP (overwhelmingly), and possibly some http & ssh
Connection Rates (per min)
[Figure: bar chart of C&C and attack connection rates per minute for SDBot, ZapChast and Storm; y-axis: 0 to 150]
C&C Detection
[Figure: persistence (x-axis, 0.0 to 1.0) for SDBot, Storm and ZapChast at timescales of 1 hr, 6 hr, 12 hr, 18 hr and 24 hr]