Challenges in Experimenting with Botnet Detection Systems
  Adam J. Aviv, Andreas Haeberlen
  University of Pennsylvania
  August 8th, 2011 CSET-2011
Alice has developed a new botnet detector!!!
  What should the evaluation show?
  [Figure: Alice's Detector]
Ideal
  Alice deploys her detector live on her local network
  Alice is provided with a list of hosts that are botnet infected
  Alice deploys her detector on various other networks
    Academic, Residential, Corporate, etc.
  Alice records traces of each deployment
    Improve detector in the lab
    Readily available to other researchers
Realities
  Production-ready deployment?
  Ground truth of botnet infections?
  Deployment on various networks?
  Record trace and replay experiment?
  Traces available to other researchers?
Taking a Step Back
Many Challenges
  Multiple Administrative Domains
  Focus on Academic Networks
  Network Heterogeneity
  Scale
  Multimorbidity
  Mixing Artifacts
  Privacy
  False Positives & Negatives
  Controlled Environments
  Artifact Overfitting
  Repeatability
  Botnet Overfitting
  Comparability
  Lack of Verification
privacy
We have to worry about privacy, but the botnet authors don't!
Can we do better together?
Discussion/Topics/Questions
  Experimental Ideals vs. Realities
    Not just botnet detectors ...
  Raw Materials of the Experiment
    Sharing and Obtaining Traces
    Botnet and Background Traces
  Can we do better via collaboration?
Presentation Outline
  Ideal vs. Reality
  Experimental Challenges
  Overlay Methodology
  Pitfalls
  Obtaining Traces
  Sharing Traces
  What can be done?
Alice has developed a new botnet detector!!!
  What should the evaluation show?
  [Figure: Alice's Detector]
Ideal vs. Reality
  Ideal: Alice deploys her detector live on her local network
    Reality: Production-ready deployment?
  Ideal: Alice is provided with a list of hosts that are botnet infected
    Reality: Ground truth of botnet infections?
  Ideal: Alice deploys her detector on various other networks (Academic, Residential, Corporate, etc.)
    Reality: Deployment on various networks?
  Ideal: Alice records traces of each deployment and improves her detector in the lab
    Reality: Record trace and replay experiment?
  Ideal: Traces are readily available to other researchers
    Reality: Traces available to other researchers?
Evaluation Realities
  Performance
  Realistic Settings
  Network Heterogeneity
  Multiple Administrative Domains
  Modernity
  Comparability & Repeatability
  Lack of Ground Truth
  Overfitting
  Privacy
Pitfalls
  Experimental Challenges
  Overlay Methodology
  Pitfalls
  Obtaining Traces
  Sharing Traces
  What can be done?
Overlay Methodology
  [Figure labels: Internet, Anonymizer, Network Trace]
Replay and Evaluate
  [Figure labels: Network Trace, "Detected 2 Bots!"]
  Collected Independently
  Background Trace is Sensitive
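The overlay step on the two slides above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the procedure of any surveyed paper: the `Packet` record, the `overlay` function, and the uniform random choice of internal hosts are all hypothetical. The sketch remaps each bot address onto a randomly chosen internal host of the background trace, aligns the start times of the two traces, and merges them by timestamp. The set of chosen hosts it returns is the ground truth that the methodology provides.

```python
import random
from typing import List, NamedTuple, Set, Tuple


class Packet(NamedTuple):
    """Hypothetical, heavily simplified packet record."""
    ts: float   # timestamp in seconds
    src: str    # source address
    dst: str    # destination address


def overlay(background: List[Packet],
            botnet: List[Packet],
            internal_hosts: Set[str],
            bot_hosts: Set[str],
            seed: int = 0) -> Tuple[List[Packet], Set[str]]:
    """Embed a botnet trace into a background trace.

    Returns the merged trace and the set of internal hosts that now
    carry bot traffic (the overlay's ground truth).
    Raises ValueError if there are more bots than internal hosts.
    """
    rng = random.Random(seed)
    # Assumption: bots are assigned to internal hosts uniformly at random.
    chosen = rng.sample(sorted(internal_hosts), len(bot_hosts))
    mapping = dict(zip(sorted(bot_hosts), chosen))

    # Shift the botnet trace so that both traces start at the same time.
    offset = min(p.ts for p in background) - min(p.ts for p in botnet)

    remapped = [
        Packet(p.ts + offset,
               mapping.get(p.src, p.src),
               mapping.get(p.dst, p.dst))
        for p in botnet
    ]
    merged = sorted(background + remapped, key=lambda p: p.ts)
    return merged, set(chosen)
```

The sketch deliberately ignores the pitfalls covered on the later slides, for example a chosen host changing its address under DHCP, or a chosen host that was already infected in the background trace.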
Prevalence in the Literature
  Overlay Methodology: [13] [49] [15] [36] [46] [47] [41] [23] [6] [7] [28] [25] [24] [14]
  Other Methodology:   [20] [14] [45] [36] [11] [5]
  * See paper for references.
Advantages of Overlay Methodology
  Ground Truth
Pitfalls
  Experimental Challenges
  Overlay Methodology
  Pitfalls
  Obtaining Traces
  Sharing Traces
  What can be done?
Obtaining Traces
  Realism
    Merging of Botnet and Background trace should be realistic
Collecting Botnet Traces
Realistic Embedding
  [Figure labels: Residential ISP, ?, SPAM!]
Mixing Artifacts
  [Figure label: DHCP]
Multimorbidity
Obtaining Traces
  Realism
    Merging of Botnet and Background trace should be realistic
  Representativeness
    Reflect diversity in network scenarios
Focus on Academic Networks
  [Figure labels: Corporate, Business, State University]
Prevalence in the Literature
  Methodology          | Academic Traces                                  | At Least One Other Trace
  Overlay Methodology  | [13] [49] [15] [36] [46] [47] [41] [23] [6] [7]  | [28] [25] [24] [14]
  Other Methodology    | [20] [14] [45]                                   | [36] [11] [5]
  * See paper for references.
Scale
Obtaining Traces
  Realism
    Merging of Botnet and Background trace should be realistic
  Representativeness
    Reflect diversity in network scenarios
  Performance
    False positives and negatives
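How false positives and negatives are counted in an overlay experiment can be sketched as below. This is a hedged Python illustration: the function name `detection_rates` and its arguments are hypothetical, and the `overlaid` set contains only the hosts that received overlaid bot traffic. Hosts that were already infected in the background trace are unknown to the experimenter, so a detector that flags them is charged with a false positive here.

```python
from typing import Dict, Iterable


def detection_rates(all_hosts: Iterable[str],
                    overlaid: Iterable[str],
                    flagged: Iterable[str]) -> Dict[str, float]:
    """Score a detector against overlay ground truth.

    all_hosts: every internal host in the merged trace
    overlaid:  hosts that were assigned bot traffic (assumed ground truth)
    flagged:   hosts the detector reported as bots
    """
    hosts = set(all_hosts)
    infected = set(overlaid) & hosts
    reported = set(flagged) & hosts
    presumed_clean = hosts - infected   # unverified: may hide real infections

    false_pos = len(reported & presumed_clean)
    false_neg = len(infected - reported)

    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "fp_rate": false_pos / len(presumed_clean) if presumed_clean else 0.0,
        "fn_rate": false_neg / len(infected) if infected else 0.0,
    }
```

For example, with five hosts of which two carry overlaid bot traffic, a detector that flags one of the two plus one presumed-clean host gets one false negative and one false positive.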
Lack of Verification
Example From the Literature
  TAMD
  “We suspect that the reason not every bot in the botnet was detected is due to the randomness in our choice of selected internal hosts to which the malware traffic was assigned, such that a selected internal host that was also contacting other suspicious subnets (not relevant to the botnet) is likely to bias the dimension reduction and clustering algorithm.”
privacy
Sharing Traces
  Is the experiment independently repeatable?
  Can we do an apples-to-apples comparison?
What can be done?
  Experimental Challenges
  Overlay Methodology
  Pitfalls
  Obtaining Traces
  Sharing Traces
  What can be done?
Observations
  Many of these challenges stem from difficulties in sharing and obtaining realistic data sets.
  Similar to problems faced by researchers studying large-scale distributed systems -> PlanetLab
Can we do better together?
  A PlanetLab for Botnet Detection?
Strawman
  Distributed Evaluation
    PlanetLab-like nodes on participating networks
    Cannot communicate network traces outside of network
  Researchers Deploy Detector Code on Nodes
    Reports are reviewed and declassified by sys-admins
    Researcher can test and debug on local node
  Incentives
    Sys-Admins gain access to bleeding edge detectors, for FREE!
    Researchers gain insight into usefulness of reports or “ground truth”
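One way to picture the strawman's data flow is the Python sketch below. Everything in it is hypothetical (the `Flow` record, the `Report` and `EvaluationNode` classes, and the method names); the slides do not specify an interface. The sketch only illustrates the constraint stated above: the detector sees the local trace, but only reports that a sys-admin approves ever leave the node.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# Hypothetical flow record: (source host, destination host).
Flow = Tuple[str, str]


@dataclass
class Report:
    host: str     # internal host the detector considers infected
    reason: str   # short justification supplied by the detector


@dataclass
class EvaluationNode:
    """A node on a participating network running researcher-supplied code."""
    detector: Callable[[Iterable[Flow]], List[Report]]
    pending: List[Report] = field(default_factory=list)

    def run(self, local_trace: Iterable[Flow]) -> None:
        # The trace is consumed locally; only the reports are retained.
        self.pending.extend(self.detector(local_trace))

    def declassify(self, approve: Callable[[Report], bool]) -> List[Report]:
        # The sys-admin's decision is modeled as a callback. Rejected
        # reports are dropped and never returned to the researcher.
        released = [r for r in self.pending if approve(r)]
        self.pending = []
        return released
```

In this sketch the sys-admin's approval decisions are also the feedback channel: which reports get released tells the researcher something about their usefulness.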
Address Challenges
  Performance
  Realistic Settings
  Network Heterogeneity
  Lack of Ground Truth
  Multiple Administrative Domains
  Modernity
  Comparability & Repeatability
  Overfitting
  Privacy
Huge Deployment Challenges
  Privacy
  Accountability
Conclusions
  Taking a step back
    Overlay Methodology
    Literature Review
    And its pitfalls
  Ideal is hard
    Ideal vs. Reality
    Privacy!
    Sharing and Obtaining realistic traces
  Can we do better together?
    PlanetLab for Botnet detectors?
Backup