MEERKAT: Detecting Website Defacements through Image-based Object Recognition
seclab, THE COMPUTER SECURITY GROUP AT UC SANTA BARBARA
Kevin Borgolte (kevinbo@cs.ucsb.edu), Christopher Kruegel (chris@cs.ucsb.edu), Giovanni Vigna (vigna@cs.ucsb.edu)
USENIX Security 2015, August 13th, 2015
Defacements
Source: The Register, August 3rd 2015, http://www.theregister.co.uk/2015/08/03/trump_website_hacked/
Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2
Defacements: Scale
• Prolific defacers: Team System Dz, 2,800 websites in 10 months (~10/day)
• Over 4,700 manually verified defacements each day (Zone-H)
• Defacements to phishing pages (reported websites per month):
  • Average: ~7 to 1
  • Maximum: ~33 to 1
• Over 53,000 websites from top 1 million lists were defaced in 2014
[Figure: Reported websites per month (log scale, 100 to 1,000,000), defacements vs. phishing pages, January 2000 to January 2014]
Approach
• Prior work looks at websites' code, host-based IDSs, etc.
• Compares to a prior version / known good state
• MEERKAT: visually, like a human analyst
  • Render the website in a browser
  • Take a screenshot
  • Does the screenshot look like a defacement?
• No previous version of the website needed
• No manual feature engineering
Approach: Deep Neural Network
[Figure: Pipeline from Screenshot Collection (1600x900x3) through Window Extraction (160x160x3) and Feature Learning (local receptive fields, local contrast normalization, L2 pooling; 18x18x3) to Classification (feed-forward with dropout; defaced vs. legitimate)]
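The figure names local contrast normalization and L2 pooling as feature-learning stages but gives no parameters. As a rough NumPy sketch of what these two operations compute on a 160x160x3 window, assuming a 9x9 neighborhood, a small epsilon, and non-overlapping 2x2 pooling (all illustrative choices, not Meerkat's actual Caffe configuration):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(x: np.ndarray, k: int) -> np.ndarray:
    """Mean over the k x k spatial neighborhood of each pixel, per channel.

    x has shape (H, W, C); k must be odd so the output keeps shape (H, W, C).
    """
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))  # (H, W, C, k, k)
    return windows.mean(axis=(-2, -1))


def local_contrast_normalize(x: np.ndarray, k: int = 9, eps: float = 1e-6) -> np.ndarray:
    """Subtract the local mean, then divide by the local standard deviation."""
    centered = x - box_mean(x, k)                      # subtractive step
    local_std = np.sqrt(box_mean(centered ** 2, k))    # divisive step
    return centered / np.maximum(local_std, eps)       # avoid division by zero


def l2_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping L2 pooling: sqrt of the sum of squares in each size x size block."""
    h, w, c = x.shape
    h2, w2 = h // size, w // size
    blocks = x[: h2 * size, : w2 * size].reshape(h2, size, w2, size, c)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
```

Applied to a random 160x160x3 window, `l2_pool(local_contrast_normalize(window))` returns an 80x80x3 array; the order and sizes of the layers in the real network may differ.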
Approach: A Window “Into” The Defacement
[Figure: deep neural network pipeline, highlighting the 160x160x3 window extracted from the 1600x900x3 screenshot]
• Full-size screenshots are impractical; use a window “into” the defacement instead
• Size of the window?
  • Too large ⇒ overfit (high variance)
  • Too small ⇒ underfit (high bias)
• From which part of the screenshot should the window be extracted?
Approach: Representative Window Extraction (1)
Approach: Representative Window Extraction (2)
• Sample window positions from a 2-dimensional Gaussian distribution
  • Biased heavily toward the center of the page
  • If the window falls outside the screenshot, resample
• Why?
  • The center of the page is descriptive!
  • Makes it non-trivial to poison the training set
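A minimal sketch of this rejection-sampling loop, assuming a screenshot stored as a NumPy array of 900 rows by 1600 columns, a 160x160 window, and a hypothetical standard deviation of 15% of each image dimension (the slides state neither the spread nor how strongly the sampler is biased):

```python
import numpy as np


def sample_window(screenshot, size=160, sigma_frac=0.15, rng=None, max_tries=1000):
    """Crop a size x size window whose center is drawn from a 2-D Gaussian
    centered on the page; resample whenever the window would leave the image.

    sigma_frac is a hypothetical parameter, not taken from Meerkat.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = screenshot.shape[:2]
    center = np.array([height / 2.0, width / 2.0])
    sigma = sigma_frac * np.array([height, width])
    half = size // 2
    for _ in range(max_tries):
        cy, cx = np.rint(rng.normal(center, sigma)).astype(int)
        top, left = cy - half, cx - half
        # Accept only windows that lie completely inside the screenshot.
        if 0 <= top and top + size <= height and 0 <= left and left + size <= width:
            return screenshot[top : top + size, left : left + size]
    raise RuntimeError("no in-bounds window found; is the screenshot smaller than the window?")
```

Calling `sample_window(shot, rng=np.random.default_rng(0))` on a 900x1600x3 screenshot yields one 160x160x3 crop; fixing the generator seed makes the choice of window reproducible.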
Approach: Deep Neural Network
• Feature learning: stacked auto-encoders
• Classification: feed-forward neural network with dropout
• Implemented on top of Caffe
• Trained on a GPU; training takes weeks
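Meerkat itself was built on Caffe and trained on a GPU; what follows is only a toy CPU sketch in NumPy of the two named techniques. It shows greedy layer-wise pre-training of stacked auto-encoders (assumed here to use tied weights, a sigmoid encoder, and a linear decoder) and inverted dropout as it might be applied in the feed-forward classifier. Layer sizes, learning rate, and epoch count are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(x, hidden, epochs=300, lr=0.1, seed=0):
    """Train one tied-weight auto-encoder layer (sigmoid encoder, linear decoder)
    with full-batch gradient descent on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = rng.normal(0.0, 0.1, size=(d, hidden))
    b = np.zeros(hidden)  # encoder bias
    c = np.zeros(d)       # decoder bias
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w + b)              # encode
        err = h @ w.T + c - x               # decode with the transposed weights
        losses.append(0.5 * np.sum(err ** 2) / n)
        g_out = err / n
        g_pre = (g_out @ w) * h * (1.0 - h)  # back-propagate through the sigmoid
        w -= lr * (g_out.T @ h + x.T @ g_pre)  # decoder + encoder gradients
        b -= lr * g_pre.sum(axis=0)
        c -= lr * g_out.sum(axis=0)
    return w, b, losses


def pretrain_stack(x, layer_sizes):
    """Greedy layer-wise pre-training: each auto-encoder learns to reconstruct
    the codes produced by the layer below it."""
    layers, codes = [], x
    for size in layer_sizes:
        w, b, _ = train_autoencoder(codes, size)
        layers.append((w, b))
        codes = sigmoid(codes @ w + b)
    return layers, codes


def dropout(h, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors, so nothing changes at test time."""
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)
```

The final `codes` would feed the feed-forward classification layers, which are then trained (with dropout) on the defaced/legitimate labels.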
Approach: Features Learned
• Color combinations
  • Green on black? Black on white/bright gray?
• Letter combinations
  • Broken and mixed encodings
  • Leetspeak, e.g., “pwned” or “h4x0red”
• Typographical and grammatical errors
  • e.g., “greats to” or “visit us in our website”
• Defacement group logos
Approach: Detection
Evaluation
• Dataset
  • 10,053,772 defaced websites = positives (manually verified by Zone-H)
  • 2,554,905 legitimate websites = negatives (not verified; some might be defaced)
• Dataset is skewed toward defacements
  • Report the Bayesian detection rate (BDR): P(true positive | positive), i.e., the probability that a website flagged as defaced really is defaced
• Unverified legitimate websites ⇒ TPR and BDR are lower bounds!
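Because TPR and FPR are measured within each class, they do not depend on how skewed the dataset is, but the BDR does. A small sketch using Bayes' theorem makes this concrete; `p_defaced`, the fraction of evaluated pages that are actually defaced, is an input introduced here for illustration:

```python
def bayesian_detection_rate(tpr: float, fpr: float, p_defaced: float) -> float:
    """P(defaced | flagged) via Bayes' theorem, from the true positive rate,
    the false positive rate, and the prevalence of defacements."""
    flagged_and_defaced = tpr * p_defaced
    flagged_but_legitimate = fpr * (1.0 - p_defaced)
    return flagged_and_defaced / (flagged_and_defaced + flagged_but_legitimate)


# Prevalence in this dataset: 10,053,772 defaced out of 12,608,677 pages (about 80%).
p_dataset = 10_053_772 / (10_053_772 + 2_554_905)
```

With the dataset's prevalence and the average TPR and FPR from the traditional evaluation, `bayesian_detection_rate(0.97878, 0.01012, p_dataset)` comes out at roughly 99.7%; at a hypothetical 1% prevalence the same rates would give a BDR just under 50%.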
Evaluation: Traditional
• 10-fold cross-validation
• Results:
  • TPR: avg. 97.878% [97.422%, 98.375%]
  • FPR: avg. 1.012% [0.547%, 1.419%]
  • BDR: avg. 99.716% [99.603%, 99.845%]
• Traditional evaluation has problems:
  • The same defacement may end up in two bins
  • Defacements from 1998 are mixed with defacements from 2014
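The slide does not say how the duplicate-defacement problem was or could be handled. One common remedy, sketched here purely as an assumption, is to assign folds by hashing a grouping key (for example, a hypothetical fingerprint shared by all copies of the same defacement) so that every copy lands in the same fold, then summarize each metric across folds. Reading the bracketed values above as the minimum and maximum across folds is likewise an assumption.

```python
import hashlib
from statistics import mean


def fold_of(group_key: str, k: int = 10) -> int:
    """Assign a sample to one of k folds by hashing a (hypothetical) group key,
    so every copy of the same defacement lands in the same fold."""
    digest = hashlib.sha256(group_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % k


def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """TPR, FPR, and BDR (precision at this fold's class ratio) from counts."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "bdr": tp / (tp + fp),
    }


def summarize(per_fold: list) -> dict:
    """Average, minimum, and maximum of each metric across folds."""
    return {
        name: (mean(f[name] for f in per_fold),
               min(f[name] for f in per_fold),
               max(f[name] for f in per_fold))
        for name in per_fold[0]
    }
```

Grouping only fixes the duplicate problem; the mixing of 1998 and 2014 defacements calls for a split by time instead.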
Limitations
• Fingerprinting and delayed defacements
• Tiny defacements
• Huge advertisements
• Concept drift (natural and adversarial)
  • Major: learn new features from new data (no feature engineering)
  • Minor: adjust the weights of the deeper classification layer
Limitations: Minor Concept Drift & Fine-Tuning
• Train on Dec 2012 to Dec 2013
  • 1.78 million defacements
  • 1.76 million legitimate pages
• Test on Jan to May 2014
  • 1.54 million samples, 50/50 split
• Fine-tune on Jan, Feb, Mar, Apr
• BDR in Jan: 98.583%
  • w/o FT drops to 97.177%
  • w/ FT increases to 98.717%
• Team System Dz started in Jan 2014!
[Figure: Time-wise split, with and without fine-tuning: true positive rate and false positive rate per month (January to May 2014) with and without fine-tuning, and their difference (w/ FT minus w/o FT)]
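The time-wise protocol can be sketched as a walk-forward loop. The slides do not spell out the order of operations; this sketch assumes each test month is evaluated first and only then used for fine-tuning before the next month, which is consistent with both variants sharing one BDR in January. `train`, `fine_tune`, and `evaluate` are hypothetical callables standing in for the real training code.

```python
from typing import Callable, Dict, List


def time_wise_evaluation(
    train_months: List[str],
    test_months: List[str],
    train: Callable[[List[str]], object],
    fine_tune: Callable[[object, str], object],
    evaluate: Callable[[object, str], float],
    use_fine_tuning: bool = True,
) -> Dict[str, float]:
    """Train once on historical months, then walk forward through the test
    months in order; the model is never fine-tuned on a month before it has
    been evaluated on that month."""
    model = train(train_months)
    scores = {}
    for i, month in enumerate(test_months):
        scores[month] = evaluate(model, month)
        # The last test month is only evaluated, never used for fine-tuning.
        if use_fine_tuning and i < len(test_months) - 1:
            model = fine_tune(model, month)
    return scores
```

Running it once with `use_fine_tuning=True` and once with `False` produces the two per-month series that the figure compares.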
Conclusion
• Introduced MEERKAT
• Learns features automatically; the learned features match domain knowledge
• Does not require a prior version of the website
• Outperforms the state of the art
• Gracefully tackles minor and major concept drift
Thank you for your attention! Questions?
kevinbo@cs.ucsb.edu
http://kevin.borgolte.me
twitter: @caovc