Adversarial Attacks on Phishing Detection
Ryan Chang 張櫂閔
Overview
• What is an Adversarial Attack?
• Why should we care?
• How does it work?
• Real-World Demo on a Phishing Detection Model
• How to defend?
• Adversarial Examples on the Human Brain
Adversarial Attack
What is an Adversarial Attack?
[1] Goodfellow et al., Explaining and Harnessing Adversarial Examples, 2014
[2] Xie et al., Mitigating Adversarial Effects Through Randomization, 2017
Why Should We Care?
[1] Metzen et al., Universal Adversarial Perturbations Against Semantic Image Segmentation, 2017
[2] Eykholt et al., Robust Physical-World Attacks on Deep Learning Visual Classification, 2018
[3] Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain, 2017
Not Only in Computer Vision
• Voice to Text [1]
• Natural Language Processing [2]
• Malware Detection: MalGAN [3]
• Reinforcement Learning: Game Playing [4]
(Slide shows a voice-to-text adversarial example and an NLP adversarial example.)
[1] Gong et al., Protecting Voice Controlled Systems Using Sound Source Identification Based on Acoustic Cues, 2018
[2] Kuleshov et al., Adversarial Examples for Natural Language Classification Problems, 2018
[3] Hu and Tan, Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN, 2017
[4] Lin et al., Tactics of Adversarial Attack on Deep Reinforcement Learning Agents, 2017
How does it work? - Mathematically
• Training: Gradient Descent
• Attacking: Fast Gradient Sign Method (FGSM)
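The two update rules side by side, using the standard notation from Goodfellow et al.'s FGSM paper (cited earlier); J is the loss, alpha the learning rate, epsilon the perturbation budget:

```latex
% Training: gradient descent moves the weights against the loss gradient
\theta \leftarrow \theta - \alpha \,\nabla_{\theta} J(\theta, x, y)

% Attacking (FGSM): a single step moves the *input* along the sign of the loss gradient
x_{\mathrm{adv}} = x + \varepsilon \cdot \operatorname{sign}\!\left(\nabla_{x} J(\theta, x, y)\right)
```

The key contrast: training optimizes the parameters to reduce the loss, while the attack perturbs the input to increase it, using the very same gradient machinery.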
How does it work? - Visually
[1] Shirin Haji Amin Shirazi, A Survey on Adversarial Machine Learning, 2018
White Box Adversarial Attack
• Fast Gradient Sign Method (FGSM)
• Basic Iterative Method (BIM)
• Momentum Iterative Method (MIM)
• Saliency Map Method
• Carlini & Wagner (C&W)
• DeepFool
Reading list: https://github.com/chawins/Adversarial-Examples-Reading-List
Adversarial tool: https://github.com/tensorflow/cleverhans
Black Box Adversarial Attack
• Transferability: adversarial samples that fool model A have a good chance of fooling a previously unseen model B [1][2]
[1] Chio, Machine Duping 101: Pwning Deep Learning Systems, DEF CON 24, 2016
[2] Goodfellow et al., Explaining and Harnessing Adversarial Examples, 2015
[3] Liu, Ensembling as a Defense Against Adversarial Examples, 2016
Phishing Detection Model
Data Collection
• Data Sources:
  • Phishing data: PhishTank DB
  • Normal data: Alexa top sites
• Data:
  • Total: 64476
  • Phishing: 29000 (45%)
  • Normal: 35476 (55%)
  • Training: 58028 (90%)
  • Testing: 6448 (10%)
Alexa: http://www.alexa.com/topsites
PhishTank DB: https://www.phishtank.com
Features

Feature      Description
#js          Count of '<script'
#form        Count of '<form'
#pwd         Count of '<input type="password"'
#btn         Count of '<input type="button"'
#txt         Count of '<input type="text"'
#iframe      Count of '<iframe'
#div         Count of '<div'
#img         Count of '<img'
#style       Count of '<style'
#meta        Count of '<meta'
#action      Count of '<form action'
len(html)    Length of the HTML code
len(url)     Length of the URL
online       Is it online? (0/1)
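A minimal sketch of how these counts could be extracted; the helper name `extract_features` and the use of plain substring counts are assumptions, not the talk's actual pipeline:

```python
# Hypothetical feature extractor: plain substring counts over the raw HTML,
# matching the 14 features listed above (the talk does not show its extraction code).
TAG_PATTERNS = {
    "#js": "<script", "#form": "<form", "#pwd": '<input type="password"',
    "#btn": '<input type="button"', "#txt": '<input type="text"',
    "#iframe": "<iframe", "#div": "<div", "#img": "<img",
    "#style": "<style", "#meta": "<meta", "#action": "<form action",
}

def extract_features(html: str, url: str, online: bool) -> list:
    """Return the 14-dimensional feature vector for one page."""
    html_lower = html.lower()
    counts = [html_lower.count(pattern) for pattern in TAG_PATTERNS.values()]
    return counts + [len(html), len(url), int(online)]
```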
Traditional ML Models
• Decision Tree
• SVM
• kNN
• Random Forest
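A sketch of how one of these baselines could be trained with scikit-learn; the hyperparameters and the placeholder data are assumptions (in the talk, X would hold the 14 features per page and y the phishing labels, with the 90/10 split from the data slide):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score

# Placeholder data for illustration only.
X = np.random.rand(1000, 14)
y = np.random.randint(0, 2, 1000)   # 1 = phishing, 0 = normal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

clf = RandomForestClassifier(n_estimators=100)   # hyperparameters assumed
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(accuracy_score(y_test, pred), recall_score(y_test, pred), precision_score(y_test, pred))
```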
Deep Learning Models
• Bidirectional GRU (RNN)
  • Additional feature: URL text
• Bidirectional GRU + DNN
  • Features: URL text + the 14 features above
https://www.quora.com/When-should-one-use-bidirectional-LSTM-as-opposed-to-normal-LSTM
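A sketch of what the Bidirectional GRU + DNN model might look like in Keras; layer sizes, vocabulary size, and the URL length cap are assumptions, since the slides do not give them:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_URL_LEN, VOCAB_SIZE = 200, 128            # assumed: URLs fed in as character sequences

url_in = layers.Input(shape=(MAX_URL_LEN,), name="url_chars")
feat_in = layers.Input(shape=(14,), name="html_features")

x = layers.Embedding(VOCAB_SIZE, 32)(url_in)
x = layers.Bidirectional(layers.GRU(64))(x)        # Bidirectional GRU over the URL text
f = layers.Dense(32, activation="relu")(feat_in)   # DNN branch for the 14 handcrafted features

merged = layers.Concatenate()([x, f])
out = layers.Dense(2, activation="softmax")(merged)

model = Model([url_in, feat_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```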
Results

Model           Accuracy  Recall  Precision
Decision Tree   .8664     .8615   .8399
Random Forest   .9365     .9237   .9319
SVM             .8435     .6766   .9558
kNN             .8360     .8249   .8076
RNN             .8839     .8783   .8614
RNN+DNN         .8925     .8639   .8892
Ensemble        .9463     .9216   .9551
Adversarial Attack - Substitute Model
Simple deep dense network
Configuration:
• Batch size: 32
• Epochs: 30
• Activation: tanh
• Optimizer: adam (lr=1e-5)
• Loss function: categorical_crossentropy

                Accuracy  Recall  Precision
Attacker Sys.   .8123     .8091   .7753
Target Sys.     .9371     .9251   .9320
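A sketch of the substitute ("attacker") model using the configuration above; the number and width of the dense layers are assumptions, since the slide only says "simple deep dense network":

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Layer count/width assumed; activation, optimizer, learning rate and loss
# match the configuration listed on the slide.
substitute = models.Sequential([
    layers.Input(shape=(14,)),
    layers.Dense(64, activation="tanh"),
    layers.Dense(64, activation="tanh"),
    layers.Dense(2, activation="softmax"),
])
substitute.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                   loss="categorical_crossentropy", metrics=["accuracy"])
# Training would then use the batch size and epochs from the slide:
# substitute.fit(X_train, y_train_onehot, batch_size=32, epochs=30)
```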
Adversarial Attack (Black Box)

Substitute Model            Accuracy  Recall  Precision  Time to Attack (s)
No Attack                   .8123     .8091   .7753      N/A
Fast Gradient Sign Method   .4426     .1275   .2461      1
Basic Iterative Method      .3142     .4759   .3159      17
Momentum Iterative Method   .5079     .5356   .4514      16

Original Model              Accuracy  Recall  Precision  Time to Attack (s)
No Attack                   .9371     .9251   .9320      N/A
Fast Gradient Sign Method   .5694     .2179   .5294      1
Basic Iterative Method      .4475     .3634   .3712      17
Momentum Iterative Method   .5167     .2513   .4203      16

Attack configuration: 'eps': 500, 'eps_iter': 10, 'nb_iter': 1000, 'clip_min': 0, 'clip_max': 100000
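The attacks in the talk come from CleverHans (linked earlier); as a minimal stand-in, here is what a single FGSM step with the slide's eps/clip settings looks like, written directly with tf.GradientTape rather than the library call:

```python
import tensorflow as tf

def fgsm_step(model, x, y_onehot, eps=500.0, clip_min=0.0, clip_max=100000.0):
    """One FGSM step on the (unnormalized) 14-dim feature vectors."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_onehot, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)              # push features along the sign of the loss gradient
    return tf.clip_by_value(x_adv, clip_min, clip_max)

# Black-box use: craft x_adv on the substitute model, then feed it to the target
# model and rely on transferability (the tables above show it often carries over).
```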
An Adversarial Example

Feature      origin.  adver.
#js          16       106
#form        1        31
#pwd         0        0
#btn         0        41
#txt         20       51
#iframe      70       121
#div         21       58
#img         66       102
#style       12       32
#meta        52       95
#action      30       40
len(html)    25661    33858
len(url)     13       13
online       1        1
An Adversarial Example
(Screenshots: adversarial website vs. original website)
Adversarial Training
Adversarial Training

Before Adversarial Training  Accuracy  Recall  Precision  Time to Attack (s)
No Attack                    .9365     .9237   .9319      N/A
Fast Gradient Sign Method    .5694     .2179   .5294      1
Basic Iterative Method       .4475     .3634   .3712      17
Momentum Iterative Method    .5167     .2513   .4203      16

After Adversarial Training   Accuracy  Recall  Precision  Time to Attack (s)
No Attack                    .9367     .9244   .9316      N/A
Fast Gradient Sign Method    .9153     .9012   .9063      1
Basic Iterative Method       .9021     .8797   .8964      17
Momentum Iterative Method    .8484     .7613   .8790      16
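A sketch of the adversarial training loop implied by the table: augment each batch with adversarially perturbed copies and retrain. The 50/50 mixing ratio and epoch count are assumptions, and `fgsm_step` is the helper sketched earlier:

```python
import tensorflow as tf

def adversarial_training(model, X_train, y_onehot, epochs=10, batch_size=32):
    """Assumed training loop: each batch is a 50/50 mix of clean and FGSM examples."""
    ds = tf.data.Dataset.from_tensor_slices(
        (X_train.astype("float32"), y_onehot)).batch(batch_size)
    for _ in range(epochs):
        for x, y in ds:
            x_adv = fgsm_step(model, x, y)          # craft adversarial copies on the fly
            x_mix = tf.concat([x, x_adv], axis=0)
            y_mix = tf.concat([y, y], axis=0)
            model.train_on_batch(x_mix, y_mix)      # model keeps seeing its own worst cases
    return model
```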
Is Machine Learning Not Reliable?
• AI/ML beats humans in accuracy on many tasks
• But under adversarial attack it makes mistakes a human would never make
• Are there adversarial examples for the human brain?
Adversarial Examples on the Human Brain
[1] Prinzmetal et al., The Ponzo Illusion and the Perception of Orientation, 2001
[2] Pinna illusion: Goodfellow, 2016; Pinna and Gregory, 2002
Conclusion
• Adversarial attacks are easy to mount
• Adversarial defenses are hard but necessary
• Both accuracy and robustness should be considered in evaluation