On Attacking Statistical Spam Filters
Greg Wittel & S. Felix Wu
U.C. Davis
CEAS 2004
Outline
• Introduction
• Attack Classes
• Testing a New Attack
• Conclusions & Future Work
Attack Classes
• Attempted attack methods:
  – Tokenization
    • Works against feature selection by splitting or modifying key message features
    • e.g. splitting up words with spaces, HTML tricks
  – Obfuscation
    • Uses encoding or misdirection to hide message contents from the filter
    • e.g. HTML/URL encoding, letter substitution (see the sketch below)
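To make these two classes concrete, here is a minimal, hypothetical Python sketch of both transformations; the example word and substitution table are illustrative, not taken from the paper.

```python
# Hypothetical sketch of the tokenization and obfuscation attacks.
def tokenize_attack(word: str) -> str:
    """Split a word with spaces so a feature extractor misses the token."""
    return " ".join(word)  # "viagra" -> "v i a g r a"

def obfuscate_attack(word: str) -> str:
    """Substitute visually similar characters to hide the token."""
    subs = {"a": "@", "i": "1", "o": "0", "e": "3"}  # illustrative mapping
    return "".join(subs.get(c, c) for c in word)

print(tokenize_attack("viagra"))   # v i a g r a
print(obfuscate_attack("viagra"))  # v1@gr@
```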
Attack Classes cont.
  – Weak Statistical
    • Skew message statistics by adding random data (sketched below)
    • e.g. adding random words, fake HTML tags, random text excerpts
  – Strong Statistical
    • Differentiated from 'weak' attacks by using more intelligence in the attack
    • Guessing vs. educated guessing
    • e.g. the Graham-Cumming attack
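As a rough illustration of the 'weak' class, the sketch below pads a message with randomly chosen words; the word list and message are placeholders, not the study's data.

```python
import random

# Placeholder word list; a real attack would draw from a much larger pool.
WORDS = ["apple", "meeting", "report", "garden", "window", "coffee"]

def weak_statistical_attack(message: str, n: int) -> str:
    """Append n random words to skew the message's token statistics."""
    padding = random.choices(WORDS, k=n)  # sampled with replacement
    return message + "\n\n" + " ".join(padding)

print(weak_statistical_attack("Erase E-Spyware from your computer", n=10))
```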
Attack Classes cont.
  – Misc:
    • Sparse data attack
    • Hash-breaking attacks
Testing a New Attack
• Tested two types of attacks:
  – Dictionary word attack (old)
  – Common word attack (new)
• Both attacks add n random words to a base message (see the sketch below).
• Tested against two filters:
  – CRM114 – sparse binary polynomial hashing + Naïve Bayesian
  – SpamBayes (SB) – Naïve Bayesian
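The only difference between the two attacks is the word pool, as this hypothetical sketch shows; both lists here are tiny stand-ins for a full dictionary and a common-English-words list.

```python
import random

DICTIONARY_POOL = ["abacus", "zygote", "quixotic", "fjord", "nebula"]  # rare words
COMMON_POOL = ["the", "and", "meeting", "thanks", "tomorrow"]          # frequent words

def pad_message(message: str, pool, n: int) -> str:
    """Shared mechanics: append n random words drawn from the chosen pool."""
    return message + " " + " ".join(random.choices(pool, k=n))

# Dictionary word attack (old) vs. common word attack (new):
dictionary_variant = pad_message("base spam text", DICTIONARY_POOL, n=50)
common_variant = pad_message("base spam text", COMMON_POOL, n=50)
```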
Procedure
• Training data
  – 3000 hams from the SpamAssassin corpus
  – 3000 spams from the SpamArchive-mod corpus
  – CRM114 trained on errors
  – SB trained using bulk training (both regimes sketched below)
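The two training regimes differ as sketched below; the filter interface (train/classify) is a hypothetical stand-in for the real CRM114 and SpamBayes tooling.

```python
def bulk_train(filter_, corpus):
    """SpamBayes-style bulk training: learn from every labeled message."""
    for msg, is_spam in corpus:
        filter_.train(msg, is_spam)

def train_on_error(filter_, corpus):
    """CRM114-style training: learn only the messages the filter gets wrong."""
    for msg, is_spam in corpus:
        if filter_.classify(msg) != is_spam:
            filter_.train(msg, is_spam)
```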
Procedure cont.
• Test data
  – Started with a base 'picospam' not in the training data:

    From: Kelsey Stone <bouhooh@entitlement.com>
    To: submit@spamarchive.org
    Subject: Erase hidden Spies or Trojan Horses from your computer

    Erase E-Spyware from your computer
    http://boozofoof.spywiper.biz
Procedure cont.
• Test data cont.
  – The base picospam is detectable by both filters
  – Generated 1000 variations with n words added (see the sketch below)
    • Words selected with and without replacement
    • n = 10, 25, 50, 100, 200, 300, 400
  – Recorded classifications and the effect on spam scores
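A minimal sketch of the variant-generation step, assuming a word list at a conventional Unix path; the base message stands in for the actual picospam, and the counts and n values follow the slide.

```python
import random

WORDS = open("/usr/share/dict/words").read().split()  # assumed word list path
BASE = "Erase E-Spyware from your computer"           # stand-in for the picospam

def make_variants(n: int, count: int = 1000, replacement: bool = True):
    """Generate `count` variants of the base message with n words added."""
    variants = []
    for _ in range(count):
        added = random.choices(WORDS, k=n) if replacement else random.sample(WORDS, k=n)
        variants.append(BASE + " " + " ".join(added))
    return variants

for n in (10, 25, 50, 100, 200, 300, 400):
    variants = make_variants(n)  # then classify each and record the scores
```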
Results
• Using 10,000 variants didn't affect the results
• Selection with/without replacement had no effect
• Mixed results overall
CRM114 Results
• Both attacks failed: 0 false negatives
• The spam score was affected...
CRM114 Results cont.
[Chart: spam probability vs. words added (0–400) for the dictionary and common word attacks; scores fall from the base score near 1.0 to roughly 0.75, but the messages remain classified as spam.]
SpamBayes Results
• Baseline dictionary attack: mild success
• Common word attack...
SpamBayes Results cont.
[Chart: spam probability vs. words added (0–400) for the dictionary and common word attacks, with the spam and ham thresholds marked; the common word attack drops below the thresholds with far fewer added words.]
SpamBayes Results cont.
• The common word attack reduces the required attack size by up to 4x
• What happened? Why did the filter hold up so poorly against either attack?
• Hypothesis: the base picospam was not in the training data
• Added the base spam to SB's training data…
SpamBayes Results Part 2
• The retrained filter offered greater resistance to the 'weak' dictionary attack
• Small performance gain against the common word attack
• Gains not large enough to resist the attack
SpamBayes Results Part 2 cont.
[Chart: dictionary word attack before and after retraining on the base spam; spam probability vs. words added (0–400), with the spam and ham thresholds marked.]
SpamBayes Results Part 2 cont.
[Chart: common word attack before and after retraining on the base spam; spam probability vs. words added (0–400), with the spam and ham thresholds marked.]
Conclusion & Future Work
• The mixed success of the common word attack shows the need for further study
• Other filters
  – Bogofilter shows similar vulnerability
• Effect of retraining on attack messages vs. the false negative / false positive rate
• Testing other base picospams
Future cont.
• What makes a filter hard to distract?
• Relevance of the independence assumption
• More advanced attacks
  – Natural language generation
• Traditional software flaws
  – Exploitable buffer overflows
  – Remote code execution
Colophon
• Contact information:
  – Greg Wittel (wittel at cs.ucdavis.edu)
  – S. Felix Wu (wu at cs.ucdavis.edu)
• Questions?