A Collaborative Approach to Anti-Spam
Chia-Mei Chen
National Sun Yat-Sen University
TWCERT/CC, Taiwan
Taiwan Computer Emergency Response Team / Coordination Center

Agenda
• Introduction
• Proposed Approach
• System Demonstration
• Experiments
• Conclusion

Problems of Spam Mail
• Commercial spam
  • Reduces productivity
  • Wastes network bandwidth and increases the processing load of mail servers
  • May include pornographic messages

Problems of Spam Mail (2)
• Malicious spam
  • Virus spam
  • Worm spam
  • Rootkit spam
  • Backdoor spam
  • Botnet
  • Phishing

Spam Filter
• Most spam filters are standalone
  • They filter out spam based on mail headers and keywords
• The main problems with a standalone spam filter:
  • The content of unsolicited messages evolves and changes over time
  • A standalone mail filter might not be fast enough to keep up with all new types of spam

Collaborative Anti-spam Framework

Proposed System
• Spam rule generation
• Spam rule exchange
• Spam rule evolution

System Architecture

Spam Rule Generation
• Use machine learning or statistical approaches to generate exchangeable spam rules
  • Decision tree
  • Rough set
  • Bayesian
• Use header information, keyword frequency, and format information as features

Selected Attributes

Attribute    Description
From         The sender's name and email address.
Reply to     Whether the mail specifies an address for replies.
CC           Whether the mail has a carbon copy.
Received     Where the message originated and the route it took to reach the recipient.
Subject      The subject of the mail.
Body         The content of the mail.
Length       The length of the mail in bytes.
Domain       The domain name of the sender's mail server.
Multi part   Whether the mail is multipart.
Text/Html    The format of the mail content.
Hasform      Whether the mail contains a form.
Table        Whether the mail content contains tables.
Rec_number   The number of keywords in the mail.
Encoding     The encoding of the mail.
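
The selected attributes can be pulled from a raw message with Python's standard `email` package. The following is a minimal sketch, not the system's actual pre-processing: the function name `extract_features`, the dictionary keys, the keyword list, and the choice to count keyword occurrences in the body for Rec_number are all assumptions made for illustration.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_features(raw_mail: str, keywords: list[str]) -> dict:
    """Map one raw message to the attributes listed in the table (hypothetical encoding)."""
    msg = message_from_string(raw_mail)
    _, sender = parseaddr(msg.get("From", ""))
    domain = sender.rsplit("@", 1)[-1].lower() if "@" in sender else ""

    # Collect the decoded text of every text/* part (plain or HTML).
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(parts)
    lowered = body.lower()

    return {
        "from": sender,
        "reply_to": msg.get("Reply-To") is not None,
        "cc": msg.get("Cc") is not None,
        "received_hops": len(msg.get_all("Received", [])),
        "subject": msg.get("Subject", ""),
        "body": body,
        "length": len(raw_mail.encode("utf-8")),
        "domain": domain,
        "multipart": msg.is_multipart(),
        "content_type": msg.get_content_type(),
        "has_form": "<form" in lowered,
        "has_table": "<table" in lowered,
        # Assumption: Rec_number counts keyword occurrences in the body.
        "rec_number": sum(lowered.count(k.lower()) for k in keywords),
        "encoding": msg.get("Content-Transfer-Encoding", ""),
    }
```

A feature dictionary of this kind could then be fed to a decision tree, rough set, or Bayesian learner; how the original system encodes each attribute is not stated in the slides.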

Rule Exchange

Spam Rule Evolution
• $R_{ii}$: the reward
• $S_i$: the strength of rule $i$; $S_i$ can be viewed as rule quality

$$S_i = \begin{cases} S_i + R_{ii} & \text{if rule } i \text{ is used} \\ \beta \cdot S_i,\ 0 < \beta < 1 & \text{if rule } i \text{ is not used} \end{cases}$$

$$R_{ii} = \begin{cases} R_{ii} & \text{if rule } i \text{ classifies correctly} \\ -R_{ii} & \text{if rule } i \text{ classifies incorrectly} \end{cases}$$
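
The strength update on the rule evolution slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the default reward magnitude of 1.0, and the default decay factor of 0.9 are hypothetical, since the slides give no concrete values.

```python
def signed_reward(reward: float, correct: bool) -> float:
    """Return +reward for a correct classification and -reward for an incorrect one."""
    return reward if correct else -reward


def update_strength(strength: float, used: bool, correct: bool = False,
                    reward: float = 1.0, beta: float = 0.9) -> float:
    """Apply one evolution step to the strength S_i of a single rule.

    A rule that was used gains or loses the reward; an unused rule decays by beta.
    The default reward and beta are assumed values, not taken from the slides.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must satisfy 0 < beta < 1")
    if used:
        return strength + signed_reward(reward, correct)
    return beta * strength
```

For example, a rule with strength 10.0 that fires and classifies correctly with reward 2.0 moves to 12.0, while the same rule left unused with beta 0.5 decays to 5.0. Rules that are repeatedly unused or wrong therefore lose strength over time.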

System Demonstration
• User interface (mail client)
  • Open web mail
• Rule generation
  • Rosetta
• Mail pre-processing and filtering
  • Procmail
• Rule exchange
  • XML files
• Mail and rule repository
  • MySQL database

User Interface (Inbox)
[Screenshot; labels: Feedback, Legitimate mail]
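
The demonstration system exchanges rules as XML files, but the slides do not show the file format. The sketch below illustrates one way a rule could be serialized and read back with Python's `xml.etree.ElementTree`; the element names (`rule`, `condition`), the attribute names, and both function names are hypothetical and are not the schema used by the authors.

```python
import xml.etree.ElementTree as ET


def rule_to_xml(rule_id: str, conditions: dict, label: str, strength: float) -> str:
    """Serialize one spam rule to an XML string (hypothetical schema)."""
    rule = ET.Element("rule", id=rule_id, label=label, strength=str(strength))
    for attribute, value in conditions.items():
        ET.SubElement(rule, "condition", attribute=attribute, value=str(value))
    return ET.tostring(rule, encoding="unicode")


def rule_from_xml(text: str) -> dict:
    """Parse a rule produced by rule_to_xml back into a dictionary."""
    rule = ET.fromstring(text)
    return {
        "id": rule.get("id"),
        "label": rule.get("label"),
        "strength": float(rule.get("strength")),
        # Condition values come back as strings; typing them is left to the receiver.
        "conditions": {
            c.get("attribute"): c.get("value") for c in rule.findall("condition")
        },
    }
```

Carrying the strength alongside each rule would let a receiving server continue the evolution step on imported rules; whether the original exchange format includes it is not stated.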

User Interface (Spam folder)
[Screenshot; labels: Feedback, Spam mail]

Rule Generation (Rosetta)

Mail Repository

Performance Evaluation
• Performance metrics
• Training and testing data source
• Experiment results

Performance Metrics
• Spam precision
• Spam recall
• Accuracy
• Miss rate

Data Source

                   MIS Department   NSYSU University   TWCERT/CC
Spam mails         3,483            3,115              17,948
Legitimate mails   809              531                991
Totals             4,294            3,646              18,939

Data were gathered from 2006/5/10 to 2006/5/30.
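
The slides name the four metrics without defining them. The sketch below uses the conventional confusion-matrix definitions with spam as the positive class; treating miss rate as the share of legitimate mail wrongly classified as spam is an assumption, as are the function name and the example counts in the usage note.

```python
def spam_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four metrics from confusion-matrix counts.

    tp: spam classified as spam          fp: legitimate mail classified as spam
    fn: spam classified as legitimate    tn: legitimate mail classified as legitimate
    Assumes both classes are non-empty and at least one mail is flagged as spam.
    """
    return {
        "spam_precision": tp / (tp + fp),
        "spam_recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Assumed definition: fraction of legitimate mail misclassified as spam.
        "miss_rate": fp / (fp + tn),
    }
```

With illustrative counts `tp=90, fp=2, fn=10, tn=98`, recall is 0.9, accuracy is 0.94, and the assumed miss rate is 0.02.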

Experiment Result: Spam Precision

                           10-May        20-May        30-May
Rule A                     99.8947368%   98.7804878%   99.2448759%
Rule A ∪ Rule B            98.7538491%   98.7912088%   99.3690852%
Rule A ∪ Rule B ∪ Rule C   98.7551867%   98.7978142%   99.4780793%

Experiment Result: Spam Recall

                           10-May        20-May        30-May
Rule A                     99.1640535%   96.8478261%   96.1423221%
Rule A ∪ Rule B            99.3730408%   97.7173913%   98.4831461%
Rule A ∪ Rule B ∪ Rule C   99.4775340%   98.2608696%   99.2322097%

Experiment Result: Miss Rate

                           10-May        20-May        30-May
Rule A                     0.7656757%    12.7906977%   9.3333333%
Rule A ∪ Rule B            8.1081081%    12.7906977%   8.0000000%
Rule A ∪ Rule B ∪ Rule C   8.1081081%    12.7906977%   6.6666667%

Experiment Result: Accuracy

                           10-May        20-May        30-May
Rule A                     99.1855204%   96.0238569%   96.4391951%
Rule A ∪ Rule B            98.3710407%   96.8190855%   98.8713911%
Rule A ∪ Rule B ∪ Rule C   98.4615385%   97.3161034%   99.5013123%

Conclusion
• Thanks to rule exchange and evolution, the collaborative approach performs better than a standalone server
• The collaborative approach can be extended to a hierarchical architecture
  • Powerful servers generate and exchange spam rules, which can then be transmitted to less powerful servers
• In future work, spam rules can be generated by different rule-based approaches, and an integrated scheme will be developed

Q&A
Thank You!!