Learning Rules for Anomaly Detection (LERAD) of Hostile Network Traffic
Matt Mahoney

Overview
• Prior Work in Network Anomaly Detection
• The 1999 DARPA Intrusion Detection Evaluation
• Packet Header Anomaly Detection (PHAD)
• Application Layer Anomaly Detection (ALAD)
• Learning Rules for Anomaly Detection (LERAD)

An anomaly detector models “normal” behavior. A deviation may indicate an attack.
Host-based: models sequences of system calls made by a server or operating system program.
• N-grams (Forrest, 1996)
• State machine, neural networks (RST, Ghosh 1999)
Network-based: usually models IP addresses and ports.
• User-programmed rules: firewalls, SNORT, Bro
• Learned rules: ADAM, NIDES, SPADE

The 1999 DARPA Evaluation Data Set
[Figure: network diagram with the labels SunOS, Solaris, Linux, Windows NT, Sniffer, Ethernet, Cisco Router, Internet, Attacks]
• Weeks 1 and 3: training, no attacks
• Week 2: training, 138 labeled instances of 32 attacks
• Weeks 4 and 5: test, 201 unlabeled instances (190 actual) of 53 attacks
  - No data for 12 attacks (week 4, day 2)
  - One mislabeled attack (apache2)

1999 DARPA IDS Attacks
Abuse of Legitimate Bug Exploit Configuration Error Service Exploit Probe: illegalsniffer, Probe: queso Probe: mscan, ntinfoscan, ipsweep, ls, ntfsdos, satan portsweep DOS: apache2, back, dosnuke, land, pod, R2L: dict, ftpwrite, guest, DOS: arppoison, selfping, syslogd, teardrop snmpget, xsnoop mailbomb, neptune, processtable, resetscan, R2L: framespoofer, imap, smurf, tcpreset, udpstorm, named, ncftp, phf, warezclient, warezmaster sendmail R2L: httptunnel, netbus, U2R: anypw, casesen, netcat, ppmacro, sshtrojan, eject, fdformat, ffbconfig, xlock loadmodule, perl, ps, sechole, sqlattack, xterm, U2R/Data: secret yaga

1999 Evaluation Results - Best 4 of 18 Systems (Lippmann, 2000)
• IDS must identify the IP address of the attacker or target and the time within 60 seconds.
• Evaluated at a threshold of 100 false alarms (10 per day).

System      Detections
Expert 1    85/169 (50%)
Expert 2    81/173 (47%)
Dmine       41/102 (40%)
Forensics   15/27 (55%)

• Blind (developers had no access to test data)
• Evaluated by DARPA
• May use both signature and anomaly detection
• May use both host and network based methods
• May restrict attacks by category, data type, or target

Packet Header Anomaly Detection (PHAD)
• Examines Ethernet, IP, TCP, UDP, ICMP protocols
• 34 learned rules (trained on week 3)
  - TOS = 0, 8, 16, or 192
  - IP source = 12.2.169.104-12.20.180.101, ...
  - TCP flags = x2, x4, x10, ...
• Score = tn/r, summed over the anomalous fields of a packet
  - t = time since the last anomaly (an anomaly is a value never seen in training)
  - n = number of training packets
  - r = number of allowed values
• Detects 72/189 attacks (54 without TTL)
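
The tn/r score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PHAD's implementation: allowed values are kept as a plain set (the slide shows ranges for the IP source), the per-field clock is assumed to start at time 0, and the names `FieldModel` and `packet_score` are hypothetical.

```python
class FieldModel:
    """Values one header field took in training (hypothetical helper)."""

    def __init__(self):
        self.allowed = set()      # values seen in training
        self.n = 0                # number of training observations
        self.last_anomaly = 0.0   # assumption: clock starts at 0

    def train(self, value):
        self.n += 1
        self.allowed.add(value)

    def score(self, value, now):
        # A known value, or an untrained field, contributes nothing.
        if value in self.allowed or not self.allowed:
            return 0.0
        t = now - self.last_anomaly        # time since the last anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)   # t * n / r


def packet_score(models, packet, now):
    """Sum tn/r over the fields of one packet that hold a novel value."""
    return sum(models[f].score(v, now) for f, v in packet.items() if f in models)


tos = FieldModel()
for value in (0, 8, 0, 0):        # n = 4 observations, r = 2 allowed values
    tos.train(value)
models = {"tos": tos}

first = packet_score(models, {"tos": 16}, now=10.0)    # t = 10
second = packet_score(models, {"tos": 16}, now=12.0)   # t = 2 since last anomaly
normal = packet_score(models, {"tos": 8}, now=13.0)    # value seen in training
```

The factor t makes a burst of repeated anomalies score less than the first one, and n/r favors fields that were observed often but took few distinct values.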

Application Layer Anomaly Detection (ALAD)
• Models incoming server TCP connections
• Conditional rules (5 forms, selected ad hoc)
  - If dest. port = 80 then keyword = GET, Host, Accept, ...
  - If TCP flags = S/AF/A then dest. port = 23, 25, 80
  - If source IP = x.x.x.x then dest. IP = y.y.y.y, ...
  - If source IP = x.x.x.x then dest. IP/port = y.y.y.y:p, ...
  - Dest. IP/port = x.x.x.x:p, x.x.x.x:p, ...
• Score = tn/r
• Detects 59/189 attacks (70 with PHAD w/o TTL)

Learning Rules for Anomaly Detection (LERAD)
• Like ALAD, but rule forms are derived from a sample of training data.
• If A_1 = V_1 and A_2 = V_2 and ... then A_m = V_1, V_2, ... or V_r
• 23 attributes (A_i)
  - Date, time
  - Source, destination IP address (4 bytes), and ports
  - TCP flags (first, next to last, last)
  - Duration in seconds
  - Length in bytes
  - First 8 words of application data
• Score = tn/r
• Detects 112-118/190 attacks (average 114.8, or 60%)
• No improvement when merged with PHAD or ALAD.
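
A conditional rule of this form, together with its tn/r score, might be represented as follows. This is a sketch under assumptions: the `Rule` class and its field names are hypothetical, the clock for t is assumed to start at 0, and the example values are taken from the rule `12838/3 if DP=25 then W1 = .^@EHLO . .^@HELO` in the sample rule listing.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict           # antecedent: attribute -> required value
    attr: str                  # consequent attribute
    allowed: set               # consequent values seen in training (r = len)
    n: int                     # training tuples that satisfied the antecedent
    last_anomaly: float = 0.0  # assumption: clock starts at 0

    def score(self, tup, now):
        """Return tn/r if the tuple violates the rule, else 0."""
        if any(tup.get(a) != v for a, v in self.conditions.items()):
            return 0.0                     # antecedent not satisfied
        if not self.allowed or tup.get(self.attr) in self.allowed:
            return 0.0                     # consequent value is allowed
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)


smtp = Rule(conditions={"DP": 25}, attr="W1",
            allowed={".^@EHLO", ".", ".^@HELO"}, n=12838)

odd = smtp.score({"DP": 25, "W1": ".^@GET"}, now=60.0)    # violates the rule
fine = smtp.score({"DP": 25, "W1": ".^@EHLO"}, now=61.0)  # allowed value
other = smtp.score({"DP": 80, "W1": ".^@GET"}, now=62.0)  # antecedent fails
```

A connection's total score would then be the sum of `score` over all rules, in line with the "Score = tn/r" bullet.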

Rule Learning Algorithm
1. Select a random sample of 20-100 tuples (TCP connections) from the training data.
2. Generate 1000-5000 rules satisfying randomly selected pairs of tuples.
3. Sort rules by decreasing n/r on the sample.
4. Coverage test: remove rules that predict no additional values in the sample (leaving 80-120 rules).
5. Train on the full input (35,455 tuples in week 3).
6. Remove rules that generate anomalies in the last 10% of training (leaving 55-85 rules).
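
Step 6 can be read as a held-out validation pass. The sketch below assumes one possible reading, which the slide does not spell out: allowed values are collected from the first 90% of the training tuples, and a rule is dropped if any tuple in the last 10% violates it. The rule layout (`"if"`, `"then"`) and the function names are hypothetical.

```python
def matches(rule, row):
    """True if the tuple satisfies every condition in the rule's antecedent."""
    return all(row.get(a) == v for a, v in rule["if"].items())


def drop_unstable_rules(rules, tuples):
    """Keep only rules that raise no anomaly on the last 10% of the tuples."""
    holdout = max(1, len(tuples) // 10)
    train, held = tuples[:-holdout], tuples[-holdout:]
    kept = []
    for rule in rules:
        # Allowed consequent values, learned from the first 90% only.
        allowed = {row[rule["then"]] for row in train if matches(rule, row)}
        anomalous = any(
            matches(rule, row) and row[rule["then"]] not in allowed
            for row in held
        )
        if not anomalous:
            kept.append(rule)
    return kept


tuples = [{"DP": 80, "W1": "GET", "F1": ".S"}] * 9
tuples = tuples + [{"DP": 80, "W1": "POST", "F1": ".S"}]   # novel W1 at the end
rules = [
    {"name": "web", "if": {"DP": 80}, "then": "W1"},
    {"name": "flags", "if": {}, "then": "F1"},
]
kept = drop_unstable_rules(rules, tuples)
```

Here the `web` rule is dropped because a new value of W1 appears in the held-out tail, which suggests the rule would also raise false alarms on attack-free test traffic.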

Rule Generation
1. Pick a random pair of tuples (from the sample or the full input).
2. Select up to 4 matching attributes in random order.
3. The first match is the consequent.
4. Subsequent matches are conditions in the antecedent.

A B C D E
1 2 3 4 5
1 2 3 4 6

C = 3
If A = 1 then C = 3
If D = 4 and A = 1 then C = 3
If B = 2 and D = 4 and A = 1 then C = 3
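
The four steps above can be sketched as follows. The sketch assumes, based on the slide's example, that one rule is emitted for each prefix of the chosen conditions (zero to three conditions); the function name `generate_rules` and the tuple-of-pairs rule layout are hypothetical.

```python
import random


def generate_rules(t1, t2, max_attrs=4, rng=None):
    """Derive candidate rules from the attributes on which two tuples agree."""
    rng = rng or random.Random()
    matching = [a for a in t1 if a in t2 and t1[a] == t2[a]]
    rng.shuffle(matching)                 # random order of matching attributes
    chosen = matching[:max_attrs]         # up to 4 of them
    if not chosen:
        return []
    consequent = (chosen[0], t1[chosen[0]])   # first match is the consequent
    rules, conditions = [((), consequent)], ()
    for attr in chosen[1:]:               # later matches become conditions
        conditions += ((attr, t1[attr]),)
        rules.append((conditions, consequent))
    return rules


t1 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
t2 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 6}
rules = generate_rules(t1, t2, rng=random.Random(0))
for conditions, (attr, value) in rules:
    clause = " and ".join(f"{a} = {v}" for a, v in conditions)
    print(f"If {clause} then {attr} = {value}" if clause else f"{attr} = {value}")
```

Because E differs between the two tuples it can never appear in a rule, and which of A, B, C, D becomes the consequent depends on the shuffle.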

Coverage Test
1. For each rule, in order of decreasing n/r on the sample:
2. Mark each unmarked sample value that the rule predicts.
3. If no values can be marked, remove the rule.

A       B   C
1 (R1)  2   4 (R3)
1 (R1)  2   5 (R3)
1 (R1)  3   5

R1. A = 1 (n/r = 3/1)
R2. If B = 2 then A = 1 (n/r = 2/1) removed
R3. If B = 2 then C = 4 or 5 (n/r = 2/2)
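
The coverage test can be sketched on the three-row example above. As a simplifying assumption, each rule's allowed values are derived from the sample itself, so a rule predicts the consequent value of every sample tuple that satisfies its antecedent. The rule layout and function names are hypothetical.

```python
def matches(rule, row):
    """True if the tuple satisfies every condition in the rule's antecedent."""
    return all(row[a] == v for a, v in rule["if"].items())


def n_over_r(rule, sample):
    """n = tuples satisfying the antecedent, r = distinct consequent values."""
    rows = [row for row in sample if matches(rule, row)]
    values = {row[rule["then"]] for row in rows}
    return len(rows) / len(values) if values else 0.0


def coverage_test(rules, sample):
    marked, kept = set(), []
    ordered = sorted(rules, key=lambda r: n_over_r(r, sample), reverse=True)
    for rule in ordered:
        predicted = {(i, rule["then"])
                     for i, row in enumerate(sample) if matches(rule, row)}
        new = predicted - marked          # values no earlier rule has marked
        if new:
            marked |= new
            kept.append(rule)             # rule adds coverage, so keep it
    return kept


sample = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 1, "B": 2, "C": 5},
    {"A": 1, "B": 3, "C": 5},
]
rules = [
    {"name": "R2", "if": {"B": 2}, "then": "A"},
    {"name": "R3", "if": {"B": 2}, "then": "C"},
    {"name": "R1", "if": {}, "then": "A"},
]
kept = [r["name"] for r in coverage_test(rules, sample)]
```

R2 is removed because R1, which ranks higher by n/r, has already marked every value of A that R2 would predict.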

LERAD Sample Input

Date       Time     DA1 DA0 DP SA3 SA2 SA1 SA0 SP   DUR  F1 F2  F3  Len W1      W2
03/15/1999 08:00:57 112 050 25 196 037 037 158 1111 0    .S .AP .AF 857 .^@EHLO .ju
03/15/1999 08:00:57 113 050 25 196 037 037 158 1113 0    .S .AP .AF 880 .^@EHLO .ju
03/15/1999 08:01:13 114 050 80 172 016 016 100 2971 4489 .S .AP .AP 872 .^@GET  .
03/15/1999 08:01:13 114 050 80 172 016 016 100 2972 5693 .S .AP .AF 595 .^@GET  .
03/15/1999 08:01:13 114 050 80 172 016 016 100 2973 12   .S .AP .AF 318 .^@GET  ./w
03/15/1999 08:01:13 114 050 80 172 016 016 100 2974 118  .S .AP .AP 610 .^@GET  ./

Sample Rules (sorted by n/r)

1 28882/2 if F2=.AP then F1 = .S .AS
2 14236/1 if DA0=100 then DA1 = 112
3 12854/1 if W3=.HTTP/1.0^M^ then W1 = .^@GET
4 12854/1 if W3=.HTTP/1.0^M^ then DP = 80
5 35455/3 if then DA1 = 113 112 114
6 34602/3 if F3=.AF then F1 = .S .AF .AS
7 10857/1 if SA3=172 then SA2 = 016
8 10857/1 if SA2=016 then SA1 = 016
9 10857/1 if SA2=016 then SA3 = 172
10 10642/1 if F1=.S F2=.AP W1=.^@EHLO then DP = 25
11 9914/1 if W3=.HELO then W7 = .RCPT
12 9914/1 if W5=.MAIL then W3 = .HELO
13 9914/1 if W3=.HELO then W1 = .^@EHLO
14 28882/3 if F2=.AP then F3 = .AP .AF .R
15 35455/4 if then F1 = .S .AF .AS .R
16 34602/4 if F3=.AF then F2 = .S .AP . .AS
17 7656/1 if W7=. then W8 = .
18 7645/1 if W5=. then W6 = .
19 7645/1 if W4=. then W7 = .
20 7596/1 if W3=. then W4 = .
21 7566/1 if DA1=114 W3=.HTTP/1.0^M^ then DA0 = 050
22 29549/4 if F1=.S then F2 = .S .AP . .A
23 35455/5 if then F2 = .S .AP . .AS .A
24 35455/5 if then F3 = .S .AP .AF .AS .R
25 12867/2 if W1=.^@GET then W3 = .HTTP/1.0^M^ .align=
26 12854/2 if W3=.HTTP/1.0^M^ then DA0 = 050 100
27 10105/2 if W7=.RCPT then W5 = .MAIL .RCPT
28 35455/8 if then SA3 = 196 172 197 194 195 135 192 152
29 12838/3 if DP=25 then W1 = .^@EHLO . .^@HELO
30 3992/1 if W3=.HTTP/1.0^M^ W7=.text/htm then W8 = .text/pla
31 7647/2 if W6=. then W5 = . .QUIT^M^
32 7279/2 if SA0=050 then SA1 = 016 073
33 3521/1 if DA1=112 W3=.HTTP/1.0^M^ W6=.User-Age then W7 = .Mozilla/
34 6824/2 if W6=.User-Age then W4 = .Connection: .Referer:
35 6823/2 if F2=.AP W6=.User-Age then W8 = .[en] .(X11;
36 18807/6 if DA1=112 then DA0 = 050 100 194 207 149 020
37 2998/1 if SA1=037 then SA0 = 158
38 29549/10 if F1=.S then DP = 113 25 23 80 135 21 79 22 515 139
39 35455/12 if then DA0 = 105 050 204 084 168 148 169 100 194 207 149 020
40 34602/12 if F3=.AF then DP = 113 25 23 80 21 20 79 22 1022 515 1023 139
41 35455/13 if then SA2 = 037 016 182 168 169 115 027 008 227 073 007 218 013
42 35455/13 if then DP = 113 25 23 80 135 21 20 79 22 1022 515 1023 139
43 35455/13 if then SA1 = 037 016 182 168 169 115 027 008 227 073 007 218 013
44 2695/1 if SA1=007 then SA2 = 007
45 2695/1 if SA3=194 SA2=007 then SA0 = 153
46 5223/2 if SA3=194 then SA0 = 021 153
47 7656/3 if W7=. then W3 = . .PASS .6667^M^
48 6852/3 if W4=.Referer: then W5 = .http://w .http://m .http://h
49 2083/1 if SA1=013 then SA0 = 191
50 1888/1 if SA1=227 F1=.S then SA0 = 189
51 12885/7 if DP=80 then W4 = .HTTP/1.0^M^ .Connection: .Referer: . .Host:
52
53 35455/24 if then SA0 = 105 158 050 204 084 182 233 168 148 169 100 194 108
54 12854/10 if W3=.HTTP/1.0^M^ then W8 = .User-Age .[en] .text/pla .(X11; .I;
55 7109/6 if DA1=112 SA2=016 F3=.AF then DA0 = 050 100 194 207 149 020
56 12867/13 if W1=.^@GET then W6 = .User-Age .[en] .Connecti .Accept: .(X11;
57 10857/12 if SA2=016 then DA0 = 105 050 204 084 168 148 169 100 194 207 149
58 1805/2 if F1=.S W6=." then W2 = .^C .^@^@^@
59 1798/2 if DP=23 F3=.AF then W4 = .^_ .#
60 5827/9 if DP=20 W5=. then DUR = 0 1 4 6 7 2 3 5 36
61 7656/13 if W8=. then W2 = . ., .anonhmous^M^ .anonymMus^M^ .anonyxous^M^
62 7647/32 if W6=. then DUR = 0 23 1 12 108 4 30 6 9 21 24 7 14 22 2 3 11 15 27