Curing Regular Expressions Matching Algorithms from Insomnia, Amnesia, and Acalculia
Sailesh Kumar, Balakrishnan Chandrasekaran, Jonathan Turner, George Varghese
Regular Expressions in Security
• A signature-based NIDS is a popular device for enabling network security.
  [Figure: the NIDS scans traffic; on detecting an attack it can raise an alarm, drop the packet, or reset the connection.]
• Attack patterns are specified as regular expressions.
  » [ \t]*[Cc][Ww][Dd][ \t]+[~]root
    – Represents an attempt to change the working directory to root.
• Regular expression matching is expensive.
  » Thousands of signatures.
  » A high-speed implementation requires gigabytes of memory (often impractical).
2 - Sailesh Kumar - 1/9/2008
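As an illustration of how such a signature behaves, the example pattern can be tried directly with Python's standard `re` module. This is only a software sketch for checking what the expression matches; the payload strings are hypothetical, and `scan` is an illustrative helper name, not part of any NIDS.

```python
import re

# The slide's signature: optional spaces/tabs, "CWD" in any letter case,
# at least one space or tab, then the literal "~root".
CWD_ROOT = re.compile(r"[ \t]*[Cc][Ww][Dd][ \t]+[~]root")

def scan(payload: str) -> bool:
    """Return True if the signature occurs anywhere in the payload.

    re.search is unanchored, which plays the role of an implicit leading .*
    """
    return CWD_ROOT.search(payload) is not None

for p in ["USER anonymous\r\nCWD ~root\r\n", "cwd\t~root", "CWD /pub", "CWD~root"]:
    print(repr(p), scan(p))
```

Only the first two payloads match; "CWD~root" fails because `[ \t]+` requires at least one separator.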
Traditional Implementation
• NIDS implementation. Signatures:
  r1 = .*[gh]d[^g]*ge
  r2 = .*fag[^i]*i[^j]*j
  r3 = .*a[gh]i[^l]*[ae]c
• Traditional implementations attempt to match traffic against the entire signature.
• Complex signatures lead to a memory/performance trade-off:
  » DFA: fast, but requires large memory.
  » NFA: compact, but slow.
  » D2FA, etc.: trade off memory against performance.
  [Figure: NFA, D2FA etc., and DFA placed on 1/Memory vs. Performance axes.]
Insomnia
• NIDS implementation:
  .*[gh]d[^g]*ge
  .*fag[^i]*i[^j]*j
  .*a[gh]i[^l]*[ae]c
  [Figure: the first few symbols of each signature are labeled "frequent match"; the remaining tail is labeled "rare match".]
• OBSERVATION: Typical traffic rarely matches more than the first few symbols of any signature.
• Yet the NIDS keeps the entire signature active, even though the unvisited tail portions could be put to sleep.
• We refer to this problem as Insomnia.
Cure to Insomnia
• Solve Insomnia with a three-way trade-off among memory, performance, and traffic characteristics.
  [Figure: NFA, D2FA etc., and DFA on 1/Memory vs. Performance axes, extended with a third axis for traffic characteristics.]
• Smaller matching signature prefixes => high performance and low memory.
• In practice, frequently matching prefixes are very short.
Cure to Insomnia
• Insomnia cure: match only the signature prefixes in the fast path.
  [Figure: the prefixes .*[gh], .*f, and .*a (frequent match) are matched in the fast path; the suffixes d[^g]*ge, ag[^i]*i[^j]*j, and [gh]i[^l]*[ae]c (rare match) are matched in the slow path. Packets that don't match a prefix never go to the slow path; packets that match a prefix do.]
• If we select the prefixes such that
  » the prefixes are small, and
  » few packets match them (and therefore go to the slow path),
  then:
  » a fast prefix implementation (e.g., a DFA) will require less memory and will be feasible;
  » the suffixes won't require a fast implementation, so they will use less memory and will be feasible.
• Result: high performance with less memory.
• How do we select the prefixes?
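A minimal sketch of the fast-path/slow-path split, assuming the prefix/suffix cuts shown in the figure and using Python's `re` in place of a hardware DFA. The names `SPLITS`, `fast_path`, `slow_path`, and `inspect` are hypothetical, and re-scanning the whole packet in the slow path is a simplification of resuming from where the prefix matched.

```python
import re

# Hypothetical split of the three example signatures into
# (fast-path prefix, slow-path suffix); the leading .* is implicit.
SPLITS = [
    (r"[gh]", r"d[^g]*ge"),          # .*[gh] | d[^g]*ge
    (r"f", r"ag[^i]*i[^j]*j"),       # .*f    | ag[^i]*i[^j]*j
    (r"a", r"[gh]i[^l]*[ae]c"),      # .*a    | [gh]i[^l]*[ae]c
]

# Fast path: one combined pattern over the short prefixes only.
FAST = re.compile("|".join(f"(?:{p})" for p, _ in SPLITS))
# Slow path: each prefix paired with its suffix.
SLOW = [(re.compile(p), re.compile(s)) for p, s in SPLITS]

def fast_path(payload: str) -> bool:
    """True if any prefix occurs, i.e. the packet must visit the slow path."""
    return FAST.search(payload) is not None

def slow_path(payload: str) -> list:
    """Indices of signatures whose suffix matches right after a prefix match.

    finditer yields non-overlapping matches; that is exhaustive here only
    because every example prefix is a single character.
    """
    hits = []
    for idx, (pre, suf) in enumerate(SLOW):
        if any(suf.match(payload, m.end()) for m in pre.finditer(payload)):
            hits.append(idx)
    return hits

def inspect(payload: str) -> list:
    if not fast_path(payload):
        return []                 # common case: the slow path is never touched
    return slow_path(payload)
```

For example, `inspect("zzz")` stops in the fast path, while `inspect("hello")` reaches the slow path (it contains `h`) but matches no full signature.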
Prefix Generation
[Figure: NFAs of the three signatures, each state annotated with its activity probability on an input trace; a CUT separates the high-probability prefix states from the low-probability suffix states.]
Construct the NFA → Run the NFA on an input trace → Count the number of times each state is active → Find the probability of state activity → MAKE A CUT (limit the total slow-path state probability)
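The steps above can be sketched for a single signature, r1 = .*[gh]d[^g]*ge, encoded as a hypothetical chain NFA. This is a simplified reading of "make a cut": it picks the shortest prefix whose end state is active on at most a fraction `eps` of the trace symbols, whereas the slide limits the total slow-path probability across all signatures. `STEPS`, `LOOPS`, `state_activity`, `choose_cut`, and `eps` are illustrative names.

```python
from collections import Counter

# Hypothetical chain NFA for r1 = .*[gh]d[^g]*ge.
# STEPS[i] moves state i -> i+1; LOOPS[i] is a self-loop on state i.
STEPS = [lambda c: c in "gh", lambda c: c == "d",
         lambda c: c == "g", lambda c: c == "e"]
LOOPS = {0: lambda c: True,       # leading .*
         2: lambda c: c != "g"}   # [^g]*

def state_activity(trace: str) -> Counter:
    """Run the NFA over the trace; count after how many symbols each state is active."""
    active, counts = {0}, Counter()
    for c in trace:
        nxt = set()
        for s in active:
            if s in LOOPS and LOOPS[s](c):
                nxt.add(s)
            if s < len(STEPS) and STEPS[s](c):
                nxt.add(s + 1)
        active = nxt
        counts.update(active)
    return counts

def choose_cut(trace: str, eps: float) -> int:
    """Smallest k whose activity probability is at most eps (trace must be non-empty).

    States 0..k-1 stay in the fast path; reaching state k (matching the
    prefix STEPS[:k]) sends the packet to the slow path.
    """
    counts, n = state_activity(trace), len(trace)
    for k in range(1, len(STEPS) + 1):
        if counts[k] / n <= eps:
            return k
    return len(STEPS)
```

On the toy trace "hdhdhdhd", states 1 and 2 are active after 4 and 7 of the 8 symbols, so with `eps = 0.1` the cut falls at state 3.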
DoS Attacks
• An attacker sends traffic that matches the prefixes "too often".
  » This overloads the slow path.
  » Well-behaving flows will suffer.
• Use a per-flow anomaly counter.
  » It counts the number of packets a flow sends to the slow path.
  » Flows with a high anomaly counter value are attack flows.
  » Send them to a low-priority queue.
[Figure: the fast path matches the prefixes .*[gh], .*f, .*a (frequent match) and feeds the slow path, which matches the suffixes d[^g]*ge, ag[^i]*i[^j]*j, [gh]i[^l]*[ae]c (rare match), through a per-flow anomaly counter.]
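A minimal sketch of the per-flow anomaly counter, assuming a fixed threshold and strict-priority service between two queues. `SlowPathScheduler`, `threshold`, `submit`, and `next_packet` are hypothetical names; a deployed design would presumably also age or reset the counters over time, which this sketch omits.

```python
from collections import defaultdict, deque

class SlowPathScheduler:
    """Routes slow-path packets by a per-flow anomaly counter (sketch)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.anomaly = defaultdict(int)   # flow id -> packets sent to the slow path
        self.high = deque()               # well-behaving flows
        self.low = deque()                # suspected attack flows

    def submit(self, flow_id, packet):
        """Called when a packet of flow_id matched a prefix in the fast path."""
        self.anomaly[flow_id] += 1
        if self.anomaly[flow_id] > self.threshold:
            self.low.append((flow_id, packet))
        else:
            self.high.append((flow_id, packet))

    def next_packet(self):
        """Serve the high-priority queue first; None when both are empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

With `threshold=2`, a flow's first two slow-path packets are served normally and every later one waits behind the well-behaving flows.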
Simulation of DoS Mitigation
[Figure: 50 well-behaving flows; later 10, then 20, of them become anomalous, moving the system from no overload to moderate and then extreme overload. Panels: slow-path load vs. time (seconds), against the slow path's ε threshold; throughput of good flows vs. time (seconds) with no DoS mitigation; throughput of good flows vs. time (seconds) with DoS mitigation.]
Results of Splitting Prefix/Suffix
| Source | # of rules | Before split: ASCII length | Before split: # of DFAs | Before split: total memory | Prefixes after split: ASCII length | Prefixes after split: # of DFAs | Prefixes after split: total memory |
|---|---|---|---|---|---|---|---|
| Cisco | 68 | 44.1 | 6 | 973 MB | 19.8 | 1 | 152 MB |
| Linux | 70 | 67.2 | 4 | 30.7 MB | 21.4 | 2 | 15.8 MB |
| Bro | 648 | 23.64 | 1 | 3.77 MB | 16.1 | 1 | 1.23 MB |
| Snort rule 1 | 22 | 59.4 | 5 | 114.6 MB | 36.9 | 3 | 32.1 MB |
| Snort rule 2 | 10 | 43.72 | 2 | 64.2 MB | 16 | 1 | 6.5 MB |
| Snort rule 3 | 19 | 30.72 | N/A | N/A | 13.8 | 2 | 2.42 MB |
Slow-path probability set to less than 0.01%.
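The memory columns above can be turned into compression factors with simple arithmetic. The sketch below only recomputes ratios from the reported totals; Snort rule 3 is left out because its "before" figure is N/A.

```python
# Total DFA memory in MB (before split, prefixes after split), from the table.
MEMORY_MB = {
    "Cisco": (973, 152),
    "Linux": (30.7, 15.8),
    "Bro": (3.77, 1.23),
    "Snort rule 1": (114.6, 32.1),
    "Snort rule 2": (64.2, 6.5),
}

def reduction(before: float, after: float):
    """Return (compression factor, percent of memory saved)."""
    return before / after, 100 * (1 - after / before)

for name, (b, a) in MEMORY_MB.items():
    f, pct = reduction(b, a)
    print(f"{name}: {f:.1f}x smaller, {pct:.1f}% saved")
```

For the Cisco rules, for instance, 973 MB / 152 MB is roughly a 6.4x reduction, about 84% of the memory saved.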
Second Contribution - HFA
• NFAs are compact but slow.
  » Multiple active states.
• DFAs are a fast representation.
  » State explosion is a serious problem.
  » State explosion mainly occurs due to the presence of closures.
• Example with three patterns: (ab.*c) | (ac.*b) | (ba.*a)
  » 3 separate DFAs create 12 states – 3 active states.
  » The NFA has only 9 states – up to 6 active states.
  » A single DFA creates 20 states – 1 active state.
  [Figure: one of the 3 DFAs (12 states in total across the three), and the 9-state NFA for the three patterns.]
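To make "multiple active states" concrete, here is a minimal NFA simulation for the three patterns. The state numbering (0-8, with the first `a` shared by ab.*c and ac.*b as state 1) is a hypothetical reconstruction and may differ from the figure; it does give 9 states and a maximum of 6 simultaneously active states. Each input symbol costs work proportional to the size of `active`, which is why an NFA is compact but slow.

```python
ANY = None  # label of a wildcard (closure) self-loop

# Hypothetical 9-state NFA for (ab.*c)|(ac.*b)|(ba.*a).
NFA = {
    0: [(ANY, 0), ("a", 1), ("b", 6)],   # leading .* plus first symbols
    1: [("b", 2), ("c", 4)],             # "a" shared by ab.*c and ac.*b
    2: [(ANY, 2), ("c", 3)],             # ab.* closure
    4: [(ANY, 4), ("b", 5)],             # ac.* closure
    6: [("a", 7)],
    7: [(ANY, 7), ("a", 8)],             # ba.* closure
    3: [], 5: [], 8: [],                 # accepting states
}
ACCEPT = {3: "ab.*c", 5: "ac.*b", 8: "ba.*a"}

def run(text: str):
    """Return (list of (position, pattern) matches, max number of active states)."""
    active, max_active, matches = {0}, 1, []
    for i, ch in enumerate(text):
        active = {dst for s in active for lbl, dst in NFA[s]
                  if lbl is ANY or lbl == ch}
        max_active = max(max_active, len(active))
        matches += [(i, ACCEPT[s]) for s in sorted(active) if s in ACCEPT]
    return matches, max_active
```

After "abacb" all three closure states are live, and the next symbol keeps 6 states active at once; a DFA would instead need a distinct state for each such combination.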
State Explosion in DFA
• State explosion occurs primarily because
  » a DFA has a single active state, and
  » it doesn't remember anything but the current active state (amnesia).
• This requires a separate DFA state for every situation that may occur during the NFA parse.
• Example: (ab.*z) | (cd.*z) | (ef.*z)
  » Input: abcd → active states 0, 2, 5
  » Input: abef → active states 0, 2, 8
  » Input: cdef → active states 0, 5, 8
  » Input: abcdef → active states 0, 2, 5, 8
  [Figure: NFA for the three patterns, states 0-9, with closure self-loops on states 2, 5, and 8.]
• k closures => the number of DFA states is exponential in k.
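The exponential growth can be observed by running the textbook subset construction on a hypothetical generalization of the example: k patterns of the form (x_i y_i .* z), which for k = 3 has the same shape as (ab.*z)|(cd.*z)|(ef.*z). The sketch counts reachable subsets without minimization, on the assumption that accepting states must report which rule matched and so stay distinct. `build_nfa`, `step`, and `dfa_state_count` are illustrative names, and the counts are for this construction, not figures from the slides.

```python
from itertools import count

ANY = None  # label of a wildcard (closure) self-loop

def build_nfa(k: int) -> dict:
    """Hypothetical NFA for (x0 y0 .* z) | ... | (x{k-1} y{k-1} .* z)."""
    trans = {0: [(ANY, 0)]}                  # start state with the leading .*
    ids = count(1)
    for i in range(k):
        s1, s2, s3 = next(ids), next(ids), next(ids)
        trans[0].append((f"x{i}", s1))
        trans[s1] = [(f"y{i}", s2)]
        trans[s2] = [(ANY, s2), ("z", s3)]   # closure state, then z
        trans[s3] = []                       # accepting state
    return trans

def step(trans, states, sym):
    return frozenset(dst for s in states for lbl, dst in trans[s]
                     if lbl is ANY or lbl == sym)

def dfa_state_count(trans) -> int:
    """Number of reachable subsets in the subset construction (no minimization)."""
    alphabet = {lbl for edges in trans.values() for lbl, _ in edges if lbl is not ANY}
    alphabet.add("other")                    # stands for every unused symbol
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for sym in alphabet:
            nxt = step(trans, cur, sym)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

for k in range(1, 6):
    nfa = build_nfa(k)
    print(k, len(nfa), dfa_state_count(nfa))
```

The NFA grows linearly (3k + 1 states, 10 for k = 3), while the reachable DFA subsets follow 2^k(k + 2) - 1: 5, 15, 39, 95, 223 for k = 1 to 5.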
HFA
• Our solution is the History-based Finite Automaton (HFA).
  » Enables a single state of execution.
  » Uses a bit to represent the condition that a closure has been reached.
  » Certain transitions depend on the bit values.
  » Bits are also updated as the HFA makes its transitions.
[Figure: NFA for (ab.*z) | (cd.*z) | (ef.*z) (states 0-9) and the corresponding HFA, with bit b1 set if state 2 is reached, b2 set if state 5 is reached, and b3 set if state 8 is reached.]
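A hand-built sketch of the idea for (ab.*z)|(cd.*z)|(ef.*z): the closure states 2, 5, and 8 are replaced by three flag bits, so only one automaton state is tracked per symbol. This is an illustrative construction for this one example, not the general HFA construction algorithm; `FIRST`, `SECOND`, `PATTERNS`, and `hfa_scan` are hypothetical names.

```python
# Remaining single state: None = start state 0, or the index of the pattern
# whose first symbol (a, c, e) was just seen. Closure states become flag bits.
FIRST = {"a": 0, "c": 1, "e": 2}   # first symbol -> pattern index
SECOND = {"b": 0, "d": 1, "f": 2}  # second symbol -> pattern index
PATTERNS = ["ab.*z", "cd.*z", "ef.*z"]

def hfa_scan(text: str) -> list:
    state = None   # one active state only
    flags = 0      # bit i set once pattern i's closure state has been reached
    matches = []
    for pos, ch in enumerate(text):
        if ch == "z" and flags:
            # Conditional transition: which patterns match depends on the bits.
            matches += [(pos, PATTERNS[i]) for i in range(3) if flags >> i & 1]
        if state is not None and SECOND.get(ch) == state:
            flags |= 1 << state          # transition also updates a bit
        state = FIRST.get(ch)
    return matches
```

For example, `hfa_scan("cdefz")` reports both cd.*z and ef.*z at position 4 using one state plus two set bits, where a DFA would need a dedicated state for the combination {0, 5, 8}.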
Benefits of HFA
• Single state of execution – high performance.
• Few bits are required (16, 32) – they can be stored in registers.
• Avoids state explosion – memory efficient.
Results
| Source | # of closures | DFA: # of automata | DFA: total # of states | H-FA: # of automata | H-FA: # of flags | H-FA: total # of states | % space reduction with H-FA | H-FA parsing rate speedup |
|---|---|---|---|---|---|---|---|---|
| Cisco64 | 14 | 1 | 132784 | 1 | 6 | 3597 | 94.69 | - |
| Cisco64 | 14 | 1 | 132784 | 1 | 13 | 1861 | 96.77 | - |
| Cisco68 | 19 | 1 | 328664 | 1 | 17 | 2956 | 97.03 | - |
| Snort 1 | 6 | 3 | 62589 | 1 | 5 | 583 | 97.40 | 3x |
| Snort 2 | 1 | 1 | 12703 | 1 | 1 | 71 | 98.58 | - |
| Snort 3 | 5 | 2 | 4737 | 1 | 5 | 116 | 93.48 | 2x |
| Linux70 | 11 | 2 | 20662 | 1 | 9 | 1304 | 81.63 | 2x |
Thank you. Questions?