
Optimization of Pattern Matching Algorithm for Memory Based Architecture

Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang, National Tsing Hua University, Taiwan, R.O.C.


  1. Optimization of Pattern Matching Algorithm for Memory Based Architecture
     Cheng-Hung Lin, Yu-Tang Tai, and Shih-Chieh Chang
     National Tsing Hua University, Taiwan, R.O.C.

  2. Outline
     • Memory architecture for string matching
     • Basic idea
     • Novel algorithm for memory architecture
     • Experimental results and conclusions

  3. Introduction
     • A Network Intrusion Detection System (NIDS) detects network attacks by identifying attack patterns.
     • Software-only approaches can no longer meet the throughput demands of today’s networks.
     • Hardware approaches are used for acceleration:
       – Logic architecture
       – Memory architecture

  4. Advantage of Memory Architecture
     • The memory architecture has attracted a lot of attention because of its easy reconfigurability and scalability.
     • Young H. Cho and William H. Mangione-Smith, “A Pattern Matching Co-processor for Network Security,” in Proc. 42nd IEEE/ACM Design Automation Conference, Anaheim, CA, June 13-17, 2005.
     • M. Aldwairi, T. Conte, and P. Franzon, “Configurable String Matching Hardware for Speeding up Intrusion Detection,” in Proc. ACM SIGARCH Computer Architecture News, 33(1):99–107, 2005.
     • S. Dharmapurikar and J. Lockwood, “Fast and Scalable Pattern Matching for Content Filtering,” in Proc. Symposium on Architectures for Networking and Communications Systems (ANCS), Oct 2005.

  5. Memory Architecture
     [Figure: the attack patterns “bcdf” and “pcdg” are compiled into an FSM with states 0 to 8. Each memory row holds 256 next-state entries (NS1 … NS256, 8 bits each) and a 16-bit match vector (MV). A decoder addresses the row of the current state, and a 256:1 MUX driven by the 8-bit input selects the next state.]
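
The table layout on this slide can be sketched in software. The following is a minimal illustration, not the authors' hardware design: the function names `build_table` and `scan` are hypothetical, the table is built by brute force over pattern prefixes, and states are numbered by prefix length rather than as in the figure. Each state gets one row of 256 next-state entries plus a match vector, so matching costs one table read per input byte.

```python
def build_table(patterns):
    """Return (table, match): one 256-entry row and one match vector per state."""
    # Every prefix of every pattern is a state; the empty prefix is state 0.
    prefixes = sorted({p[:i] for p in patterns for i in range(len(p) + 1)},
                      key=lambda s: (len(s), s))
    index = {prefix: state for state, prefix in enumerate(prefixes)}
    table, match = [], []
    for prefix in prefixes:
        row = []
        for byte in range(256):
            # Next state = longest suffix of (prefix + byte) that is still a prefix.
            candidate = prefix + bytes([byte])
            while candidate not in index:
                candidate = candidate[1:]
            row.append(index[candidate])
        table.append(row)
        # Bit i of the match vector is set when pattern i ends in this state.
        vector = 0
        for i, pattern in enumerate(patterns):
            if prefix.endswith(pattern):
                vector |= 1 << i
        match.append(vector)
    return table, match


def scan(table, match, data):
    """Feed bytes through the table; report (position, match vector) on hits."""
    state, hits = 0, []
    for pos, byte in enumerate(data):
        state = table[state][byte]  # one memory read per input byte
        if match[state]:
            hits.append((pos, match[state]))
    return hits


table, match = build_table([b"bcdf", b"pcdg"])
print(len(table), scan(table, match, b"xbcdfpcdg"))
```

Because every row stores all 256 next states, the memory footprint grows linearly with the number of states, which is why the following slides focus on reducing the state count.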

  6. Major Issue of Memory Architecture
     • As the number of attacks increases, the required memory grows tremendously.
       – Performance, cost, and power consumption all depend on the memory size.
       – Reducing the memory size has become imperative.

  7. Outline
     • Memory architecture for string matching
     • Basic idea
     • Novel algorithm for memory architecture
     • Experimental results and conclusions

  8. Review of Aho-Corasick Algorithm
     • The Aho-Corasick (AC) algorithm can reduce the number of state transitions and the memory size.
       – Solid lines represent valid transitions.
       – Dotted lines represent failure transitions.
       – Failure transitions are introduced to reduce the number of outgoing transitions.
     [Figure: AC state machine of “bcdf” and “pcdg”; states 1 to 4 spell b, c, d, f and states 5 to 8 spell p, c, d, g, all starting from state 0.]
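
As an illustration of the valid and failure transitions described on this slide, here is a compact textbook-style Aho-Corasick sketch. The function names `build_ac` and `search` are hypothetical and the code is not taken from the presentation. With the patterns inserted in the order “bcdf”, “pcdg”, the state numbers happen to match the figure.

```python
from collections import deque


def build_ac(patterns):
    """Build valid transitions (goto), failure links (fail) and outputs (out)."""
    goto, out = [{}], [set()]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]
            queue.append(child)
    return goto, fail, out


def search(goto, fail, out, text):
    """Return (start index, pattern) for every match in text."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # follow failure transitions on a mismatch
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


goto, fail, out = build_ac(["bcdf", "pcdg"])
print(sum(len(edges) for edges in goto), search(goto, fail, out, "xbcdfpcdg"))
```

For these two patterns only 8 valid transitions are stored, and every failure link points back to state 0.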

  9. Observation
     • Many string patterns are similar because they share common substrings.
     • This similarity does not by itself lead to a smaller state machine.
     [Figure: AC state machine of “bcdf” and “pcdg”; the shared substring “cd” still occupies separate states (2, 3 and 6, 7).]

  10. Merge Similar States
      • The merg_FSM is a different machine:
        – fewer states and transitions
        – smaller memory in the memory architecture
      [Figure: merg_FSM, in which AC states 2 and 6 are merged into state 26 and AC states 3 and 7 into state 37.]

  11. Problem of merg_FSM
      • Directly merging similar states results in an erroneous state machine.
      • Example: for the input stream {p, c, d, f}, the merg_FSM reaches final state 4 and reports a false positive, whereas the AC state machine reports no match.
      [Figure: merg_FSM and the AC state machine side by side.]
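
The false positive can be reproduced with a small hand-written model of the merged machine. This is a sketch under simplifying assumptions: the dictionary `MERGED` and the function `run_merged` are hypothetical names, and a mismatch simply resets to state 0 instead of following real failure transitions.

```python
# Naively merged machine: AC states 2/6 become "26" and 3/7 become "37".
MERGED = {
    "0": {"b": "1", "p": "5"},
    "1": {"c": "26"},
    "5": {"c": "26"},
    "26": {"d": "37"},
    "37": {"f": "4", "g": "8"},
}
FINAL = {"4": "bcdf", "8": "pcdg"}


def run_merged(text):
    """Return the patterns the naive merged machine claims to have seen."""
    state, reported = "0", []
    for ch in text:
        # Simplification: any mismatch resets to state 0.
        state = MERGED.get(state, {}).get(ch, "0")
        if state in FINAL:
            reported.append(FINAL[state])
    return reported


print(run_merged("pcdf"))  # the machine reports "bcdf" although it never occurs
```

After the merge, state 37 no longer remembers whether it was entered via “b” or via “p”, so the transition on “f” is accepted for both paths.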

  12. Outline
      • Memory architecture for string matching
      • Basic idea
      • Novel algorithm for memory architecture
      • Experimental results and conclusions

  13. State Traversal Mechanism
      • The merg_FSM table is stored in memory.
      • A state traversal mechanism memorizes the precedent state and differentiates merged states.
      [Figure: merg_FSM next to the AC state machine; the mechanism must tell whether merged state 26 stands for AC state 2 or AC state 6.]

  14. New State Information
      • The AC state machine stores a match vector in each state.
      • The new state machine stores:
        – pathVec, which holds path information
        – ifFinal, which indicates whether the state is a final state
      [Figure: AC state machine labeled with match vectors next to the new state machine labeled with pathVec_ifFinal: state 0 = 11_0, states 1 to 3 = 01_0, state 4 = 01_1, states 5 to 7 = 10_0, state 8 = 10_1.]

  15. Pseudo-Equivalent States
      • Definition: two states are pseudo-equivalent if they have
        – identical input transitions,
        – identical failure transitions,
        – identical ifFinal,
        – but different next states.
      [Figure: state machine of “bcdf” and “pcdg” with pathVec_ifFinal labels: state 0 = 11_0, states 1 to 3 = 01_0, state 4 = 01_1, states 5 to 7 = 10_0, state 8 = 10_1.]

  16. Merge Pseudo-Equivalent States
      • Pseudo-equivalent states are merged.
      • The pathVec and ifFinal of a merged state are the union of the values of the states it replaces.
      [Figure: state machine before merging (states 0 to 8) and after merging: state 0 = 11_0, state 1 = 01_0, state 26 = 11_0, state 37 = 11_0, state 4 = 01_1, state 5 = 10_0, state 8 = 10_1.]

  17. State Traversal Mechanism
      • preReg traces the precedent pathVec in each state.
      [Figure: merged state machine with next state, pathVec, and ifFinal per state, and the preReg trace for the input stream {p, c, d, f}.]
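
One plausible reading of this mechanism, offered as an assumption rather than the authors' exact design, is that preReg is ANDed with the pathVec of each state entered, and a transition is only valid while the result is non-zero. The sketch below uses the merged machine and pathVec values from the preceding slides; the names `GOTO`, `PATH_VEC`, `IF_FINAL`, `step`, and `scan` are hypothetical, and a failed transition falls back to state 0, which is sufficient here because every failure transition of “bcdf” and “pcdg” leads to state 0.

```python
FULL = 0b11  # bit 0 = pattern "bcdf", bit 1 = pattern "pcdg"
GOTO = {
    "0": {"b": "1", "p": "5"},
    "1": {"c": "26"},
    "5": {"c": "26"},
    "26": {"d": "37"},
    "37": {"f": "4", "g": "8"},
    "4": {},
    "8": {},
}
PATH_VEC = {"0": 0b11, "1": 0b01, "5": 0b10, "26": 0b11, "37": 0b11,
            "4": 0b01, "8": 0b10}
IF_FINAL = {"4", "8"}


def step(state, pre_reg, ch):
    """Return (next state, new preReg), or None when the move is not allowed."""
    nxt = GOTO[state].get(ch)
    if nxt is not None and pre_reg & PATH_VEC[nxt]:
        return nxt, pre_reg & PATH_VEC[nxt]
    return None


def scan(text):
    """Report (position, preReg) whenever a final state is reached."""
    state, pre_reg, hits = "0", FULL, []
    for pos, ch in enumerate(text):
        moved = step(state, pre_reg, ch)
        if moved is None:
            # Assumed failure handling: retry the character from state 0.
            moved = step("0", FULL, ch) or ("0", FULL)
        state, pre_reg = moved
        if state in IF_FINAL:
            hits.append((pos, pre_reg))
    return hits


print(scan("pcdf"), scan("bcdf"), scan("xpcdg"))
```

For the input {p, c, d, f}, preReg stays at 10 through states 5, 26, and 37; state 4 has pathVec 01, so the AND yields 00 and the false positive is rejected.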

  18. Outline
      • Memory architecture for string matching
      • Basic idea
      • Novel algorithm for memory architecture
      • Experimental results and conclusions

  19. Experiment I
      • Experiments are performed on Snort rule sets.
      • Our approach is compared with the Aho-Corasick algorithm.
      • Reference: A. V. Aho and M. J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, 1975.

  20. Compare with Traditional AC

                                     Traditional AC [24]        Our algorithm
      Rule set     # of      # of    # of    # of    Memory     # of    # of    Memory     Memory
                   patterns  char.   trans.  states  (bytes)    trans.  states  (bytes)    reduct.
      Oracle       138       4,674   2,180   2,185   880,009    1,389   1,221   452,533    49%
      Sql          44        1,089   421     422     129,290    321     284     87,011     33%
      Backdoor     57        599     563     565     191,253    523     497     152,268    20%
      Web-iis      113       2,047   1,533   1,537   569,651    1,273   1,155   428,072    25%
      Web-php      115       2,455   1,670   1,675   620,797    1,295   1,142   423,254    32%
      Web-misc     310       4,711   3,576   3,587   1,444,664  3,031   2,734   1,101,119  24%
      Web-cgi      347       5,339   3,407   3,419   1,377,002  2,672   2,358   949,685    31%
      Total rules  1,595     20,921  17,472  17,522  8,745,668  14,704  13,381  6,248,927  29%
      Ratio                          1       1       1          84%     76%     71%        29%
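
The “Memory reduct.” column can be cross-checked from the two memory columns as 1 − (our memory / traditional AC memory). The short script below recomputes it from the figures in the table; the names `ROWS` and `reduction_percent` are illustrative only.

```python
# (rule set, traditional AC memory in bytes, our memory in bytes, listed reduction in %)
ROWS = [
    ("Oracle", 880_009, 452_533, 49),
    ("Sql", 129_290, 87_011, 33),
    ("Backdoor", 191_253, 152_268, 20),
    ("Web-iis", 569_651, 428_072, 25),
    ("Web-php", 620_797, 423_254, 32),
    ("Web-misc", 1_444_664, 1_101_119, 24),
    ("Web-cgi", 1_377_002, 949_685, 31),
    ("Total rules", 8_745_668, 6_248_927, 29),
]


def reduction_percent(before, after):
    """Memory reduction in percent, rounded to the nearest integer."""
    return round(100 * (1 - after / before))


for name, before, after, listed in ROWS:
    print(f"{name:12s} computed {reduction_percent(before, after)}%  listed {listed}%")
```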

  21. Experiment II
      • The bit-split algorithm is enhanced with our method.
        – The results are compared with the original bit-split algorithm.
      • Reference: L. Tan and T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and Prevention,” in ISCA’05.

  22. Compare with Traditional Bit-Split

                                     Bit-split [8]              Bit-split + our algorithm
      Rule set     # of      # of    # of    # of    Memory     # of    # of    Memory     Memory
                   patterns  char.   trans.  states  (bytes)    trans.  states  (bytes)    reduct.
      Oracle       138       4,674   6,645   6,665   633,175    4,146   3,603   358,499    43%
      Sql          44        1,089   1,211   1,215   110,565    866     769     72,671     34%
      Backdoor     57        599     1,697   1,705   155,155    1,441   1,305   126,585    18%
      Web-iis      113       2,047   4,869   4,885   464,075    3,844   3,374   335,713    28%
      Web-php      115       2,455   4,991   5,011   476,045    3,871   3,345   332,828    30%
      Web-misc     310       4,711   10,959  11,003  1,067,291  8,861   7,816   797,232    25%
      Web-cgi      347       5,339   9,901   9,949   965,053    7,875   6,957   709,614    26%
      Total rules  1,595     20,921  53,930  54,130  5,467,130  43,550  38,701  4,237,760  22%
      Ratio                          1       1       1          81%     71%     78%        22%

  23. Conclusion
      • We introduce the concept of merging pseudo-equivalent states to reduce the number of states and transitions.
      • We propose a state traversal mechanism that works with the merg_FSM without producing false-positive matches.
      • Experimental results on the total Snort rule set show a memory reduction of 29% over traditional AC and 22% over bit-split.

  24. Thank You!

  25. Backup

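  26. Cycle Problem
      • Merging out-of-order sections of pseudo-equivalent states creates a cycle problem.
      [Figure: state machine with states 0 to 6 for “abcdef” and states 7 to 12 for a second pattern; the transition labels shown include w, d, e, b, c, and g.]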

  27. Cycle Problem
      • For example, the input string “abcdebcdef” will be mistaken for a match of the pattern “abcdef”.
      [Figure: the merged state machine, which now contains a cycle through the merged states.]

  28. Construction of State Traversal Machine
      • Construction of the state traversal machine consists of two steps:
        – Step 1: Construct the valid transitions, failure transitions, pathVec, and ifFinal.
        – Step 2: Merge the pseudo-equivalent states.
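
The two steps can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the names `build_states` and `pseudo_equivalent_groups` are hypothetical, candidate states are grouped by (input character, failure state, ifFinal), and the grouping key additionally includes the depth of the state, which is an extra assumption made here to avoid merging out-of-order sections (the cycle problem in the backup slides). The sketch only identifies merge candidates and their unioned pathVec; it does not check that the children of merged states can be merged consistently.

```python
from collections import defaultdict, deque


def build_states(patterns):
    """Step 1: valid transitions, failure transitions, pathVec, and ifFinal."""
    goto, in_char, depth = [{}], [None], [0]
    path_vec, if_final = [0], [False]
    for i, pattern in enumerate(patterns):
        state = 0
        path_vec[0] |= 1 << i
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                in_char.append(ch)
                depth.append(depth[state] + 1)
                path_vec.append(0)
                if_final.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
            path_vec[state] |= 1 << i  # pattern i passes through this state
        if_final[state] = True
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            queue.append(child)
    return goto, in_char, depth, fail, path_vec, if_final


def pseudo_equivalent_groups(patterns):
    """Step 2 (candidates only): group states and union their pathVec."""
    goto, in_char, depth, fail, path_vec, if_final = build_states(patterns)
    groups = defaultdict(list)
    for state in range(1, len(goto)):
        key = (in_char[state], fail[state], if_final[state], depth[state])
        groups[key].append(state)
    merged = {}
    for members in groups.values():
        if len(members) > 1:
            vec = 0
            for state in members:
                vec |= path_vec[state]
            merged[tuple(members)] = vec
    return len(goto), merged


print(pseudo_equivalent_groups(["bcdf", "pcdg"]))
print(pseudo_equivalent_groups(["abcdef", "apcdeg", "awcdeh"]))
```

For “bcdf” and “pcdg” the candidates are states (2, 6) and (3, 7), each with unioned pathVec 11, which corresponds to the merged states 26 and 37 shown earlier.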

  29. Example
      • Consider three patterns: “abcdef”, “apcdeg”, and “awcdeh”.
      • The resulting state machine is labeled “16 states”.
      [Figure: state machine with states numbered 0 to 16 and pathVec_ifFinal labels; the “abcdef” branch carries pathVec 001, the “apcdeg” branch 010, the “awcdeh” branch 100, shared states carry 111, and the final states 6, 11, and 16 have ifFinal = 1.]

  30. Merging Pseudo-equivalent States
      • Merge the failure transitions.
      • Take the union of the pathVec values of the merged states.
      [Figure: state machine of the three patterns during merging, with updated pathVec_ifFinal labels.]

  31. Merging Pseudo-equivalent States
      [Figure: state machine of the three patterns after merging, in which states 8 and 13 no longer appear as separate states.]
