
Multi-Core Architecture on FPGA for Large Dictionary String Matching ∗

Qingbo Wang, Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089-2562
{qingbow, prasanna}@usc.edu


Abstract

FPGA has long been considered an attractive platform for high-performance implementations of string matching. However, as the size of pattern dictionaries continues to grow, such large dictionaries can be stored only in external DRAM. The increased memory latency and limited bandwidth pose new challenges to FPGA-based designs, and the lack of spatial and temporal locality in data access also leads to low utilization of memory bandwidth. In this paper, we propose a multi-core architecture on FPGA to address these challenges. We adopt the popular Aho-Corasick (AC-opt) algorithm for our string matching engine. Utilizing the data access feature in this algorithm, we design a specialized BRAM buffer for the cores to exploit the data reuse that exists in such applications. Several design optimization techniques are utilized to realize a simple design with a high clock rate for the string matching engine. An implementation of a 2-core system with one shared BRAM buffer on a Virtex-5 LX155 achieves up to 3.2 Gbps throughput on a 64 MB state transition table stored in DRAM. Performance of systems with more cores is also evaluated for this architecture, and a throughput of over 5.5 Gbps can be obtained for some application scenarios.

1 Introduction

String matching looks for all occurrences of the patterns in a dictionary within a stream of input data. It is the key operation in search engines, and is a core function of network monitoring, intrusion detection systems (IDS), virus scanners, and spam/content filters [3, 4, 15]. For example, the open-source IDS Snort [15] has thousands of content-based rules, many of which require string matching against entire network packets, i.e., deep packet inspection. To support heavy network traffic, high-performance algorithms are required to prevent an IDS from becoming a network bottleneck.

FPGAs have been attractive for high-performance implementations of string matching due to their high I/O bandwidth and computational parallelism. Application-specific optimizations for string matching algorithms have been proposed for FPGA-based designs [18]. They typically use a small dictionary, on the order of a few thousand patterns (e.g., see [3, 4]). Thus the state transition table (STT) generated from a Deterministic Finite Automaton (DFA) representation of the pattern dictionary, or the pattern signatures themselves, can be stored in the on-chip memory or in the logic of FPGAs.

However, the size of dictionaries has increased greatly. A dictionary can now have 10,000 patterns or more [14, 15], resulting in an STT tens of megabytes in size. Such large tables can be stored only in external memory and incur long access latency. Since every character searched requires a memory reference, this latency increase degrades the string matching performance. The problem is worsened by the fact that string matching presents little memory access locality and that access to the STT is irregular.

In this paper, we propose a multi-core architecture on FPGA for large dictionary string matching. We use the Aho-Corasick algorithm (AC-opt) for design verification, but the architecture can be applied to any such algorithm that employs a DFA stored in DRAM for pattern matching [16]. Our study shows, using the AC-opt algorithm, that a small number of frequently visited states exist in the process of string matching, and the majority of memory references during string matching go to these “hot” states. When we allocate these states on FPGA to enable on-chip access to them, not only can the traffic to external memory be significantly reduced, but the throughput for the string matching engine is also improved due to fast on-chip access. Our major contributions are:

• To the best of our knowledge, our architecture is the first multi-core architecture on FPGA for large dictionary string matching to address the challenge of

∗ Supported by the United States National Science Foundation under grant No. CCR-0702784. Equipment grant from Xilinx Inc. is gratefully acknowledged.
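
Two observations from the introduction, that every input character costs one STT lookup and that a few "hot" states may attract most of those lookups, can be illustrated with a minimal Python sketch. This is not the authors' FPGA design or their AC-opt implementation. It is a textbook Aho-Corasick automaton with the failure links folded into a full transition table, and the names `build_stt`, `scan`, and `hot_state_share` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter, deque

ALPHABET = 256  # assumption: one STT column per byte value


def build_stt(patterns):
    """Build a full DFA state transition table (STT) for byte patterns.

    Returns (stt, outputs): stt[s][c] is the next state from state s on
    byte c, and outputs[s] is the set of patterns recognized at state s.
    """
    # Phase 1: trie of the patterns; goto[s] maps byte -> child state.
    goto = [{}]
    outputs = [set()]
    for pat in patterns:
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                outputs.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        outputs[s].add(pat)

    # Phase 2: breadth-first pass that folds failure links into the
    # table, so matching never follows a failure link at run time.
    stt = [[0] * ALPHABET for _ in goto]
    fail = [0] * len(goto)
    queue = deque()
    for c, child in goto[0].items():
        stt[0][c] = child
        queue.append(child)
    while queue:
        s = queue.popleft()
        outputs[s] |= outputs[fail[s]]
        for c in range(ALPHABET):
            child = goto[s].get(c)
            if child is None:
                stt[s][c] = stt[fail[s]][c]
            else:
                stt[s][c] = child
                fail[child] = stt[fail[s]][c]
                queue.append(child)
    return stt, outputs


def scan(stt, outputs, data):
    """Match data against the STT; return (matches, visits).

    matches lists (end_offset, pattern) pairs. visits counts, per state,
    how many times that state's STT row was read (once per input byte).
    """
    matches = []
    visits = Counter()
    state = 0
    for i, c in enumerate(data):
        visits[state] += 1       # one STT row read per input byte
        state = stt[state][c]
        for pat in outputs[state]:
            matches.append((i, pat))
    return matches, visits


def hot_state_share(visits, k):
    """Fraction of all STT row reads that hit the k most-read states."""
    total = sum(visits.values())
    top = sum(n for _, n in visits.most_common(k))
    return top / total if total else 0.0


if __name__ == "__main__":
    stt, outputs = build_stt([b"he", b"she", b"his", b"hers"])
    matches, visits = scan(stt, outputs, b"ushers")
    print(sorted(matches))
    print(hot_state_share(visits, 1))
```

Running `scan` over a realistic input and sweeping `k` in `hot_state_share` gives a rough software view of how much STT traffic a small on-chip store of hot states could absorb. The actual share depends on the dictionary and the input, and the sketch makes no claim about the figures reported in the paper.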
