IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 3, MARCH 2006, p. 241

Processor Array Architectures for Deep Packet Classification

Fayez Gebali, Senior Member, IEEE Computer Society, and A.N.M. Ehtesham Rafiq

Abstract: This paper presents a systematic technique for expressing a string search algorithm as a regular iterative expression to explore all possible processor arrays for deep packet classification. The computation domain of the algorithm is obtained and three affine scheduling functions are presented. The technique allows some of the algorithm variables to be pipelined while others are broadcast over system-wide buses. Nine possible processor array structures are obtained and analyzed in terms of speed, area, power, and I/O timing requirements. Time complexities are derived analytically and through extensive numerical simulations. The proposed designs exhibit optimum speed and area complexities. The processor arrays are compared with previously derived processor arrays for the string matching problem.

Index Terms: Processor array, string search, deep packet classification, parallel hardware.

1 INTRODUCTION

The string matching problem is employed in packet classification, computational biology, spam blocking, and information retrieval, to mention only a few applications. String search operates on a given alphabet set $\Sigma$ of size $|\Sigma|$, a pattern $P = p_0 p_1 \cdots p_{m-1}$ of length $m$, and a text string $T = t_0 t_1 \cdots t_{n-1}$ of length $n$, with $m \le n$. The problem is to find all occurrences of the pattern in the text string.

The average time complexity for implementing the string search problem on a single processor was proven to be $O(n)$ [1]. To meet the requirement of fast string matching, several hardware solutions were proposed that made use of advances in Very Large Scale Integration (VLSI) and processor array design techniques. Processor arrays are simple, regular, and modular structures for implementing several recursive algorithms [2], [3], [4]. Several authors developed techniques for mapping regular iterative algorithms onto processor arrays [3], [4], [5], [6], [7], [8], [9]. This paper presents a systematic methodology for obtaining several processor array architectures for deep packet classification based on the techniques developed in [9].

Packet classification refers to the identification and classification of individual data packets arriving at a switch. There are three types of packet classification tasks [10]: 1) Single-field classification (SFC) looks at a single field in the packet header and is used mostly in packet routing. 2) Multifield classification (MFC) scans multiple fields of a packet header to classify packets and support quality of service (QoS) policies. 3) Deep packet classification (DPC) [10], [11] examines the packet payload data in order to make classification decisions for high-level applications. This paper deals with hardware support for DPC.

The need for DPC is increasing rapidly with emerging content-aware applications, such as content-switching, load balancing, data streaming, policy-based firewalls, intrusion detection, etc. For such applications, traditional look-up table and CAM (content-addressable memory)-based search engines are not suitable [11], [12]. A search engine based on a string search algorithm is the most suitable for those applications [11], [13]. Several efficient linear string search algorithms have been developed [1], [14], [15]. Most of these algorithms use preprocessing to speed up their search operations. This preprocessing requires search operations and data index updates. These preprocessing operations are neither regular nor iterative, which makes them unsuitable for processor array implementation. In [1], we proposed an algorithm that achieves better performance without any preprocessing. However, that algorithm is suited to single-processor hardware. In this paper, we deal with processor array-based hardware solutions.

A hardware implementation of the algorithmic search engine for packet classification can be assumed to have the following characteristics:

• The text length n is typically large and varies with the packet payload.
• The pattern length m varies from a word of a few characters to hundreds of characters (e.g., a URL address).
• The word length w is determined by the data storage organization and datapath bus width.
• Typically, the search engine is looking for the existence of the pattern P in the text T, i.e., the search engine only locates the first occurrence of P in T.
• The text string T is supplied to the hardware in word-serial format.

This paper is organized as follows: Section 2 discusses the literature related to parallel algorithms and hardware for the string search problem. Section 3 introduces the systematic methodology we employed to design the processor array architecture. Sections 4, 5, and 6 describe the resulting processor arrays derived in Section 3. Section 7 discusses the complexity analyses of our proposed hardware. We verify the analysis results of the time complexity

The authors are with the Department of Electrical and Computer Engineering, University of Victoria, Victoria BC, V8W 3P6, Canada. E-mail: {fayez, nrafiq}@engr.uvic.ca.
Manuscript received 28 July 2004; revised 21 Mar. 2005; accepted 26 Apr. 2005; published online 25 Jan. 2006.
For information on obtaining reprints of this article, please send e-mail to: tpds@computer.org, and reference IEEECS Log Number TPDS-0186-0704.
1045-9219/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.
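
As a point of reference for the problem statement in the Introduction (a pattern P of length m, a text T of length n, m ≤ n), the following is a minimal brute-force sketch in Python. It is not the algorithm of [1] and not any of the processor array designs in this paper; it only makes the problem definition concrete. The function names are hypothetical, and treating an empty pattern as "no match" is an assumption made here for simplicity.

```python
from typing import Iterator, List, Sequence


def iter_occurrences(pattern: Sequence, text: Sequence) -> Iterator[int]:
    """Yield each index i where pattern matches text[i : i + m] (brute force)."""
    m, n = len(pattern), len(text)
    # Assumption: an empty pattern, or one longer than the text, never matches.
    if m == 0 or m > n:
        return
    for i in range(n - m + 1):
        # Compare the pattern with the length-m window of text starting at i.
        if all(text[i + j] == pattern[j] for j in range(m)):
            yield i


def find_all_occurrences(pattern: Sequence, text: Sequence) -> List[int]:
    """All match positions, as in the general string search problem."""
    return list(iter_occurrences(pattern, text))


def first_occurrence(pattern: Sequence, text: Sequence) -> int:
    """First match position, or -1 if the pattern does not occur in the text."""
    return next(iter_occurrences(pattern, text), -1)
```

The generator stops at the first hit when called through `first_occurrence`, which mirrors the existence-style query listed among the search engine characteristics. This sketch takes O(mn) comparisons in the worst case, so it should be read only as a correctness baseline.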