IEEE INFOCOM 2002

Scalable IP Lookup for Programmable Routers

David E. Taylor, John W. Lockwood, Todd Sproull, Jonathan S. Turner, David B. Parlour

Taylor, Lockwood, Sproull, and Turner are with the Applied Research Laboratory, Washington University in Saint Louis. E-mail: {det3,lockwood,todd,jst}@arl.wustl.edu. This work was supported in part by NSF ANI-0096052 and Xilinx, Inc. Parlour is with Xilinx, Inc. E-mail: dave.parlour@xilinx.com.

Abstract — Continuing growth in optical link speeds places increasing demands on the performance of Internet routers, while deployment of embedded and distributed network services imposes new demands for flexibility and programmability. IP address lookup has become a significant performance bottleneck for the highest performance routers. New commercial products utilize dedicated Content Addressable Memory (CAM) devices to achieve high lookup speeds. This paper describes an efficient, scalable lookup engine design, able to achieve high performance with the use of a small portion of a reconfigurable logic device and a commodity Random Access Memory (RAM) device. Based on Eatherton's Tree Bitmap algorithm [1], the Fast Internet Protocol Lookup (FIPL) engine can be scaled to achieve over 9 million lookups per second at the fairly modest clock speed of 100 MHz. FIPL's scalability, efficiency, and favorable update performance make it an ideal candidate for System-On-a-Chip (SOC) solutions for programmable router port processors.

Keywords — Internet Protocol (IP) lookup, router, reconfigurable hardware, Field-Programmable Gate Array (FPGA), Random Access Memory (RAM).

I. INTRODUCTION

Routing of Internet Protocol (IP) packets is the primary purpose of Internet routers. Simply stated, routing an IP packet involves forwarding each packet along a multi-hop path from source to destination. The speed at which forwarding decisions are made at each router or "hop" places a fundamental limit on the performance of the router. For Internet Protocol Version 4 (IPv4), the forwarding decision is based on a 32-bit destination address carried in each packet's header. A lookup engine at each port of the router uses a suitable routing data structure to determine the appropriate outgoing link for the packet's destination address.

The use of Classless Inter-Domain Routing (CIDR) complicates the lookup process, requiring a lookup engine to search variable-length address prefixes in order to find the longest matching prefix of the destination address and retrieve the corresponding forwarding information [2]. As physical link speeds grow and the number of ports in high-performance routers continues to increase, there is a growing need for efficient lookup algorithms and effective implementations of those algorithms. Next generation routers must be able to support thousands of optical links each operating at 10 Gb/s (OC-192) or more. Lookup techniques that can scale efficiently to high speeds and large lookup table sizes are essential for meeting the growing performance demands while maintaining acceptable per-port costs.

Many techniques are available to perform IP address lookups. Perhaps the most common approach in high-performance systems is to use Content Addressable Memory (CAM) devices and custom Application Specific Integrated Circuits (ASICs). While this approach can provide excellent performance, the performance comes at a fairly high price, due to the relatively high cost per bit of CAMs relative to commodity memory devices. CAM-based lookup tables are also expensive to update, since the insertion of a new routing prefix may require moving an unbounded number of existing entries. The CAM approach also offers little or no flexibility for adapting to new addressing and routing protocols.

The Fast Internet Protocol Lookup (FIPL) engine, developed at Washington University in St. Louis, is a high-performance solution to the lookup problem that uses Eatherton's Tree Bitmap algorithm [1], reconfigurable hardware, and Random Access Memory (RAM). Implemented in a Xilinx Virtex-E Field Programmable Gate Array (FPGA) running at 100 MHz and using a Micron 1 MB Zero Bus Turnaround (ZBT) Synchronous Random Access Memory (SRAM), a single FIPL lookup engine has a guaranteed worst case performance of 1,134,363 lookups per second. Time-Division Multiplexing (TDM) of eight FIPL engines over a single 36-bit-wide SRAM interface yields a guaranteed worst case performance of 9,090,909 lookups per second. Still higher performance is possible with higher memory bandwidths. In addition, the data structure used by FIPL is straightforward to update, and can support up to 10,000 updates per second with less than a 9% degradation in lookup throughput. Targeted to an open-platform research router, implementations utilized standard FPGA design flows. Ongoing research seeks to exploit new FPGA devices and more advanced CAD tools in order to double the clock frequency and, therefore, double the lookup performance.
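To make the cycle accounting behind the eight-engine figure explicit, the short C sketch below (not from the paper) reproduces the arithmetic. The per-lookup memory-access count it uses is only an inference from the quoted rate (100 MHz divided by 11 accesses gives 9,090,909) and is not a number stated in this section.

#include <stdio.h>

/* Back-of-the-envelope check of the aggregate throughput quoted above.
 * Eight engines time-multiplexed over one SRAM interface can keep the
 * memory busy on every clock cycle, so the aggregate lookup rate is
 * bounded by the memory access rate divided by the accesses needed for
 * one worst-case lookup. */
int main(void)
{
    const double clock_hz = 100e6;          /* FPGA and SRAM clock           */
    const double accesses_per_lookup = 11;  /* assumed worst case (inferred) */

    double aggregate = clock_hz / accesses_per_lookup;
    printf("aggregate worst-case rate: %.0f lookups/s\n", aggregate);
    return 0;
}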
II. RELATED WORK

Numerous research and commercial IP lookup techniques exist. On the commercial front, several companies have developed high speed lookup techniques using CAMs and ASICs. Some current products, targeting OC-768 (40 Gb/s) and quad OC-192 (10 Gb/s) link configurations, claim throughputs of up to 100 million lookups per second and storage for 100 million entries [3]. However, these products require 16 cascaded ASICs with embedded CAMs in order to achieve the advertised performance levels, as well as to support even a more realistic one million table entries. Such exorbitant hardware resource requirements make these solutions prohibitively expensive for implementation in large routers.

The most efficient lookup algorithm known, from a theoretical perspective, is the "binary search over prefix lengths" algorithm described in [4]. The number of steps required by this algorithm grows logarithmically in the length of the address, making it particularly attractive for IPv6, where address lengths increase to 128 bits. However, the algorithm is relatively complex to implement, making it more suitable for software implementation than hardware implementation. It also does not readily support incremental updates.

The Lulea algorithm is the most similar of published algorithms to the Tree Bitmap algorithm used in our FIPL engine [5]. Like Tree Bitmap, the Lulea algorithm uses a type of compressed trie to enable high speed lookup, while maintaining the essential simplicity and easy updatability of elementary binary tries. While similar at a high level, the two algorithms differ in a variety of specifics that make Tree Bitmap somewhat better suited to efficient hardware implementation.

The remaining sections focus on the design and implementation details of a fast and scalable lookup engine based on the Tree Bitmap algorithm. The FIPL engine offers an efficient and flexible alternative geared to System-On-a-Chip (SOC) router port processor implementations. With tightly bounded worst-case performance and minimal update overhead, FIPL is well-suited for use in high-performance programmable routers, which must be capable of switching even minimum length packets at wire speeds [6].

III. TREE BITMAP ALGORITHM

Eatherton's Tree Bitmap algorithm is a hardware-based algorithm that employs a multibit trie data structure to perform IP forwarding lookups with efficient use of memory [1]. Due to the use of CIDR, a lookup consists of finding the longest matching prefix stored in the forwarding table for a given 32-bit IPv4 destination address and retrieving the associated forwarding information. As shown in Figure 1, the unicast IP address is compared to the stored prefixes starting with the most significant bit. In this example, a packet is bound for a workstation at Washington University in St. Louis. A linear search through the table results in three matching prefixes: *, 10*, and 1000000011*. The third prefix is the longest match, hence its associated forwarding information, denoted by Next Hop 7 in the example, is retrieved. Using this forwarding information, the packet is forwarded to the specified next hop by modifying the packet header.

    Prefix          Next Hop
    *               7
    10*             35
    01*             21
    110*            9
    1011*           1
    0001*           68
    01011*          51
    00110*          3
    10001*          6
    100001*         33
    10000000*       54
    1000000000*     12
    1000000011*     7

    32-bit IP Address: 128.252.153.160
    1000 0000 1111 1100 ... 1010 0000  ->  Next Hop 7

Fig. 1. IP prefix lookup table of next hops. Next hops for IP packets are found using the longest matching prefix in the table for the unicast destination address of the IP packet.

To efficiently perform this lookup function in hardware, the Tree Bitmap algorithm starts by storing prefixes in a binary trie as shown in Figure 2. Shaded nodes denote a stored prefix. A search is conducted by using the IP address bits to traverse the trie, starting with the most significant bit of the address. To speed up this searching process, multiple bits of the destination address are compared simultaneously. In order to do this, subtrees of the binary trie are combined into single nodes producing a multibit trie; this reduces the number of memory accesses needed to perform a lookup. The depth of the subtrees combined to form a
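As a concrete illustration of the longest-prefix-match traversal just described, the following C sketch (not code from the paper) builds the elementary binary trie for the prefixes of Fig. 1 and walks it bit by bit from the most significant address bit, remembering the last stored prefix passed. For the example address 128.252.153.160 it returns Next Hop 7, matching Fig. 1.

#include <stdio.h>
#include <stdlib.h>

/* A node of the elementary binary trie of Section III: one child per
 * address bit, plus an optional stored next hop (a "shaded" node). */
struct node {
    struct node *child[2];
    int next_hop;                 /* -1 if no prefix ends at this node */
};

static struct node *new_node(void)
{
    struct node *n = calloc(1, sizeof(*n));
    n->next_hop = -1;
    return n;
}

/* Insert a prefix written as a string of '0'/'1' characters. */
static void insert(struct node *root, const char *prefix, int next_hop)
{
    struct node *n = root;
    for (const char *p = prefix; *p; p++) {
        int b = *p - '0';
        if (!n->child[b])
            n->child[b] = new_node();
        n = n->child[b];
    }
    n->next_hop = next_hop;
}

/* Traverse the trie using the address bits, most significant bit first,
 * remembering the last stored prefix encountered: the longest match. */
static int lookup(const struct node *root, unsigned int addr)
{
    int best = root->next_hop;    /* the default route "*" */
    const struct node *n = root;
    for (int i = 31; i >= 0 && n; i--) {
        n = n->child[(addr >> i) & 1u];
        if (n && n->next_hop >= 0)
            best = n->next_hop;
    }
    return best;
}

int main(void)
{
    /* Prefix table from Fig. 1. */
    static const struct { const char *prefix; int hop; } table[] = {
        { "", 7 }, { "10", 35 }, { "01", 21 }, { "110", 9 }, { "1011", 1 },
        { "0001", 68 }, { "01011", 51 }, { "00110", 3 }, { "10001", 6 },
        { "100001", 33 }, { "10000000", 54 }, { "1000000000", 12 },
        { "1000000011", 7 },
    };
    struct node *root = new_node();
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        insert(root, table[i].prefix, table[i].hop);

    /* 128.252.153.160 = 1000 0000 1111 1100 1001 1001 1010 0000 */
    unsigned int addr = (128u << 24) | (252u << 16) | (153u << 8) | 160u;
    printf("next hop = %d\n", lookup(root, addr));   /* prints 7 */
    return 0;
}

This bit-at-a-time walk visits up to one node per address bit; the multibit trie formed by combining subtrees consumes several bits per node and thereby reduces the number of memory accesses per lookup, which is the point of the Tree Bitmap construction described above.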