Guarantee IP Lookup Performance with FIB Explosion
Tong Yang (ICT), Gaogang Xie (ICT), Yanbiao Li (HNU), Qiaobin Fu (ICT), Alex X. Liu (MSU), Qi Li (ICT), Laurent Mathy (ULG)
Performance Issue in IP Lookup
- FIBs keep growing: about 15% per year; FIB size has reached 512,000 entries.
- The 512k bug: in August 2014, Cisco warned that web browsing speeds could slow over the following week as old hardware was upgraded to handle FIBs beyond 512k entries.
Motivation
- On-chip vs. off-chip memory: on-chip is about 10 times faster, but limited in size, while FIBs keep growing.
- An ideal IP lookup algorithm combines:
  - Constant, fast lookup speed: low time complexity
  - Constant, small memory footprint for the FIB: fits in on-chip memory
State-of-the-art
- Achieving constant IP lookup time:
  - TCAM-based
  - Trie pipeline using FPGA
  - Full expansion
  - DIR-24-8
- Achieving small memory:
  - Bloom-filter based
  - Level compression, path compression
  - LC-trie
How to satisfy both constant lookup time and small on-chip memory usage?
SAIL Framework
Observation: almost all packets hit prefixes of length 0~24.
Two-dimensional splitting:
- Splitting the lookup process: finding the prefix length vs. finding the next hop
- Splitting the prefix length: 0~24 vs. 25~32

                       Finding prefix length   Finding next hop
Prefix length 0~24     On-chip                 Off-chip
Prefix length 25~32    Off-chip                Off-chip
Splitting
[Figure: the original trie is split at level 24 into short prefixes (levels 0~24) and long prefixes (levels 25~32); each level i keeps a bitmap array B_i and a next-hop array N_i.]
The level 0~24 bitmaps together need only Σ_{j=0}^{24} 2^j = 2^25 − 1 bits ≈ 4 MB, small enough for on-chip memory.
How to avoid searching both short and long prefixes?
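As a sketch, the per-level bitmap and next-hop arrays can be built like this. This is a minimal Python toy; `build_levels` and the `(value, length, hop)` tuple encoding are my own illustration, not the paper's code:

```python
def build_levels(prefixes, max_len):
    """Build per-level bitmap (B) and next-hop (N) arrays from a prefix list.

    prefixes: iterable of (value, length, next_hop) tuples, where `value`
    is the integer value of the prefix bits. Level i needs 2**i entries,
    so the level 0~24 bitmaps together take 2**25 - 1 bits (about 4 MB),
    small enough for on-chip memory.
    """
    B = [bytearray(1 << l) for l in range(max_len + 1)]
    N = [bytearray(1 << l) for l in range(max_len + 1)]
    for value, length, hop in prefixes:
        B[length][value] = 1   # mark a prefix (solid node) at this level
        N[length][value] = hop
    return B, N

# A small example FIB, prefixes up to length 4:
B, N = build_levels([(0b0, 0, 6), (0b1, 1, 4), (0b01, 2, 3),
                     (0b001, 3, 3), (0b111, 3, 7),
                     (0b0011, 4, 1), (0b1110, 4, 8)], 4)
```

A lookup at level l then only indexes B[l] and N[l] with the first l address bits, which is what lets the bitmaps and the next-hop arrays live in different memories.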
Pivot Pushing & Lookup
Pivot push: at the pivot level, a node whose subtrie contains longer prefixes is marked in the bitmap with next hop 0, so the search falls through to the long (off-chip) prefixes only when needed.
Example (pivot level 4): lookup 001010 → B_4[001010 >> 2] = 1 and N_4[2] = 0, so the search continues among the long prefixes.
[Figure: (a) an example FIB with prefixes */0→6, 1*/1→4, 01*/2→3, 001*/3→3, 111*/3→7, 0011*/4→1, 1110*/4→8, 11100*/5→2, 001011*/6→9; (b) the corresponding trie; (c) the per-level bitmap and next-hop arrays.]
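The pivot-pushed lookup can be sketched as follows, assuming 6-bit addresses and pivot level 4 as in the example. `lookup_long` is a stand-in helper for the real off-chip long-prefix structure, not the paper's implementation:

```python
# Toy per-level arrays for the example FIB (6-bit addresses, pivot level 4).
# At the pivot level, an internal node is marked B=1 with N=0, which
# redirects the search to the long (off-chip) prefixes.
B = [bytearray([1]), bytearray(2), bytearray(4), bytearray(8), bytearray(16)]
N = [bytearray([6]), bytearray(2), bytearray(4), bytearray(8), bytearray(16)]
B[1][0b1],    N[1][0b1]    = 1, 4   # 1*/1    -> 4
B[2][0b01],   N[2][0b01]   = 1, 3   # 01*/2   -> 3
B[3][0b001],  N[3][0b001]  = 1, 3   # 001*/3  -> 3
B[3][0b111],  N[3][0b111]  = 1, 7   # 111*/3  -> 7
B[4][0b0011], N[4][0b0011] = 1, 1   # 0011*/4 -> 1
B[4][0b0010] = 1                    # internal: 001011*/6 lives below
B[4][0b1110] = 1                    # internal: 1110*/4 and 11100*/5 below

# Long prefixes, sorted by increasing length (stand-in for off-chip tables).
LONG = [(0b1110, 4, 8), (0b11100, 5, 2), (0b001011, 6, 9)]

def lookup_long(addr):
    best = 0
    for value, plen, hop in LONG:       # later (longer) matches win
        if addr >> (6 - plen) == value:
            best = hop
    return best

def sail_b_lookup(addr, pivot=4):
    best = 0
    for l in range(pivot + 1):          # scan on-chip levels 0..pivot
        v = addr >> (6 - l)
        if B[l][v]:
            best = N[l][v] or best      # N=0 at the pivot marks an internal node
    v = addr >> (6 - pivot)
    if B[pivot][v] and N[pivot][v] == 0:
        best = lookup_long(addr) or best
    return best
```

Looking up 001010 keeps the best on-chip match (001*/3 → 3) because the off-chip search misses, while 001011 is resolved off-chip to 9, matching the slide's walk-through.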
Update of SAIL_B
Example updates: insert 10* → set B_2[10] = 1; delete 111* → set B_3[111] = 0.
[Figure: the same example FIB, trie, and per-level bitmap and next-hop arrays, after the two updates.]
Changing the next hop of 001*, or inserting 0010*, only needs updates to the off-chip tables.
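A minimal sketch of these updates, using the same arrays-of-arrays layout as the earlier toy; the next hop chosen for 10* is an illustrative value, since the slide does not give one:

```python
def sail_b_insert(value, plen, hop, B, N):
    """Insert a short prefix: a single on-chip write to the level-plen bitmap."""
    B[plen][value] = 1
    N[plen][value] = hop   # next-hop arrays live off-chip

def sail_b_delete(value, plen, B, N):
    """Delete a short prefix: again only one on-chip bit is cleared."""
    B[plen][value] = 0
    N[plen][value] = 0

# The slide's two updates: insert 10*, delete 111*.
B = [bytearray(1 << l) for l in range(5)]
N = [bytearray(1 << l) for l in range(5)]
B[3][0b111], N[3][0b111] = 1, 7
sail_b_insert(0b10, 2, 5, B, N)   # hop 5 is illustrative
sail_b_delete(0b111, 3, B, N)
```

Each short-prefix update touches exactly one bit of one on-chip bitmap, which is why SAIL_B needs only one on-chip memory access per update.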
Optimization
SAIL_B
- Lookup: 25 on-chip memory accesses in the worst case
- Update: 1 on-chip memory access
Lookup-Oriented Optimization (SAIL_L)
- Lookup: 2 on-chip memory accesses in the worst case
- Update: unbounded, but low average update complexity
Update-Oriented Optimization (SAIL_U)
- Lookup: 4 on-chip memory accesses in the worst case
- Update: 1 on-chip memory access
Extension: SAIL for Multiple FIBs (SAIL_M)
SAIL_L
Lookup flow: check B_16; if 0, the next hop is N_16 (level 16). Otherwise check B_24; if 0, the next hop is N_24 (level 24); if 1, fetch the next hop from N_32 (level 32, off-chip).
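This flow as a runnable sketch. The prefixes, the chunk-id array `C24`, and the single 256-entry off-chip chunk are illustrative assumptions; the real SAIL_L also packs bitmaps and next hops together to bound on-chip accesses:

```python
B16 = bytearray(1 << 16); N16 = bytearray(1 << 16)  # level 16 (on-chip)
B24 = bytearray(1 << 24); N24 = bytearray(1 << 24)  # level 24 bitmap on-chip
C24 = bytearray(1 << 24)                            # chunk ids for level 32
N32 = bytearray(256)                                # one off-chip chunk here

def ip(a, b, c, d):
    return (a << 24) | (b << 16) | (c << 8) | d

def sail_l_lookup(addr):
    v16 = addr >> 16
    if not B16[v16]:
        return N16[v16]          # 1 on-chip access
    v24 = addr >> 8
    if not B24[v24]:
        return N24[v24]          # 2 on-chip accesses: the on-chip worst case
    # Long prefix: one off-chip access into this /24 node's 256-entry chunk.
    return N32[(C24[v24] - 1) * 256 + (addr & 0xFF)]

# Illustrative prefixes: 10.0.0.0/16 -> 5, 192.168.2.0/24 -> 7,
# 192.168.1.128/25 -> 9 (pushed to level 32 as a chunk).
N16[ip(10, 0, 0, 0) >> 16] = 5
B16[ip(192, 168, 0, 0) >> 16] = 1    # longer prefixes exist below 192.168/16
N24[ip(192, 168, 2, 0) >> 8] = 7
B24[ip(192, 168, 1, 0) >> 8] = 1     # longer prefixes below 192.168.1/24
C24[ip(192, 168, 1, 0) >> 8] = 1
for i in range(128, 256):
    N32[i] = 9
```

Most traffic terminates at the first or second test, which is where the worst case of 2 on-chip accesses comes from.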
SAIL_U
- Pushing to levels 6, 12, 18, and 24.
- Consecutive pivot levels are 6 apart, so one update affects at most 2^6 = 64 consecutive bits in a bitmap array.
- Those 64 bits fit in one memory word, so at most one on-chip memory access is still enough for each update.
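A toy sketch of the 64-bit bound; only the bitmap writes are shown, and `push_update` is my own name for the operation:

```python
def push_update(value, plen, pivot, bitmap, bit=1):
    """Write every pivot-level bitmap slot covered by a length-plen prefix.

    With pivot levels 6, 12, 18, 24, consecutive pivots are 6 apart, so a
    prefix pushes to at most 2**6 = 64 consecutive slots -- which fit in a
    single 64-bit word, hence one on-chip memory access per update.
    """
    shift = pivot - plen
    base = value << shift
    for i in range(1 << shift):
        bitmap[base + i] = bit
    return 1 << shift        # number of bits touched (<= 64)

bm = bytearray(1 << 6)
touched = push_update(0b101, 3, 6, bm)   # insert 101*, pushed to level 6
```

Here inserting 101* fills the 8 level-6 slots it covers (indices 40~47), one contiguous run inside a single word.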
SAIL_M
Merging multiple FIBs for virtual routers: the tries of the individual FIBs are merged into one overlay trie, and each overlay node stores a vector of next hops, one per FIB.
[Figure: (a) two example tries (with prefixes such as A:00*, B:01*, C:10*, E:100*, F:101*, G:110*, H:111*); (b) and (c) their combined overlay trie.]
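A minimal sketch of the merging step, assuming each FIB is given as a `{(prefix_value, prefix_length): next_hop}` dict; the per-FIB next-hop vector is what the overlay trie's nodes would store:

```python
def build_overlay(fibs):
    """Merge several FIBs into one overlay table.

    Each overlay entry maps a prefix to a next-hop vector with one slot
    per FIB (0 where that FIB has no such prefix), so all virtual
    routers can share a single lookup structure.
    """
    overlay = {}
    for i, fib in enumerate(fibs):
        for prefix, hop in fib.items():
            overlay.setdefault(prefix, [0] * len(fibs))[i] = hop
    return overlay

# Two toy FIBs sharing the prefix 00*.
fib1 = {(0b00, 2): 1, (0b10, 2): 2}
fib2 = {(0b00, 2): 3, (0b01, 2): 4}
overlay = build_overlay([fib1, fib2])
```

A lookup then proceeds exactly once through the shared structure and selects the vector slot for the virtual router's FIB id.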
SAILs in the worst case

            On-chip memory   Lookup (on-chip)   Update (on-chip)
SAIL_B      = 4 MB           25                 1
SAIL_L      ≤ 2.13 MB        2                  Unbounded
SAIL_U      ≤ 2.03 MB        4                  1
SAIL_M      ≤ 2.13 MB        2                  Unbounded

Worst case: 2 off-chip memory accesses per lookup
Implementations
- FPGA: Xilinx ISE 13.2 IDE; Xilinx Virtex-7 device; 8.26 MB on-chip memory — SAIL_B, SAIL_U, and SAIL_L
- Intel CPU: Core(TM) i7-3520M, 2.9 GHz; 64 KB L1, 512 KB L2, 4 MB L3; 8 GB DRAM — SAIL_L and SAIL_M
- GPU: NVIDIA Tesla C2075 (1147 MHz, 5376 MB device memory, 448 CUDA cores) with an Intel Xeon E5-2630 (2.30 GHz, 6 cores) — SAIL_L
- Many-core: Tilera TLR4-03680, 36 cores, 256 KB L2 cache per core — SAIL_L
Evaluation
FIBs
- A real FIB from a tier-1 router in China
- 18 real FIBs from www.ripe.net
Traces
- Real packet traces from the same tier-1 router
- Randomly generated packet traces
- Packet traces generated according to the FIBs (prefix-based)
Compared with
- PBF [SIGCOMM 03]
- LC-trie [used in the Linux kernel]
- Tree Bitmap
- Lulea [SIGCOMM 97 best paper]
FPGA Simulation
[Figure: on-chip memory usage (0 ~ 1.2 MB) of SAIL_L and PBF across the FIBs rrc00, rrc01, rrc03~rrc07, and rrc10~rrc15.]

SAIL algorithm   Lookup speed   Throughput
SAIL_B           351 Mpps       112 Gbps
SAIL_U           405 Mpps       130 Gbps
SAIL_L           479 Mpps       153 Gbps
Intel CPU: real FIB and traces
[Figure: lookup speed (Mpps, 0~800) of LC-trie, Tree Bitmap, Lulea, and SAIL_L on 12 FIBs.]
Intel CPU: 12 FIBs using prefix-based and random traces
[Figure: lookup speed (Mpps, 0~500) with prefix-based and random traces across the 12 FIBs.]
Intel CPU: Update
[Figure: number of memory accesses per update (0~14) over a sequence of updates (x-axis in units of 500 updates) for rrc00, rrc01, and rrc03, together with their running averages.]
GPU: Lookup speed vs. batch size
[Figure: GPU lookup speed (Mpps, up to 650) for batch sizes 30, 60, and 90 on the FIBs rrc00, rrc01, rrc03~rrc07, and rrc10~rrc15.]
GPU: Lookup latency vs. batch size
[Figure: GPU lookup latency (microseconds, up to 240) for batch sizes 30, 60, and 90 on the same FIBs.]
Tilera GX-36: Lookup vs. number of cores
[Figure: lookup speed (pps, up to 700 M) as the number of cores grows from 2 to 35.]
Conclusion
- Two-dimensional splitting framework: SAIL
- Three optimization algorithms: SAIL_U, SAIL_L, SAIL_M
  - At most 2.13 MB of on-chip memory usage
  - At most 2 off-chip memory accesses per lookup
- Suitable for different platforms: FPGA, CPU, GPU, many-core
  - Lookup speeds of up to 673.22~708.71 Mpps
- Future work: extending SAIL to IPv6 lookup
Source code of SAIL, LC-trie, Tree Bitmap, and Lulea:
http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource
Thanks
http://fi.ict.ac.cn