  1. Guarantee IP Lookup Performance with FIB Explosion
     Tong Yang (ICT), Gaogang Xie (ICT), Yanbiao Li (HNU), Qiaobin Fu (ICT), Alex X. Liu (MSU), Qi Li (ICT), Laurent Mathy (ULG)

  2. Performance Issue in IP Lookup
      FIB growth: about 15% per year; FIB size has reached 512,000 entries.
      The 512k bug: in August 2014, Cisco warned that web browsing speeds could slow over the following week as old hardware was upgraded to handle FIBs beyond 512K entries.

  3. Motivation
      On-chip vs. off-chip memory: on-chip is about 10 times faster, but limited in size.
      With FIBs increasing, the ideal IP lookup algorithm should, for almost all packets, offer a constant yet fast lookup speed (low time complexity) together with a constant yet small FIB footprint (fitting in on-chip memory).

  4. State-of-the-art
      Achieving constant IP lookup time
       – TCAM-based
       – Trie pipeline using FPGA
       – Full expansion
       – DIR-24-8
      Achieving small memory
       – Bloom filter based
       – Level compression, path compression
       – LC-trie
     How to satisfy both constant lookup time and small on-chip memory usage?

  5. SAIL Framework
      Observation: almost all packets hit prefixes of length 0~24.
      Two-dimensional splitting
       – Splitting the lookup process: finding the prefix length vs. finding the next hop
       – Splitting by prefix length: 0~24 vs. 25~32

       Prefix length    Finding prefix length    Finding next hop
       0~24             On-chip                  Off-chip
       25~32            Off-chip                 Off-chip

  6. Splitting
     [Figure: the original trie is split, level by level, into bitmap arrays and next-hop arrays; levels 0~24 hold the short prefixes, levels 25~32 the long prefixes.]
      Keeping the level 0~24 bitmaps on-chip costs sum_{j=0}^{24} 2^j = 2^25 - 1 bits, i.e. about 4 MB.
      How to avoid searching both short and long prefixes?
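     As a sanity check on the 4 MB figure, a minimal C sketch (not from the slides) that adds up one 2^j-bit bitmap per level j = 0..24:

     ```c
     #include <stdio.h>
     #include <stdint.h>

     /* Sketch: total on-chip size when one bitmap B_j of 2^j bits is
      * kept for every level j = 0..24, as on this slide. */
     int main(void)
     {
         uint64_t bits = 0;
         for (int j = 0; j <= 24; j++)
             bits += (uint64_t)1 << j;              /* |B_j| = 2^j bits */
         printf("%llu bits = %.2f MB\n",
                (unsigned long long)bits,
                bits / 8.0 / (1024.0 * 1024.0));    /* ~33.5M bits, about 4.00 MB */
         return 0;
     }
     ```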

  7. Pivot Pushing & Lookup
     [Figure: (a) an example FIB, (b) the corresponding trie with pivot level 4, (c) the bitmap arrays B_0..B_4 and the next-hop array N_4.]
     Example FIB (prefix -> next hop): */0 -> 6, 1*/1 -> 4, 01*/2 -> 3, 001*/3 -> 3, 111*/3 -> 7, 0011*/4 -> 1, 1110*/4 -> 8, 11100*/5 -> 2, 001011*/6 -> 9.
     Pivot push: the trie is pushed to the pivot level (level 4 here); a pivot-level node whose subtrie contains longer prefixes stores next hop 0 in N_4, signalling that the lookup must continue among the long prefixes.
     Lookup of 001010: B_4[001010 >> 2] = 1 and N_4[2] = 0, so the search continues among the long prefixes (levels 5~6) off-chip.
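     A minimal C sketch of the decision traced above, assuming a level-4 bitmap B4 and next-hop array N4 as on the slide; the short-prefix and long-prefix lookups are placeholders that only report which path was taken:

     ```c
     #include <stdint.h>
     #include <stdio.h>

     /* Toy pivot-level check (pivot level 4, 6-bit addresses). */
     static uint8_t B4[16], N4[16];       /* level-4 bitmap and next hops   */

     static uint8_t lookup_short(uint32_t a) { (void)a; puts("levels 0-3");           return 0; }
     static uint8_t lookup_long(uint32_t a)  { (void)a; puts("levels 5-6, off-chip"); return 0; }

     static uint8_t lookup6(uint32_t addr)
     {
         uint32_t i = addr >> 2;          /* first 4 bits: 001010 >> 2 = 2  */
         if (B4[i] == 0)
             return lookup_short(addr);   /* no node at the pivot level     */
         if (N4[i] != 0)
             return N4[i];                /* solid pivot leaf: done on-chip */
         return lookup_long(addr);        /* internal pivot node: continue  */
     }                                    /* among the long prefixes        */

     int main(void)
     {
         B4[2] = 1;                       /* the slide's trace: B4[001010 >> 2] = 1 */
         N4[2] = 0;                       /*                    N4[2] = 0           */
         lookup6(10 /* 001010 */);        /* prints "levels 5-6, off-chip"          */
         return 0;
     }
     ```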

  8. Update of SAIL_B
     [Figure: the same example FIB, trie, and bitmap arrays as on the previous slide.]
      Insert 10*: set B_2[10] = 1.
      Delete 111*: set B_3[111] = 0.
      Changing the next hop of 001*, or inserting 0010*, only needs updates to the off-chip tables.
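     A sketch of these two updates in C, assuming per-level bitmaps B2/B3 kept on-chip and next-hop arrays N2/N3 kept off-chip (the slide writes B_j and N_j); each update flips a single on-chip bit, hence one on-chip memory access:

     ```c
     #include <stdint.h>

     static uint8_t B2[4], B3[8];     /* on-chip bitmaps, levels 2 and 3   */
     static uint8_t N2[4], N3[8];     /* off-chip next-hop arrays          */

     void insert_10(uint8_t nh)       /* insert 10* with next hop nh       */
     {
         N2[2] = nh;                  /* "10" = index 2; off-chip write    */
         B2[2] = 1;                   /* the single on-chip access         */
     }

     void delete_111(void)            /* delete 111*                       */
     {
         N3[7] = 0;                   /* "111" = index 7; off-chip write   */
         B3[7] = 0;                   /* the single on-chip access         */
     }
     ```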

  9. Optimization
      SAIL_B
       – Lookup: 25 on-chip memory accesses in the worst case
       – Update: 1 on-chip memory access
      Lookup-Oriented Optimization (SAIL_L)
       – Lookup: 2 on-chip memory accesses in the worst case
       – Update: unbounded, but low average update complexity
      Update-Oriented Optimization (SAIL_U)
       – Lookup: 4 on-chip memory accesses in the worst case
       – Update: 1 on-chip memory access
      Extension: SAIL for Multiple FIBs (SAIL_M)

  10. SAIL_L
     [Figure: lookup flow chart.]
     If B_16 == 1, check B_24; otherwise the next hop is read from N_16 (level 16).
     If B_24 == 1, the next hop is read from N_32 (level 32); otherwise from N_24 (level 24).
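     A sketch of this two-access flow in C, assuming on-chip bitmaps B16/B24 and next-hop arrays N16, N24, and N32; how N24 indexes the level-25~32 chunks in N32 is an assumption of the sketch, not given on the slide, and the bitmaps are shown one byte per entry for readability (a real implementation packs them):

     ```c
     #include <stdint.h>

     /* SAIL_L decision flow: at most two on-chip reads (B16, then B24). */
     extern uint8_t  B16[1 << 16];    /* on-chip bitmap, level 16          */
     extern uint8_t  B24[1 << 24];    /* on-chip bitmap, level 24          */
     extern uint8_t  N16[1 << 16];    /* next hops for prefixes <= /16     */
     extern uint32_t N24[1 << 24];    /* next hop, or chunk id if B24 == 1 */
     extern uint8_t  N32[];           /* pushed levels 25~32, off-chip     */

     uint8_t sail_l_lookup(uint32_t ip)
     {
         uint32_t i16 = ip >> 16;
         if (B16[i16] == 0)               /* 1st on-chip access            */
             return N16[i16];             /* longest match is <= /16       */

         uint32_t i24 = ip >> 8;
         if (B24[i24] == 0)               /* 2nd on-chip access            */
             return (uint8_t)N24[i24];    /* longest match is <= /24       */

         /* A longer prefix exists: here N24 holds a chunk id and the last
          * 8 address bits index into that chunk (layout assumed). */
         return N32[N24[i24] * 256u + (ip & 0xFF)];
     }
     ```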

  11. SAIL_U
      Pushing to levels 6, 12, 18, and 24.
      One update affects at most 2^6 = 64 bits in the bitmap array, so at most one on-chip memory access is enough for each update.
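     A small worked check of the 64-bit bound (the exact push rule is an assumption here): if a length-l prefix is kept at the nearest pushed level at or above it, one update rewrites at most 2^(pushed - l) bitmap bits, maximised by the default route (l = 0) with 2^6 = 64 bits, i.e. one 64-bit on-chip word:

     ```c
     /* Bits touched by one update in SAIL_U, assuming pushed levels
      * 6, 12, 18, 24 and that a length-l prefix (0 <= l <= 24) is stored
      * at the nearest pushed level >= l (with level 0 pushed to level 6). */
     unsigned affected_bits(unsigned l)
     {
         unsigned pushed = (l <= 6) ? 6 : ((l + 5) / 6) * 6;   /* 6, 12, 18, or 24 */
         return 1u << (pushed - l);                            /* at most 64       */
     }
     ```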

  12. SAIL_M
     [Figure: (a) Trie 1, (b) Trie 2, and (c) the overlay trie obtained by merging them; the example prefixes include 00*, 01*, 10*, 100*, 101*, 110*, and 111*.]
     The tries of multiple FIBs are merged into one overlay trie, so all FIBs share the same on-chip structures.
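     A sketch of how the merged structure could serve several FIBs at once; the per-node vector of next hops (one slot per FIB) is an assumption used for illustration, only the merging idea comes from the slide:

     ```c
     #include <stdint.h>

     #define NUM_FIBS 2                      /* e.g. Trie 1 and Trie 2 above */

     struct nh_vec { uint8_t nh[NUM_FIBS]; };/* one next hop per merged FIB  */

     extern uint8_t       B16[1 << 16];      /* shared on-chip bitmap        */
     extern struct nh_vec N16[1 << 16];      /* shared next-hop vectors      */

     /* One traversal of the shared overlay structure serves every FIB; the
      * fib id only selects a component of the final vector.  (Only the
      * level-16 case is shown; the rest would follow the SAIL_L flow.) */
     uint8_t lookup_in_fib(uint32_t ip, unsigned fib)
     {
         uint32_t i = ip >> 16;
         if (B16[i] == 0)
             return N16[i].nh[fib];
         /* ... otherwise continue exactly as in SAIL_L ... */
         return 0;
     }
     ```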

  13. SAILs in worst case

     Algorithm   On-chip memory   Lookup (on-chip)   Update (on-chip)
     SAIL_B      = 4 MB           25                 1
     SAIL_L      ≤ 2.13 MB        2                  Unbounded
     SAIL_U      ≤ 2.03 MB        4                  1
     SAIL_M      ≤ 2.13 MB        2                  Unbounded

     Worst case: 2 off-chip memory accesses per lookup.

  14. Implementations
      FPGA: Xilinx ISE 13.2 IDE; Xilinx Virtex-7 device; 8.26 MB on-chip memory
       – SAIL_B, SAIL_U, and SAIL_L
      Intel CPU: Core(TM) i7-3520M, 2.9 GHz; 64KB L1, 512KB L2, 4MB L3; 8GB DRAM
       – SAIL_L and SAIL_M
      GPU: NVIDIA Tesla C2075 (1147 MHz, 5376 MB device memory, 448 CUDA cores) with an Intel Xeon E5-2630 CPU (2.30 GHz, 6 cores)
       – SAIL_L
      Many-core: Tilera TLR4-03680, 36 cores, 256KB L2 cache per core
       – SAIL_L

  15. Evaluation
      FIBs
       – A real FIB from a tier-1 router in China
       – 18 real FIBs from www.ripe.net
      Traces
       – Real packet traces from the same tier-1 router
       – Randomly generated packet traces
       – Packet traces generated according to the FIBs
      Compared with
       – PBF [SIGCOMM '03]
       – LC-trie [used in the Linux kernel]
       – Tree Bitmap
       – Lulea [SIGCOMM '97 best paper]

  16. FPGA Simulation
     [Figure: on-chip memory usage of SAIL_L vs. PBF across FIBs rrc00~rrc15 (y-axis 0 to 1.2 MB).]

     SAIL algorithm   Lookup speed   Throughput
     SAIL_B           351 Mpps       112 Gbps
     SAIL_U           405 Mpps       130 Gbps
     SAIL_L           479 Mpps       153 Gbps

  17. Intel CPU: real FIB and traces
     [Figure: lookup speed (Mpps, y-axis up to 800) of LC-trie, Tree Bitmap, Lulea, and SAIL_L across 12 FIBs.]

  18. Intel CPU: 12 FIBs using prefix-based and random traces
     [Figure: lookup speed (Mpps, y-axis up to 500) with prefix-based vs. random traces across the 12 FIBs.]

  19. Intel CPU: Update
     [Figure: number of memory accesses per update, and its running average, for rrc00, rrc01, and rrc03 over sequences of updates (x-axis in units of 500 updates; y-axis 0 to 14).]

  20. GPU: Lookup speed vs. batch size
     [Figure: lookup speed (Mpps, y-axis up to 650) for batch sizes 30, 60, and 90 across FIBs rrc00~rrc15.]

  21. GPU: Lookup latency vs. batch size
     [Figure: lookup latency (microseconds, y-axis up to 240) for batch sizes 30, 60, and 90 across FIBs rrc00~rrc15.]

  22. Tilera GX-36: Lookup vs. # of cores
     [Figure: lookup speed (pps, y-axis up to 700M) as the number of cores increases from 2 to 34.]

  23. Conclusion
      Two-dimensional splitting framework: SAIL
      Three optimization algorithms
       – SAIL_U, SAIL_L, SAIL_M
       – Up to 2.13 MB on-chip memory usage
       – At most 2 off-chip memory accesses per lookup
      Suitable for different platforms
       – FPGA, CPU, GPU, many-core
       – Up to 673.22~708.71 Mpps
      Future work: extending SAIL to IPv6 lookup

  24. Source code of SAIL, LC-trie, Tree Bitmap, and Lulea:
     http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource

  25. Thanks
     http://fi.ict.ac.cn
