  1. FIRST Technical Colloquium, February 10-11, 2003, Uppsala, Sweden
     bifrost: a high performance router & firewall
     Robert Olsson, Hans Wassen

  2. Bifrost concept
     - Small size Linux distribution targeted for flash disks, 20 MB
     - Optimized for networking/firewalling
     - Tested with selected drivers and hardware
     - Open platform for development and collaboration
     - Results and experiences shared

  3. Bifrost concept: Linux kernel collaboration
     - FASTROUTE, HW_FLOWCONTROL, new NAPI for the network stack
     - Performance testing, development of tools and testing techniques
     - Hardware validation, support from big vendors
     - Detect and cure problems in the lab, not in the network infrastructure
     - Test deployment (often in our own network)

  4. Collaboration/development: The New API (NAPI)

  5. Core problems under heavy net load: system congestion collapse
     - High interrupt rates: livelock and cache-locality effects; interrupts are simply expensive
     - CPU: interrupt-driven operation takes too long to drop a bad packet
     - Bus (PCI): packets are still being DMAed when the system is overloaded
     - Memory bandwidth: continuous allocs and frees to fill DMA rings
     - Unfairness in case of a hogger netdev

  6. Overall effect
     - Inelegant handling of heavy net loads: system collapse
     - Scalability affected, both by the system and by the number of NICs
     - A single hogger netdev can bring the system to its knees and deny service to others
     Summary, 2.4 vs feedback: March 15 report on lkml, thread "How to optimize routing perfomance", reported by Marten.Wikstron@framsfab.se:
     - Linux 2.4 peaks at 27 Kpps
     - Pentium Pro 200, 64 MB RAM
     [chart omitted: Linux 2.4 forwarding rate, peaking at about 27 Kpps]

  7. Looking inside the box
     [diagram: incoming packets from devices are enqueued, in IRQ context, on the backlog queue if the queue is not full; at a later time the softirq does backlog queue processing and hands packets to the stack for forwarding; locally generated outgoing packets take the transmit path]
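
     To make the picture concrete, here is a minimal sketch of this pre-NAPI receive path as a driver would implement it. The handler name my_legacy_interrupt and the helper my_rx_ring_next() are hypothetical; netif_rx() and eth_type_trans() are the real kernel calls of the 2.4 era.

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>

/* Hypothetical driver helper: pull the next received buffer off the DMA ring. */
static struct sk_buff *my_rx_ring_next(struct net_device *dev);

/* Pre-NAPI style RX interrupt (2.4-era handler signature): the whole DMA
 * ring is drained in hard-IRQ context and every packet is pushed onto the
 * per-CPU backlog queue via netif_rx(), which drops it if the backlog is
 * full.  The NET_RX softirq dequeues the backlog later and feeds the stack. */
static void my_legacy_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
        struct net_device *dev = dev_id;
        struct sk_buff *skb;

        while ((skb = my_rx_ring_next(dev)) != NULL) {
                skb->protocol = eth_type_trans(skb, dev);
                netif_rx(skb);          /* enqueue to the backlog queue */
        }
}
```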

  8. BYE BYE backlog queue
     - The packet stays in its original queue (e.g. the DMA ring)
     - The net-RX softirq, for each dev on the poll list, calls dev->poll() to grab up to quota packets
     - Device drivers are polled from the softirq; packets are pulled and delivered to the network stack
     - The driver indicates done/not done: done ==> go back to IRQ mode; not done ==> the device remains on the polling list
     - The net-RX softirq breaks after one jiffy or netdev_max_backlog packets, to ensure other tasks get to run
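
     A minimal sketch of such a dev->poll() callback under the 2.4.20 / early-2.5 NAPI interface described here. The helpers my_rx_ring_next(), my_rx_ring_empty() and my_enable_rx_irq() stand in for driver-specific code and are hypothetical; netif_receive_skb() and netif_rx_complete() are the real NAPI entry points of that era.

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/etherdevice.h>

/* Hypothetical driver helpers for this sketch. */
static struct sk_buff *my_rx_ring_next(struct net_device *dev);
static int my_rx_ring_empty(struct net_device *dev);
static void my_enable_rx_irq(struct net_device *dev);

/* dev->poll(): called from the net-RX softirq for each device on the poll
 * list.  Packets stay on the device's own DMA ring until pulled here. */
static int my_poll(struct net_device *dev, int *budget)
{
        int limit = min(*budget, dev->quota);
        int done = 0;
        struct sk_buff *skb;

        while (done < limit && (skb = my_rx_ring_next(dev)) != NULL) {
                skb->protocol = eth_type_trans(skb, dev);
                netif_receive_skb(skb);         /* deliver straight to the stack */
                done++;
        }
        *budget    -= done;
        dev->quota -= done;

        if (my_rx_ring_empty(dev)) {
                netif_rx_complete(dev);         /* "done": leave the poll list */
                my_enable_rx_irq(dev);          /* go back to interrupt mode   */
                return 0;
        }
        return 1;                               /* "not done": stay on the list */
}
```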

  9. A high-level view of the new system
     [diagram: packet arrivals over time, split into an interrupt area and a polling area; P marks packets to deliver to the stack (on the RX ring); a quota line caps each poll]
     - The horizontal lines show different netdevs with different input rates
     - The area under each curve shows how many packets arrive before the next interrupt
     - The quota enforces a fair share

  10. Kernel support
     The NAPI kernel part was included in 2.5.7 and back-ported to 2.4.20. Current driver support:
       e1000  - Intel GigE NICs
       tg3    - Broadcom GigE NICs
       dl2k   - D-Link GigE NICs
       tulip  - 100 Mbit/s (pending)
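
     For reference, this is roughly how a driver of that era hooked into NAPI at probe time, assuming the hypothetical my_poll() callback sketched two slides back; dev->poll and dev->weight were the struct net_device fields used by the 2.4.20 / early-2.5 API.

```c
/* Register the NAPI callback during device setup.  The weight acts as the
 * per-poll quota that keeps one busy NIC from starving the others. */
static void my_setup_napi(struct net_device *dev)
{
        dev->poll   = my_poll;   /* invoked from the net-RX softirq */
        dev->weight = 64;        /* packets allowed per poll round  */
}
```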

  11. NAPI: observations & issues
     "Ooh, I get even more interrupts... with polling." As we have seen, NAPI is an interrupt/polling hybrid: it uses interrupts to guarantee low latency, and at high loads interrupts never get re-enabled, so consecutive polls occur. The old scheme added an interrupt delay to keep the CPU from being killed by interrupts. With NAPI we can do without this delay for the first time, but that means more interrupts in low-load situations. Should we add an interrupt delay just out of old habit?
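
     The interrupt half of this hybrid, again as a hedged sketch against the same era's API: the RX interrupt only masks further RX interrupts on the NIC and puts the device on the poll list, so under sustained load the softirq keeps polling and the interrupt stays disabled. my_disable_rx_irq() is a hypothetical register write; the netif_rx_schedule_* calls are the real ones.

```c
/* Hypothetical driver helper: mask RX interrupts in the NIC's registers. */
static void my_disable_rx_irq(struct net_device *dev);

/* NAPI-style RX interrupt (2.4-era handler signature): no packet is touched
 * here, the device is merely queued for polling. */
static void my_napi_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
        struct net_device *dev = dev_id;

        if (netif_rx_schedule_prep(dev)) {
                my_disable_rx_irq(dev);      /* stop further RX interrupts      */
                __netif_rx_schedule(dev);    /* add dev to this CPU's poll list */
        }
        /* The IRQ is re-enabled only from my_poll(), after the ring is empty,
         * so at high load the system runs purely by polling. */
}
```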

  12. Flexible netlab at Uppsala University
     El cheapo, highly customizable, and we write the code :-)
     [diagram: test generator device (Linux) -> Ethernet -> tested device -> Ethernet -> sink device (Linux)]
     - Raw packet performance
     - TCP
     - Timing
     - Variants

  13. Hardware for high-perf. networking
     - Motherboard
     - CPU: uni- or multi-processor
     - Chipset: BX, ServerWorks, E750X
     - Bus/PCI design: number of PCI buses @ 133 MHz
     - Interrupt design: PIC, IO-APIC, etc.
     - Standby power (Wake-on-LAN) can be a problem with many NICs

  14. Hardware for high-perf. networking
     - ServerWorks, Intel E750X: many vendors use these chipsets
     - Compact PCI-X design: already many PCI-X hubs/bridges; PCI-X is here, a bus at 8.5 Gbit/s
     - And dual XEON
     [block diagram: two CPUs and memory on the processor/memory controller, with I/O bridges each driving a PCI-X bus and a NIC]

  15. Hardware for high-perf. networking
     - Currently Intel has the advantage; Broadcom can be a dark horse. All have NAPI drivers.
     - GigE chipsets available for PCI:
       Intel e1000      -- e1000 driver
       Broadcom BCM5700 -- tg3 driver
       D-Link DL-2k     -- dl2k driver
     - Some board manufacturers switch chipsets often. Chip documentation is a problem.

  16. Some GigE experiments / NAPI
     Ping through an idle router vs. ping through a router under a DoS attack of 890 kpps.
     [bar chart: ping latency/fairness under extreme load, UP; latency in microseconds for a number of concurrent pings; roughly 90-125 us through the idle router, roughly 210-480 us under the DoS attack]
     Very well behaved: just an increase of a couple of hundred microseconds!

  17. Some GigE experiments
     Pktgen sending test with 11 GigE interfaces: Clone = 8.5 Gbit/s, Alloc = 5.4 Gbit/s.
     2*XEON 1.8 GHz, packet sending @ 1518 bytes (81300 pps is 1 Gbit/s). ServerWorks X5DL8-GG, Intel e1000.
     [bar chart: packets/sec per interface eth0-eth10 for the Clone and Alloc modes]
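
     The Clone and Alloc bars refer to how each outgoing packet is obtained. The following is a conceptual sketch, not the actual pktgen code: alloc mode allocates and fills a fresh skb for every packet, while clone mode transmits cheap clones of one prebuilt template skb, which removes the per-packet allocation cost and explains the higher Clone numbers.

```c
#include <linux/skbuff.h>
#include <linux/netdevice.h>

/* Alloc mode: a fresh skb per packet (allocation dominates at high rates). */
static void send_alloc_mode(struct net_device *dev, int count, int size)
{
        int i;

        for (i = 0; i < count; i++) {
                struct sk_buff *skb = alloc_skb(size + 16, GFP_ATOMIC);

                if (!skb)
                        break;
                skb_reserve(skb, 16);
                skb_put(skb, size);     /* headers/payload would be filled here */
                skb->dev = dev;
                dev_queue_xmit(skb);
        }
}

/* Clone mode: one template skb, many clones sharing the same data buffer. */
static void send_clone_mode(struct net_device *dev, struct sk_buff *tmpl,
                            int count)
{
        int i;

        for (i = 0; i < count; i++) {
                struct sk_buff *skb = skb_clone(tmpl, GFP_ATOMIC);

                if (!skb)
                        break;
                dev_queue_xmit(skb);    /* only the skb header is new */
        }
}
```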

  18. Some GigE experiments
     Pktgen sending test with 11 GigE interfaces, HyperThreading on: Clone = 10.0 Gbit/s, Alloc = 7.4 Gbit/s.
     2*XEON 1.8 GHz, packet sending @ 1518 bytes (81300 pps is 1 Gbit/s). ServerWorks X5DL8-GG, Intel e1000.
     [bar chart: packets/sec per interface eth0-eth10 for the Clone and Alloc modes]

  19. Some GigE experiments
     Aggregated sending performance from pktgen with 11 GigE interfaces. XEON 2*1.8 GHz @ 64-byte pkts (1.48 Mpps is 1 Gbit/s).
     [bar chart: aggregated rate in Kpps for Alloc and Clone, without (w/o HT) and with HyperThreading (w HT)]

  20. Forwarding performance
     Linux forwarding rate at different packet sizes. Linux 2.5.58, UP, skb recycling, 1.8 GHz XEON.
     [chart: input rate vs. throughput in kpps for packet sizes 64, 128, 256, 512, 1024 and 1518 bytes]
     Fills a GigE pipe starting from 256-byte packets.
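
     The "skb recycling" mentioned here means reusing packet buffers instead of freeing and re-allocating one per forwarded packet. A rough conceptual sketch under assumptions, not the actual patch: buffers freed at TX-completion go onto a small free list that the RX refill path drains before falling back to alloc_skb().

```c
#include <linux/skbuff.h>

#define RECYCLE_DEPTH 256

/* One list per device (and ideally per CPU) in a real implementation. */
static struct sk_buff_head recycle_list;

static void my_recycle_init(void)
{
        skb_queue_head_init(&recycle_list);
}

/* Called from TX-completion cleanup instead of kfree_skb(). */
static void my_recycle_skb(struct sk_buff *skb)
{
        if (skb_queue_len(&recycle_list) < RECYCLE_DEPTH &&
            !skb_cloned(skb) && atomic_read(&skb->users) == 1)
                skb_queue_tail(&recycle_list, skb);
        else
                kfree_skb(skb);
}

/* Called from the RX-ring refill path instead of alloc_skb().  A real
 * implementation must fully reset the recycled buffer (data/tail/len and
 * the header pointers); only the idea is shown here. */
static struct sk_buff *my_get_rx_skb(unsigned int size)
{
        struct sk_buff *skb = skb_dequeue(&recycle_list);

        if (skb)
                return skb;
        return alloc_skb(size, GFP_ATOMIC);
}
```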

  21. R&D: parallelization / serialization
     [diagram: eth0 and eth1 behind the IO-APIC; input work can be spread over CPU 0 and CPU 1 (parallelization), but eth1's TX ring serializes the output]
     - For user apps the new scheduler does affinity
     - eth1 holds skb's from different CPUs, so clearing the TX buffers gives cache bouncing
     - For packet forwarding eth0->eth1 we can set affinity eth1 -> CPU0, but it would be nice to use the other CPU for forwarding too :-)
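
     Input affinity of the kind mentioned here (eth1 -> CPU0) is normally set by writing a CPU bitmask to /proc/irq/<N>/smp_affinity. A tiny userspace sketch follows; the default IRQ number 24 is a made-up example, the real one comes from /proc/interrupts.

```c
#include <stdio.h>

/* Usage: setaffinity <irq> <hex cpu mask>, e.g. "setaffinity 24 1" pins
 * IRQ 24 (hypothetically eth1) to CPU0. */
int main(int argc, char **argv)
{
        const char *irq  = argc > 1 ? argv[1] : "24";  /* hypothetical default */
        const char *mask = argc > 2 ? argv[2] : "1";   /* CPU0 */
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f) {
                perror(path);
                return 1;
        }
        fprintf(f, "%s\n", mask);
        fclose(f);
        return 0;
}
```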

  22. R&D: Very high transaction packet memory system for GigE and upcoming 10GE
     Profiling indicates slab is not fully per-CPU. Counter 0 counted GLOBAL_POWER_EVENTS events.

     SMP, 2 CPUs, 300 kpps:
       vma       samples  %-age    symbol name
       c0138e96  37970    8.23162  cache_alloc_refill
       c0229490  37247    8.07488  alloc_skb
       c0235e90  32491    7.04381  qdisc_restart
       c0235b54  27891    6.04657  eth_type_trans

     SMP, 1 CPU, 302 kpps:
       c02296d2  25675    8.67698  skb_release_data
       c0235b54  24438    8.25893  eth_type_trans
       c0235e90  24047    8.12679  qdisc_restart
       c0229490  18188    6.14671  alloc_skb
       c0110a1c  15741    5.31974  do_gettimeofday

     Note: setting input affinity helps, but we would like to work on the general problem.

  23. R&D: vanilla router profile, XEON (no HT) 2*1.8 GHz
     V = vanilla, UP = uniprocessor, SMP1 = SMP kernel on 1 CPU, SMP2 = SMP kernel on 2 CPUs, RC = skb recycling, IA = input affinity.
     Profiled with P4/XEON performance counters: GLOBAL_POWER_EVENTS, MISPRED_BRANCH_RETIRED, BSQ_CACHE_REFERENCE, MACHINE_CLEAR, ITLB_REFERENCE.
     [bar chart: routing throughput in kpps (0-550) for V and RC variants on UP, SMP1 and SMP2, with and without IA, compiled with gcc-2.95.3 and gcc-3.1]

  24. NAPI/SMP in production use: uu.se
     [diagram: two routers, UU-1 and UU-2, with links to Stockholm, and L-uu1/L-uu2 towards the internal UU net and DMZ]
     PIII 933 MHz, 2.4.10-poll/SMP. Full Internet routing via EBGP/IBGP, AS 2834.

  25. Real-world use: ftp.sunet.se
     [diagram: OC-48 from Stockholm via a GSR (AS 1653) to Archive-r1 and Archive-r2 (AS 15980), then a switch to Ftp0, Ftp1 and Ftp2]
     PIII 933 MHz, NAPI/IRQ. Full Internet routing via EBGP/IBGP. Load sharing and redundancy with Router Discovery.

  26. IP-login: a Linux router app for user-authenticated routing (user@host)
     - Users can initially only reach the IP-login router, which hosts a web server. User web requests are directed to this web server and the user is asked for username and password, possibly checked against an authentication server (today TACACS).
     - If the user/password is accepted: 1) forwarding is enabled for the host, 2) monitoring of the host with arping is started. Loss of arping disables forwarding.
     Based on "stolen" code from: Pawel Krawczyk (tacacs client), Alexey Kuznetsov (arping).
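
     A hypothetical sketch of the control loop described on this slide, gating per-host forwarding with iptables and monitoring with the iputils arping the slide credits. The actual IP-login code may enable forwarding differently; the interface name eth0 and the example address are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Run a shell command with the host's IP substituted for %s. */
static int run(const char *fmt, const char *ip)
{
        char cmd[256];

        snprintf(cmd, sizeof(cmd), fmt, ip);
        return system(cmd);
}

int main(int argc, char **argv)
{
        const char *ip = argc > 1 ? argv[1] : "192.0.2.10";   /* example host */

        /* 1) Authentication accepted: enable forwarding for this host. */
        run("iptables -I FORWARD -s %s -j ACCEPT", ip);

        /* 2) Keep arping the host on the LAN side (eth0 assumed). */
        while (run("arping -q -c 3 -I eth0 %s", ip) == 0)
                sleep(10);

        /* 3) Loss of arping: disable forwarding again. */
        run("iptables -D FORWARD -s %s -j ACCEPT", ip);
        return 0;
}
```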

  27. IP-login installation at Uppsala University Approx 1000 outlets

  28. A new network symbol has been seen... The Penguin Has Landed

  29. References and other stuff
     - http://bifrost.slu.se
     - http://www.pdos.lcs.mit.edu/click/ (they claim 435 Kpps on a PIII 700)
     - http://www.cyberus.ca/~hadi/usenix-paper.tgz
     - Some other work: http://robur.slu.se/Linux/net-development/
