Linux Kongress 2008 Hamburg/Germany Robert Olsson Uppsala University 2008-10-10
Over 10 years in production ● Three major installations ● UU core routers towards SUNET ● UU Student Network 30.000 students ● ftp.sunet.se
Over 10 years in production UU facts ● Over 25.000 registered hosts ● Dual ISP BGP connect GIGE ● Local DMZ BGP peering GIGE ● IPv4/IPv6 ● OSPFv2/OSPFv3 ● 600 netfilter rules ● 10 Cisco 6500 OSPF-routers ● Redundant Power ● 10g planned
Over 10 years in production
Over 10 years in production The SUNET FTP ARCHIVE [Diagram labels: AS1653 DMZ, two Juniper routers (full Internet routing, IPv4 and IPv6), two Bifrost Linux routers, AS15980, router discovery via IRDP, four ftp servers, 10TB]
Over 10 years in production Student Network facts ● Dual ISP BGP connect GIGE ● Local DMZ BGP peering GIGE ● IPv4 ● IRDP (ICMP) ● About 30 netfilter rules ● 19 netlogin-service boxes for premises ● Very “innovative” users ● Well connected ● 10g planned
Over 10 years in production
Over 10 years in production Student Network Core Router
IP-login installation at Uppsala University Approx 1000 outlets
Testing, Verification Development & Research ● Started out as simple testing. ● Curiosity, Open Source, Collaboration ● Relative freedom; the idea was to use it in our own infrastructure. No need for external funding. ● OS was intended for desktops.
Building Blocks ● Hardware: PC Motherboard/CPU/Memory; Network Interfaces GIGE/10g, WiFi etc ● Software: Operating System Linux/BSD/Microsoft; Applications: Routing Daemons Quagga/XORP, IP-login/netlogon ● Network: Cable, Fiber, Copper; Equipment, Switches
Testing, Verification Development & Research No need for a test network. We could test in our own infrastructure. (Or SLU) We could work on complicated issues ● NAPI 3 years ● Pktgen 2 years ● fib_trie 1 year ● TRASH 1 year ● Hardware Testing Many years
Flexible netlab at Uppsala University El cheapo -- Highly customizable -- We write code :-) [Diagram: Test generator (linux) | Ethernet | Tested device | Ethernet | sink device (linux)] * Raw packet performance * TCP * Timing * Variants
netlab at UU
Dual-Power supply PIII for many years ftp.sunet.se
Intel NIC's
Latest & Greatest Hardware Intel 10g board Chipset 82598 Open chip specs. Thanks Intel! But why fixed XFP's?? Better classifier needed.
Latest & Greatest Hardware 2U Hi-End Opteron box TYAN S2927/Barcelona
Not all were blessed...
Memory Latency lat_mem_rd from LMbench
Quad vs Dual Core Opteron Surprising! One CPU core of the 2.3 GHz Quad-Core is faster than one of the 3.0 GHz Dual-Core. L3 cache, Microcode? [Bar chart: Dual-Core 2222 3.0 GHz vs Quad-Core 2365 2.3 GHz] 2U Hi-End Opteron box TYAN S2927/Barcelona
Bifrost concept ● Linux kernel collaboration ● Performance testing, development of tools and testing techniques ● Hardware validation, support from big vendors ● Detect and cure problems in lab not in the network infrastructure. ● Test deploy (Often in own network)
The Linux Ashram The guru is ANK
Kernel footprints HW_FLOWCONTROL Tulip FASTROUTE path Whitehole device. In the middle of dev.c Hardwired IP addresses. (Russian?)
Overall Effect ● Inelegant handling of heavy net loads – System collapse ● Scalability affected – System and number of NICs ● A single hogger netdev can bring the system to its knees and deny service to others
[Chart: Summary 2.4 vs feedback] March 15 report on lkml, thread "How to optimize routing performance", reported by Marten.Wikstron@framsfab.se - Linux 2.4 peaks at 27Kpps - Pentium Pro 200, 64MB RAM
A high level view of new system [Figure labels: pkts, Interrupt area, Polling area, Quota, P] ➔ P packets to deliver to the stack (on the RX ring) ➔ Horizontal line shows different netdevs with different in ➔ Area under curve shows how many packets before next ➔ Quota enforces fair share
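The fair-share effect of the quota can be illustrated with a few lines of Python. This is only a sketch of the principle, not kernel code: `poll_round_robin`, `rx_rings`, and the device names are hypothetical, and real NAPI also handles interrupt re-enabling and a global budget that are left out here.

```python
from collections import deque


def poll_round_robin(rx_rings, quota):
    """Serve every device with pending packets, at most `quota` per turn.

    rx_rings maps a (hypothetical) device name to a deque of packets
    waiting on its RX ring. Returns (device, packet) pairs in the order
    they would be handed to the stack.
    """
    # Devices with pending packets are put on the poll list.
    poll_list = deque(name for name, ring in rx_rings.items() if ring)
    delivered = []
    while poll_list:
        name = poll_list.popleft()
        ring = rx_rings[name]
        # Deliver at most `quota` packets from this device in one turn.
        for _ in range(min(quota, len(ring))):
            delivered.append((name, ring.popleft()))
        if ring:
            # Still has work: go to the back of the list so that the
            # other devices get their turn first.
            poll_list.append(name)
        # An empty ring leaves the poll list; a real driver would
        # re-enable its RX interrupt at this point.
    return delivered


if __name__ == "__main__":
    rings = {"eth0": deque(range(5)), "eth1": deque(range(2))}
    for dev, pkt in poll_round_robin(rings, quota=2):
        print(dev, pkt)
```

With a quota of 2, the busy device cannot starve the quiet one: eth1 is served after eth0 has delivered only two of its five packets.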
NAPI observations & issue: fairness Ping through an idle router vs ping through a router under a DoS attack @ 890 kpps [Chart: Ping latency/fairness under extreme load/SMP; y-axis: ping latency in microseconds; series: Idle, DoS; bar labels in the extraction range from 93 to 541] Very well behaved, just an increase of a couple of hundred microseconds!!
NAPI Kernel support NAPI kernel part was included in 2.5.7 and backported to 2.4.20 Current driver support: ● e1000 Intel GIGE NIC's – (UFO driver) First driver where RX & TX are done in softirq ● tg3 Broadcom GIGE NIC's ● dl2k D-Link GIGE NIC's ● tulip (pending) 100 Mbit/s
Forwarding performance (old) [Chart: Linux forwarding rate at different pkt sizes; Linux 2.5.58 UP/skb recycling, 1.8 GHz XEON; y-axis: kpps (series: Input, Throughput); x-axis: packet size 64, 128, 256, 512, 1024, 1518] Fills a GIGE pipe -- starting from 256 byte pkts
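For reference, the theoretical packet rate of a Gigabit Ethernet link at each of these packet sizes can be computed as below. This is a sketch under one stated assumption: the packet sizes are Ethernet frame sizes including the CRC, so each frame costs an extra 20 bytes on the wire (8 bytes preamble/SFD plus 12 bytes inter-frame gap). The function name `line_rate_kpps` is made up for this example.

```python
# Assumption: sizes are Ethernet frame sizes including CRC, so only the
# preamble/SFD (8 bytes) and inter-frame gap (12 bytes) are added.
ETH_OVERHEAD_BYTES = 20


def line_rate_kpps(frame_bytes, link_bps=1_000_000_000):
    """Maximum frames per second (in kpps) a link can carry."""
    bits_on_wire = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire / 1000.0


if __name__ == "__main__":
    for size in (64, 128, 256, 512, 1024, 1518):
        print(f"{size:5d} bytes: {line_rate_kpps(size):8.1f} kpps")
```

Under that assumption, a full GIGE pipe needs roughly 1488 kpps at 64 bytes but only about 453 kpps at 256 bytes, which is why a router limited by packets per second fills the link only from some packet size upwards.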
IPv6 performance (old) Forwarding kpps, 76 byte pkt. Linux 2.5.12, 1 CPU (SMP), Opteron 1.6 GHz, e1000 [Chart: T-put in kpps; series: Single flow small, Single flow 543 r, rDoS 543 r] How does rDoS work on a sparse routing table?
fib_trie performance comparison Forwarding kpps, Linux 2.6.16, 1 CPU used (SMP), Opteron 1.6 GHz, e1000 [Chart: fib_hash vs fib_trie; legend: dsh hash, 5 r single flow, 5 r rDoS, 123kr rDoS] Preroute patches to disable route hash
32/64 bit || sizeof(sk_buff) [Chart: relative forwarding T-put and sizeof(struct sk_buff), 64 bit vs 32 bit] Gcc 3.4 x86_64 vs i686 on same HW
Trash data-structure Interesting novel approach. Trie-Hash --> Trash ● Found when extending the LC-trie ● Paper with Stefan Nilsson/KTH ● Exploits that key-length does not affect tree depth ● We lengthen the key so it can be better compressed. ● Implemented in Linux forwarding patch as a replacement for the route hash.
Trash data-structure ● Can do full key lookup: src/dst/sport/dport/proto/if etc and later socket. ● Even for IPv6, with little performance degradation ● Could be a candidate for the grand unified lookup ● Full flow lookup can understand connections. Free flow logging etc ● New garbage collection (GC) possible. Active GC, called AGC in the paper: listen to TCP SYN, FIN and RST ● Shown to be a performance winner.
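The full-key lookup and active garbage collection ideas can be sketched as follows. Note the swap: this example uses a plain Python dict (a hash table) in place of the Trash trie, so it shows only the behaviour, not the data structure. `FlowKey`, `FlowTable`, and the log format are hypothetical names for this illustration, and removing a flow on the first FIN is a simplification.

```python
from typing import NamedTuple

TCP = 6                             # IP protocol number for TCP
FIN, SYN, RST = 0x01, 0x02, 0x04    # TCP flag bits


class FlowKey(NamedTuple):
    """Full lookup key: addresses, ports, protocol and input interface."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    ifindex: int


class FlowTable:
    """Flow table with active GC driven by TCP SYN, FIN and RST."""

    def __init__(self):
        self.flows = {}   # FlowKey -> packets seen so far
        self.log = []     # (event, key, packets) tuples: free flow logging

    def packet(self, key, tcp_flags=0):
        is_tcp = key.proto == TCP
        if is_tcp and tcp_flags & SYN and key not in self.flows:
            self.log.append(("open", key, 0))
        self.flows[key] = self.flows.get(key, 0) + 1
        if is_tcp and tcp_flags & (FIN | RST):
            # Active GC: the connection is ending, so drop the entry now
            # instead of waiting for a timer-based collector.
            self.log.append(("close", key, self.flows.pop(key)))


if __name__ == "__main__":
    table = FlowTable()
    key = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, TCP, 1)
    table.packet(key, SYN)
    table.packet(key)
    table.packet(key, FIN)
    print(table.log)
    print(len(table.flows))
```

Because the key identifies a whole connection rather than just a destination, the table learns when a flow starts and ends, and the flow log comes as a side effect.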
Trash data-structure Uppsala Universitet core router
Trash data-structure Very flat(fast) trees
Fully parallel router multi-queue breakthrough ● Load from one incoming 10g interface can be split among several CPU cores, using RSS (Receive Side Scaling). ● New NIC HW classifier ● MSI-X interrupt affinity for RX and TX, so a packet (an skb) is handled by one CPU core. ● Breakthrough for forwarding and for networking in general.
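How a hardware classifier spreads load while keeping each flow on one core can be sketched as below. This is an illustration only: RSS as specified for NDIS uses a Toeplitz hash and an indirection table in the NIC, whereas this sketch substitutes `zlib.crc32` and a simple modulo. The function name `rx_queue_for` and the addresses are made up for the example.

```python
import zlib
from collections import Counter


def rx_queue_for(src, dst, sport, dport, n_queues):
    """Pick an RX queue (and thus a CPU core) from the flow identity.

    Stand-in for the NIC's hash: every packet of a flow hashes to the
    same queue, so one core handles the whole flow and packets within
    the flow are not reordered.
    """
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    return zlib.crc32(key) % n_queues


if __name__ == "__main__":
    n_queues = 4
    load = Counter(
        rx_queue_for("10.0.0.1", "192.168.1.1", sport, 80, n_queues)
        for sport in range(1024, 2048)
    )
    for queue in sorted(load):
        print(f"queue {queue}: {load[queue]} flows")
```

With many concurrent flows the queues receive similar shares of the traffic. A single heavy flow still lands on one queue, which is one reason the traffic mix matters when measuring multi-queue forwarding.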
Fully parallel router concept multi-queue breakthrough ● In the experiment we used Intel 82598 adapters. ● Intel follows MS NDIS 6.0 for virtualization ● SUN's 10g board has a more potent HW classifier, aka TCAM. ● Potent classifiers can be yet another breakthrough for both functions and performance: control plane separation (routing daemons), QoS, filters etc.
Fully parallel router multi-queue breakthrough ● Flow load: 31.000 fib_lookups/sec ● BGP table w. 271.064 routes ● 3 different packet sizes: 64 bytes 45%, 576 bytes 25%, 1500 bytes 30% ● RSS and Multi-Queue (RX and TX) in use ● Linux 2.6.27-rc2, ixgbe-1.3.31.5 + patches ● Using 2/4 CPU cores from AMD Barcelona 2.3 GHz ● Forwarding: 6.2 Gbit/s (960 kpps)
10g boards multi-queue breakthrough SUN's board seems to use XFPs. Anyone using it? Other boards with SFP/SFP+/XFP?
A new network symbol has been seen... The Penguin Has Landed
[Network diagram with labels: GigaSUNET, UU, SLU1, SLU2, DMZ UU/ITS, Switch, hub KC, SLU's network 193.10.131.0/24 (not all of it), gateways ultKC-gw, ultGC-gw, ultgw-1, ultgw-2, skara-gw (34 Mb), expgw.data, routers ultrouter6, ultrouter7, ultrouter8, ultrouter9; per-interface addresses are not recoverable from the extraction]