Reflections on data plane performance, iptables and ipsets
Neil Jerram – Metaswitch & Project Calico • @neiljerram • www.projectcalico.org
Who am I?
• Free software hacker since 1990s:
    ; line+.el
    ;
    ; version 1.1
    ;
    ; This has not (yet) been accepted by the Emacs Lisp archive,
    ; but if it is the archive entry will probably be something like this:
    ;; line+|Neil Jerram|nj...@cus.cam.ac.uk|
    ;;   Line Numbering & Interrupt Driven Actions|
    ;;   1993-02-18|1.1|<archive pathname of line+.el>|
    ; Mished and mashed by Neil Jerram <nj...@cus.cam.ac.uk>,
    ; Monday 21 December 1992.
• Metaswitch (previously Data Connection) since 1995
Free software work • Emacs • Guile • Openmoko and GTA04 smartphones
Metaswitch and Project Calico • 30+ year provider of high quality networking software, but mostly proprietary • Software -> hardware -> and now back again! • Now also leading projects as open source • Project Clearwater • Project Calico
So, Calico? • Connectivity and security for workloads (aka endpoints, aka micro-services, aka containers or VMs) in an elastic computing environment • e.g. a data center • Emphasis on simplicity and scalability • Based on standard Linux features • routing, iptables • and Internet protocols (BGP) • Mainline case L3 only
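To make the "standard Linux routing" point concrete, the sketch below shows roughly how a workload is reached on its compute host. The interface name and address are borrowed from the iptables listing later in this deck; in a real deployment Calico's Felix agent programs these routes, they are not added by hand.

    # Each local workload is reached via a /32 host route pointing at its
    # TAP or veth interface:
    ip route add 10.28.0.40/32 dev tap7f470881-51

    # A BGP client on the host (e.g. BIRD) then advertises that route to the
    # other compute hosts, so no overlay is needed in the mainline L3-only case.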
Old, zone-based security
Services in an elastic environment
Distributed firewall security
Calico architecture
Data plane performance questions • Can we get the same bandwidth between endpoints as between those endpoints’ hosts? • What is the CPU cost, and how does it compare with other networking approaches? • What are the effects of our iptables and ipset programming?
Testing methodology • Two hosts, directly connected by a 10Gb/s link • 8 cores • 64GB RAM • 3.13 kernel • No tuning • qperf, using TCP • Measure CPU usage, raw throughput and packet latency
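As a rough sketch of how such a qperf run can be driven (the exact invocation and options here are assumptions, not taken from the slides; 192.0.2.1 stands in for the server host's address):

    # On one host, start the qperf listener.
    qperf &

    # On the other host, measure TCP bandwidth and latency for a given message
    # size (-m); 20000 and 500 bytes match the send sizes on the next slide.
    qperf 192.0.2.1 -m 20000 tcp_bw tcp_lat
    qperf 192.0.2.1 -m 500 tcp_bw tcp_lat

    # Sample CPU usage on both hosts while the test runs, e.g. with mpstat.
    mpstat 1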
Configurations • Bare metal, i.e. host to host • Between OpenStack VMs • ‘TAP’ interface between VM and host • Between containers • veth pair between container namespace and host namespace • Between OpenStack VMs using Open vSwitch (OVS) and VXLAN • MTU 1500, send sizes 20000 and 500
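For the container case, a minimal sketch of a veth pair between a network namespace and the host (names and addresses are hypothetical; this is not exactly how Calico plumbs a container, just an illustration of the interface type being measured):

    # Create a namespace to stand in for the container, plus a veth pair.
    ip netns add demo-ns
    ip link add veth-host type veth peer name veth-cont
    ip link set veth-cont netns demo-ns

    # Address and bring up both ends; qperf can then run across the pair.
    ip addr add 192.168.100.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec demo-ns ip addr add 192.168.100.2/24 dev veth-cont
    ip netns exec demo-ns ip link set veth-cont up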
Data plane throughput • Saturation for 20000-byte sends … (red bars) • … but not for 500-byte sends (blue bars) • Why? • OpenStack better than bare metal? • OVS case reaches >8Gb/s if MTU is increased to 9000
CPU usage • CPU-limited for small messages • OpenStack cases can use more cores • Extra CPU cost for virtualization • Namespace • TAP or veth interface • Routing in guest as well as host
CPU usage per throughput • CPU required to drive each Gb/s of throughput
Latency • Tiny extra latency for containers • More for VMs • But acceptable • Note: microseconds, not milliseconds!
Security rules
iptables and ipsets • iptables on a given host should be the composition of many logical security rules • Will this impact data plane performance? • Actually, no

-A felix-FORWARD -i tap+ -j felix-FROM-ENDPOINT
-A felix-FORWARD -o tap+ -j felix-TO-ENDPOINT
-A felix-FORWARD -i tap+ -j ACCEPT
-A felix-FORWARD -o tap+ -j ACCEPT
-A felix-FROM-ENDPOINT -i tap7f470881-51 -g felix-from-7f470881-51
-A felix-FROM-ENDPOINT -j DROP
-A felix-INPUT -i tap+ -j felix-FROM-ENDPOINT
-A felix-INPUT -i tap+ -j ACCEPT
-A felix-TO-ENDPOINT -o tap7f470881-51 -g felix-to-7f470881-51
-A felix-TO-ENDPOINT -j DROP
-A felix-from-7f470881-51 -m conntrack --ctstate INVALID -j DROP
-A felix-from-7f470881-51 -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A felix-from-7f470881-51 -p udp -m udp --sport 68 --dport 67 -j RETURN
-A felix-from-7f470881-51 -s 10.28.0.40/32 -m mac --mac-source FA:16:3E:4E:7A:0E -g felix-p-_6b340324948a39b-o
-A felix-from-7f470881-51 -m comment --comment "Anti-spoof DROP (endpoint 7f470881-5156-47ce-a67d-b971ef5e5cde):" -j DROP
-A felix-p-_6b340324948a39b-i -p icmp -m set --match-set felix-v4-_6b340324948a39b src -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p tcp -m multiport --dports 22 -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p udp -m multiport --dports 5060 -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p tcp -m multiport --dports 80 -j RETURN
-A felix-p-_6b340324948a39b-i -m comment --comment "Default DROP rule (72d696a9-f715-495f-9152-7f5e6a69fd0f):" -j DROP
What saves us? • conntrack • ipsets scale well, thanks to hash table implementation • Nested design for source/destination interface mapping
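A rough illustration of the first two points (the set name and addresses are made up for this sketch, not Calico's actual programming): one conntrack rule lets packets of established flows skip the longer per-endpoint chains, and one ipset match replaces what would otherwise be a linear list of per-address rules.

    # Hash-based ipset holding many allowed source addresses; lookup cost stays
    # roughly constant as the set grows.
    ipset create demo-allowed-sources hash:ip
    ipset add demo-allowed-sources 172.18.203.20
    ipset add demo-allowed-sources 172.18.203.21

    # Established/related traffic is matched once, up front...
    iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    # ...and a single rule matches against the whole set.
    iptables -A FORWARD -m set --match-set demo-allowed-sources src -j ACCEPT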
Arjan Schaaf’s measurements
What is happening here? • http://www.slideshare.net/ArjanSchaaf/docker-network-performance-in-the-public-cloud • Various approaches to networking between containers on AWS hosts • For this case Calico uses IP-in-IP between the hosts • Calico bandwidth less than half of native • We set up the same system and got the same results as Arjan • For t2.micro, Calico bandwidth = 65.3 MB/sec compared with native = 125 MB/sec • For m4.xlarge, Calico bandwidth = 108 MB/sec compared with native = 267 MB/sec • Why?
It’s all about the MTU • Calico in a public cloud uses IP-in-IP, with tunnel MTU = 1440 • 1440 was optimised for GCE, which has an MTU of 1460 on its VM interfaces (IP-in-IP adds a 20-byte outer IP header, so tunnel MTU = interface MTU - 20) • But AWS instances have an MTU of 9001! • So the native tests were using jumbo frames, while the Calico test was using 1440 • If Calico’s tunnel MTU is increased to 8980: • For t2.micro, Calico bandwidth = 114 MB/sec • For m4.xlarge, Calico bandwidth = 266 MB/sec • Problem solved – Calico throughput is now close to native
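A sketch of how the tunnel MTU can be inspected and raised on a host (tunl0 is the standard Linux IP-in-IP tunnel device; eth0 is a stand-in for the instance's NIC, and in practice the MTU should be set via Calico's own configuration rather than by hand):

    # Check the MTUs of the IP-in-IP tunnel device and the underlying NIC.
    ip link show tunl0
    ip link show eth0

    # Raise the tunnel MTU so IP-in-IP (20-byte outer IP header) still fits
    # inside the 9001-byte instance MTU.
    ip link set dev tunl0 mtu 8980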
So what have we learned? • With Calico connectivity, VMs or containers can saturate a 10Gb link between hosts, just as much as the hosts themselves could • There is a CPU cost to virtualization • But mostly inevitable if you want virtualization at all (non-accelerated) • Calico does not add any significant extra cost • Conntrack largely saves us from the effects of complex iptables • ipsets and clever programming design also help • Be humble about performance comparisons
Further information, and thanks! • Project Calico • http://www.projectcalico.org/ • http://docs.projectcalico.org/en/latest/ • https://github.com/projectcalico • Blog on Calico data plane performance • http://www.projectcalico.org/calico-dataplane-performance/ • Thanks! • @neiljerram • @projectcalico • www.metaswitch.com