Networking approaches in a Container World
Who we are
● Flavio Castelli, Engineering Manager
● Neil Jerram, Senior Sw. Engineer
● Antoni Segura Puimedon, Principal Sw. Engineer
Disclaimer
● There are many container engines; we'll focus on Docker
● Multiple networking solutions are available:
○ Introduce the core concepts
○ Many projects → cover only some of them
● Container orchestration engines:
○ Often coupled with networking
○ Focus on Docker Swarm and Kubernetes
● Remember: the container ecosystem moves at a fast pace, things can suddenly change
The problem
● Containers are lightweight
● Containers are great for microservices
● Microservices: multiple distributed processes communicating
● Lots of containers that need to be connected together
Single host
Host networking
● Containers have full access to the host interfaces!
(diagram: container-a sharing the host's own interfaces: lo, eth0, ...)
Host networking
Containers are able to:
● See all host interfaces
● Use all host interfaces
Containers can't (without CAPS):
● Modify their IP addresses
● Modify their IP routes
● Create virtual devices
● Interact with iptables/ebtables

$ docker run --net=host -it --rm alpine /bin/sh
/ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:b3:18:d2:f6:ea brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c8:5b:76:36:b6:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
/ #
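A quick way to see these capability limits in practice. This is a sketch, not part of the original deck; the exact error text varies with the ip implementation, and 192.0.2.10 is just a test address:

$ docker run --net=host -it --rm alpine ip addr add 192.0.2.10/24 dev lo
(fails with a permission error: the container lacks CAP_NET_ADMIN)
$ docker run --net=host --cap-add NET_ADMIN -it --rm alpine ip addr add 192.0.2.10/24 dev lo
(succeeds: with CAP_NET_ADMIN the container can modify host addressing)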
Bridged networking
● Linux bridge
● Containers connected to the bridge with veth pairs
● Each container gets its own IP and kernel networking namespace
● Containers can talk to each other and to the host via IP forwarding
(diagram: container-a and container-b attached via veth pairs to the docker0 bridge, 172.17.0.0/16, on the host with eth0)
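A rough sketch of the plumbing Docker performs for each container; the interface names and the PID value below are illustrative, not what Docker actually names things:

$ CONTAINER_PID=12345                               # hypothetical PID of the container's init process
$ sudo ip link add veth-host type veth peer name veth-cont
$ sudo ip link set veth-host master docker0         # host end plugged into the bridge
$ sudo ip link set veth-host up
$ sudo ip link set veth-cont netns $CONTAINER_PID   # container end moved into its network namespace

Inside the namespace the container end is renamed to eth0 and gets an address out of 172.17.0.0/16.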
Bridged networking
● Outwards connectivity via IP forwarding and masquerading
● The bridge and containers use a private subnet

$ ip address show dev docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

$ sudo iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target      prot opt source          destination
MASQUERADE  all  --  172.17.0.0/16   anywhere

$ docker run --net=bridge -it --rm alpine /bin/sh -c '/sbin/ip -4 address show dev eth0; ip -4 route show'
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.2
Bridged networking
● Services are exposed with iptables DNAT rules
● iptables performance deteriorates as the number of rules increases
● Limited by how many host ports are free to be bound

$ docker run --net=bridge -d --name nginx -p 8000:80 nginx
$ sudo iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target      prot opt source          destination
DOCKER      all  --  0.0.0.0/0       0.0.0.0/0     ADDRTYPE match dst-type LOCAL

Chain OUTPUT (policy ACCEPT)
target      prot opt source          destination
DOCKER      all  --  0.0.0.0/0       !127.0.0.0/8  ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target      prot opt source          destination
MASQUERADE  all  --  172.17.0.0/16   0.0.0.0/0
MASQUERADE  tcp  --  172.17.0.2      172.17.0.2    tcp dpt:80

Chain DOCKER (2 references)
target      prot opt source          destination
RETURN      all  --  0.0.0.0/0       0.0.0.0/0
DNAT        tcp  --  0.0.0.0/0       0.0.0.0/0     tcp dpt:8000 to:172.17.0.2:80
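A quick sanity check of the published port, a sketch rather than deck material; requesting port 8000 on the host should reach nginx inside the container:

$ curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8000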
Multi host
Multi host networking scenarios
(diagram: containers 1-6 spread across host-A, host-B and host-C, grouped into frontend, application and database networks that span the hosts)
Multi host networking scenarios
(diagram: the same frontend, application and database networks, but the containers run on VM-1, VM-2 and VM-3 inside one big host-A)
Multi host routing solutions
Routing approach
● Managed common IP space at the container level
● Assigns a /24 subnet to each host
● Inserts routes to each host's /24 into the routing table of every host
● Main implementations:
○ Calico
○ Flannel
○ Romana
○ Kuryr
■ Calico
(diagram: host-a, 172.16.0.4/16, with docker0 10.0.8.1/24 and containers 10.0.8.2-10.0.8.3; host-b, 172.16.0.5/16, with docker0 10.0.9.1/24 and containers 10.0.9.2-10.0.9.3; both hosts on the 172.16.0.0/16 network)
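What the routing approach boils down to on each host, sketched by hand with the addresses from the diagram (the implementations above automate exactly this):

# on host-a (172.16.0.4): reach host-b's container subnet through host-b
$ sudo ip route add 10.0.9.0/24 via 172.16.0.5 dev eth0
# on host-b (172.16.0.5): the mirror route back to host-a's containers
$ sudo ip route add 10.0.8.0/24 via 172.16.0.4 dev eth0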
Calico's approach
● Felix agent per node that sets up a vRouter:
○ Kernel's L3 forwarding
○ Handles ACLs with iptables
○ Uses BIRD's BGP to keep /32 or /128 routes to each container updated
○ Etcd as data store
○ Replies to container ARP requests with the host hwaddr
(diagram: BGP vRouters on host-a and host-b exchanging container routes over the 172.16.0.0/16 network)
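An illustrative view of what the vRouter ends up with on host-a; this is not captured from a real Calico node and the cali* interface names are hypothetical:

$ ip route
10.0.8.2 dev cali0ef1 scope link                # local container-a, /32 device route
10.0.8.3 dev cali1ab2 scope link                # local container-b
10.0.9.2 via 172.16.0.5 dev eth0 proto bird     # container-c on host-b, learned over BGP
$ cat /proc/sys/net/ipv4/conf/cali0ef1/proxy_arp
1                                               # proxy ARP lets the host answer container ARP requests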
Flannel's approach
● Flanneld agent:
○ Etcd as data store
○ Keeps /24 routes to hosts up to date
○ No ACLs/isolation
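Flannel reads its network-wide configuration from etcd; a minimal sketch using the etcd v2 client and the host-gw backend (pure routing, no encapsulation). The key prefix is flannel's default, but the subnet values here are illustrative:

$ etcdctl set /coreos.com/network/config \
    '{"Network": "10.0.0.0/16", "SubnetLen": 24, "Backend": {"Type": "host-gw"}}'

Each flanneld instance then leases a /24 from that range for its host and keeps the corresponding routes installed.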
Canal
● Developed by Tigera
● Announced on May 9th, 2016
Multi host overlay solutions
Overlay approach
● Encapsulates multiple networks over the physical network:
○ UDP
○ vxlan
○ geneve
○ GRE
● Connects containers to virtual networks
● Main projects:
○ Docker's native overlay
○ Flannel
○ Weave
○ Kuryr
■ OVS (OVN, Dragonflow)
■ MidoNet
■ PLUMgrid
(diagram: net-x 10.0.8.0/24 and net-y 10.0.7.0/24 spanning host-a and host-b; container traffic is encapsulated over the 172.16.0.0/16 physical network)
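What the encapsulation looks like if set up by hand with VXLAN; the VNI, interface names and multicast group are arbitrary choices for illustration, and the overlay projects above automate this (usually with a control plane instead of multicast):

$ sudo ip link add vxlan42 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
$ sudo ip link add br-netx type bridge
$ sudo ip link set vxlan42 master br-netx      # bridge container veths and the VXLAN tunnel together
$ sudo ip link set vxlan42 up && sudo ip link set br-netx up

Container frames entering br-netx leave eth0 wrapped in UDP/VXLAN and are unwrapped on the remote host.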
OpenStack & containers with Kuryr
● Allows you to have VMs, containers and containers-in-VMs on the same overlay
● Allows reusing VM networks for containers and vice versa
● Allows you to have separate overlay networks routed to each other
● Isolation from the host networking
● Can have Swarm and Kubernetes on the same overlay
(diagram: container and VM overlay networks running on top of the same underlay)
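With the Kuryr libnetwork driver installed, a Docker network maps to a Neutron network. A sketch, assuming the driver and IPAM plugin are both registered under the name "kuryr"; the subnet and network name are illustrative:

$ docker network create --driver kuryr --ipam-driver kuryr \
    --subnet 10.10.0.0/24 --gateway 10.10.0.1 app_net
$ docker run -it --rm --net=app_net alpine /bin/sh

Behind the scenes Kuryr creates the corresponding Neutron network, subnet and ports, so VMs on the same Neutron network can reach the container directly.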
Routing vs Overlay

Routing
● Good: native performance; easy debugging
● Bad: requires control over the infrastructure; hybrid cloud more complicated (requires VPN); can run out of addresses (mitigation: IPv6)

Overlay
● Good: easier inter-cloud; easier hybrid workloads; doesn't require control over the infrastructure; more implementation choice
● Bad: inferior performance (mitigations: hw acceleration and jumbo frames); debugging more complicated
Competing COE-Networking interaction

Container Network Model (CNM)
● Implemented by Docker's Libnetwork
● Separated IPAM and Remote Drivers
● Docker ≥ 1.12 Swarm mode only works with the native overlay driver
● Some of the Libnetwork remote drivers:
○ OpenStack Kuryr
○ Calico
○ Weave

Container Network Interface (CNI)
● Implemented by Kubernetes, rkt, Mesos, Cloud Foundry and Kurma
● Plugins:
○ Calico
○ Flannel
○ Weave
○ OpenStack Kuryr (unreleased)
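For contrast, a minimal CNI network configuration for the standard bridge plugin; the file path and values are illustrative. The runtime (Kubernetes, rkt, ...) hands this JSON to the plugin, which wires up the container and reports back the assigned IP:

$ cat /etc/cni/net.d/10-mynet.conf
{
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}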
More challenges
Service discovery
● Producer: a container that runs a service
● Consumer: a container that consumes a service
● Need a way for consumers to find producer endpoints
Service discovery challenges
#1 Finding the producer: "Where is redis?" (diagram: web-01 on host-A needs to locate redis-01 on host-B)
#2 Moving services: without SD, web-01 on host-A loses track of redis when redis-01 on host-B is replaced by redis-02 on host-C
Service discovery challenges
#3 Multiple choice: "Which redis?" (diagram: web-01 on host-A must pick between redis-01, redis-02 and redis-03 on hosts B, C and D)
Addressing service discovery
Use DNS
● Problematic for highly dynamic deployments:
○ Containers can die or be moved more often than DNS caches expire
○ Trying to improve this by reducing the DNS TTL → more load on the server
○ Some clients ignore the TTL → old entries stay cached
● Note well:
○ Docker < 1.11: updates /etc/hosts dynamically
○ Docker ≥ 1.11: integrates a DNS server
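A quick demo of the embedded DNS server in Docker ≥ 1.11 on a user-defined network; the network and container names are illustrative:

$ docker network create app_net
$ docker run -d --net=app_net --name redis redis
$ docker run --rm --net=app_net alpine nslookup redis

Containers on the same user-defined network resolve each other by name through Docker's embedded resolver (reachable inside the container at 127.0.0.11), so no external DNS records or TTL tuning are involved.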
Key-value store
● Rely on a k/v store:
○ etcd
○ consul
○ zookeeper
● The producer registers its IP and port
● The orchestration engine hands this data to the consumer
● At run time, either:
○ Change your application to read the data straight from the k/v store
○ Rely on some helper that exposes the values via an environment file or a configuration file
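A sketch of producer registration with etcd (etcdctl v2 syntax); the key layout and TTL are illustrative conventions, not something the orchestration engines mandate:

# producer side: register the endpoint with a TTL so stale entries expire
$ etcdctl set /services/redis/redis-01 '{"host": "172.16.0.5", "port": 6379}' --ttl 60
# consumer side (or a helper such as confd): read it back
$ etcdctl get /services/redis/redis-01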
Changes, multiple choices & ingress traffic