

SLIDE 1

Bringing Security and Multi-tenancy to Kubernetes

Lei (Harry) Zhang

SLIDE 2

About Me

  • Lei (Harry) Zhang @resouer
  • #CNCF member, #Microsoft MVP
  • Previous: VMware, Baidu
  • Feature Maintainer of Kubernetes
  • HyperCrew: https://hyper.sh/
  • Publication: Docker & Kubernetes Under the Hood
  • PhD candidate #Large-scale cluster scheduling and management
SLIDE 3

A survey about “boundary”

  • Are you comfortable with Linux containers as an effective boundary?
  • Yes, I use containers in my private/safe environment
  • No, I use containers to serve the public cloud
SLIDE 4

As long as we care about security…

  • We have to wrap containers inside full-blown virtual machines
  • But we lose cloud-native deployment
  • Slow startup time
  • Huge resource waste
  • Memory tax for every container

(Diagram: dream vs. reality)

SLIDE 5

Revisit container

  • Container Runtime
  • The dynamic view and boundary of your running process (namespaces, cgroups)
  • Container Image
  • The static view of your program, data, dependencies, files and directories

FROM busybox
ADD temp.txt /
VOLUME /data
CMD ["echo hello"]

(Diagram: the resulting Docker container: a read-only image layer with /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var, an init layer providing /etc/hosts, /etc/hostname, and /etc/resolv.conf, and a read-write layer holding /temp.txt and the /data volume; each layer has its own JSON metadata, and the container process is "echo hello".)
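The layer stack on this slide can be illustrated with a tiny union-filesystem model. This is a hedged sketch in plain Python, not Docker's actual overlay implementation; all paths and contents are illustrative:

```python
# Simplified union-filesystem model: each layer maps path -> content,
# and upper layers shadow lower ones when the views are merged.
def merged_view(layers):
    view = {}
    for layer in layers:            # bottom-most layer first
        view.update(layer)          # upper entries shadow lower ones
    return view

read_only = {"/bin/sh": "busybox", "/etc/passwd": "root:x:0:0"}
init_layer = {"/etc/hosts": "127.0.0.1 localhost",
              "/etc/resolv.conf": "nameserver 8.8.8.8"}
read_write = {"/temp.txt": "added by ADD", "/data": "<volume mount point>"}

fs = merged_view([read_only, init_layer, read_write])
print(fs["/temp.txt"])              # writes land in the top, read-write layer
```

Deleting the container discards only the read-write layer; the image layers below stay intact and shared.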

SLIDE 6

HyperContainer

Secure Kubernetes at the runtime level

SLIDE 7

HyperContainer

  • Container Runtime
  • RunV
  • https://github.com/hyperhq/runv
  • An OCI-compatible, hypervisor-based runtime implementation
  • Control daemon
  • https://github.com/hyperhq/hyperd
  • Container Image
  • Docker Image Spec
SLIDE 8

Combine the best parts

  • Portable and behaves like a Linux container
  • $ hyperctl run -t busybox echo helloworld
  • sub-second startup time*, ~12MB memory cost
  • Fully isolated sandbox with an independent guest kernel
  • $ hyperctl exec -t busybox uname -r
  • 4.4.12-hyper (or your provided kernel)
  • security, backward compatibility, maturity

See: http://hypercontainer.io/why-hyper.html

SLIDE 9

HyperContainer is a Pod

  • That’s how HyperContainer fits into the Kubernetes philosophy
  • Wait, why is the Pod so important?
SLIDE 10

Pod: lesson learned from Borg

  • Should sample.war be packaged with Tomcat?
SLIDE 11

Pod: lesson learned from Borg

  • InitContainers: one or more containers started in sequence before the pod's normal containers are started.
  • Share volumes, perform network operations, and perform computation prior to the app containers.

SLIDE 12

So, Pod is

  • The group of super-affinity containers
  • The atomic scheduling unit
  • The process group in the container cloud
  • Do the right things
  • without modifying your container image
  • Kubernetes = Spring Framework
  • Pod = IoC

(Diagram: a Pod holding an app container, an infra container, an init container, a log sidecar, and a shared volume.)

SLIDE 13

Pod is not easy to simulate

  • log and app are super-affinity containers
  • Requirement:
  • app: 1G, log: 0.5G
  • Available:
  • Node_A: 1.25G, Node_B: 2G
  • What happens if app is scheduled to Node_A?
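The arithmetic above can be checked with a short sketch (node names and numbers are the slide's; the code is illustrative, not the real scheduler):

```python
# Why co-scheduling matters: placing containers one by one can strand
# the second container of a super-affinity pair.
nodes = {"Node_A": 1.25, "Node_B": 2.0}   # free memory in GB

def fits(node, need, allocated):
    return nodes[node] - allocated.get(node, 0) >= need

# Container-by-container: app (1G) lands on Node_A first...
allocated = {"Node_A": 1.0}
# ...and now log (0.5G) no longer fits next to it.
print(fits("Node_A", 0.5, allocated))     # False

# Pod-level scheduling considers the group's total (1.5G) up front,
# so the whole pod lands on Node_B.
pod_need = 1.0 + 0.5
placement = next(n for n in nodes if fits(n, pod_need, {}))
print(placement)                          # Node_B
```

The Pod, as the atomic scheduling unit, avoids this stranding by construction.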
SLIDE 14

HyperContainer is a Pod

  • Linux container based runtimes
  • wrap and encapsulate several app containers into a logical group
  • Hypervisor container based runtime
  • the hypervisor serves as a natural boundary of the Pod
SLIDE 15

HyperContainer is a Pod

  • Container Runtime Interface
  • create sandbox Foo --> create container C --> start container C
  • stop container C --> remove container C --> delete sandbox Foo
  • Sandbox
  • Normally: the infra container
  • HyperContainer: hypervisor
  • with HyperKernel
  • a HyperStart process as PID 1
  • sets up the mnt namespace, launches apps from the images, etc.
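The create/start/stop/remove ordering above can be sketched as a minimal state machine. The class names here are hypothetical; the real CRI is a gRPC API, and for HyperContainer the sandbox means booting the VM with HyperStart as PID 1:

```python
# Sketch of the CRI call ordering: setup runs forward, teardown mirrors it.
class Sandbox:
    def __init__(self, name):
        self.name = name
        self.containers = {}          # container name -> running?

    def create_container(self, name):
        self.containers[name] = False

    def start_container(self, name):
        self.containers[name] = True

    def stop_container(self, name):
        self.containers[name] = False

    def remove_container(self, name):
        assert not self.containers[name], "stop before remove"
        del self.containers[name]

sandbox = Sandbox("Foo")              # for hyper: boot VM, start HyperStart
sandbox.create_container("C")
sandbox.start_container("C")
# teardown mirrors setup:
sandbox.stop_container("C")
sandbox.remove_container("C")
print(sandbox.containers)             # {} -> safe to delete sandbox Foo
```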
SLIDE 16

Hypernetes

Kubernetes with HyperContainer Runtime

SLIDE 17

Hypernetes

  • Also: h8s
  • Kubernetes + HyperContainer runtime
  • officially supported via kubernetes/frakti
  • Multi-tenant network and persistent volumes
  • battle-tested Neutron + Cinder plugins
SLIDE 18

Multi-tenant Network

  • Goal:
  • leverage tenant-aware Neutron networking for Kubernetes
  • follow the network plugin workflow
  • Non-goal:
  • breaking the k8s network model or hacking k8s code
SLIDE 19

Define the Network

  • Network
  • a top-level API object
  • each tenant (created by Keystone) has its own Network
  • a Network maps to a Neutron “net”
  • a Network Controller is responsible for managing the Network lifecycle
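The controller's job can be sketched as a reconcile loop against a fake Neutron client. FakeNeutron and its methods are illustrative stand-ins; the real controller calls the Neutron API:

```python
# Hedged sketch: make Neutron's set of nets match the declared Network objects.
class FakeNeutron:
    def __init__(self):
        self.nets = set()
    def create_net(self, name):
        self.nets.add(name)
    def delete_net(self, name):
        self.nets.discard(name)

def reconcile(desired_networks, neutron):
    for net in desired_networks - neutron.nets:
        neutron.create_net(net)          # new Network object -> new Neutron net
    for net in neutron.nets - desired_networks:
        neutron.delete_net(net)          # Network deleted -> tear down the net

neutron = FakeNeutron()
reconcile({"tenant-a", "tenant-b"}, neutron)
reconcile({"tenant-a"}, neutron)         # tenant-b's Network was removed
print(sorted(neutron.nets))              # ['tenant-a']
```

The same level-triggered pattern as every other Kubernetes controller: observe desired state, drive the real world toward it.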
SLIDE 20

Example

(Diagram: the api-server, backed by etcd, with the scheduler and the controller-manager's ControlLoop over the network, pod, replica, namespace, service, job, deployment, volume, petset, … objects; each node runs a kubelet SyncLoop and a proxy. Reconciling the desired world with the real world calls Neutron to create/delete networks.)

SLIDE 21

Kubernetes Network Model

  • Container reach container
  • all containers can communicate with all other containers without NAT
  • Node reach container
  • all nodes can communicate with all containers (and vice-versa) without NAT
  • IP addressing
  • a Pod in the cluster can be addressed by its IP
SLIDE 22

How does h8s fit that?

  • A Network can be assigned to one or more Namespaces
  • Pods belonging to the same Network can reach each other directly through IP
  • a Pod’s network maps to a Neutron “port”
  • the kubelet network plugin is responsible for Pod network setup
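The tenancy rule above reduces to a simple mapping: pods are mutually reachable exactly when their namespaces share a Network. A sketch with illustrative names:

```python
# Namespaces map to Networks (Neutron nets); a Network can span several
# namespaces, and different Networks are isolated from each other.
namespace_to_network = {
    "team-a-dev": "net-team-a",
    "team-a-prod": "net-team-a",   # one Network assigned to two namespaces
    "team-b": "net-team-b",
}

def reachable(ns1, ns2):
    return namespace_to_network[ns1] == namespace_to_network[ns2]

print(reachable("team-a-dev", "team-a-prod"))  # True: same Neutron net
print(reachable("team-a-dev", "team-b"))       # False: isolated tenants
```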

SLIDE 23

Example

1. A Pod is created.

(Diagram: api-server backed by etcd, the scheduler, and two nodes each running a kubelet SyncLoop and a proxy.)

SLIDE 24

Example

2. The Pod object is added to etcd.

(Same diagram as the previous slide.)

SLIDE 25

Example

3.1 The scheduler detects the new Pod object. 3.2 It binds the Pod to a node.

(Same diagram as the previous slide.)

SLIDE 26

Example

4.1 The kubelet detects the Pod bound to its node. 4.2 It starts the containers in the Pod.

(Same diagram as the previous slide.)

SLIDE 27

Design of kubelet

(Diagram: kubelet's SyncLoop consumes PodUpdates, i.e. HandlePods {Add, Update, Remove, Delete, …}; it initializes the network plugin, chooses a runtime (docker, rkt, hyper/remote), and coordinates the status manager, PLEG, volume manager, and image manager, reporting NodeStatus and network status.)

Pod Update Worker (e.g. ADD):

  • generate Pod status
  • check volume status (covered later)
  • call runtime to start containers
  • set up Pod network (see next slide)
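The four worker steps above can be sketched end to end. All class and function names here are hypothetical fakes standing in for the real runtime, network plugin, and volume manager:

```python
# Fakes that record what the worker asked them to do.
class FakeRuntime:
    def __init__(self): self.started = []
    def start_containers(self, pod): self.started.append(pod)

class FakePlugin:
    def __init__(self): self.networked = []
    def setup_pod(self, pod): self.networked.append(pod)

class FakeVolumes:
    def volumes_ready(self, pod): return True

def handle_pod_add(pod, runtime, plugin, volumes, statuses):
    statuses[pod] = "Pending"              # 1. generate Pod status
    if not volumes.volumes_ready(pod):     # 2. check volume status
        return False                       #    retried on a later sync
    runtime.start_containers(pod)          # 3. call runtime to start containers
    plugin.setup_pod(pod)                  # 4. set up Pod network
    statuses[pod] = "Running"
    return True

statuses = {}
handle_pod_add("nginx-pod", FakeRuntime(), FakePlugin(), FakeVolumes(), statuses)
print(statuses["nginx-pod"])               # Running
```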

SLIDE 28

Set Up Pod Network

SLIDE 29

kubestack

A standalone gRPC daemon that:

  • 1. “translates” the SetUpPod request to the Neutron network API
  • 2. handles the multi-tenant Service proxy
SLIDE 30

Service

$ iptables-save | grep my-service

  • -A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
  • -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
  • -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
  • -A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
  • -A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80

(Diagram: the portal 10.0.0.116:8001 uses random-mode rules to pick backend rule_1, 172.17.0.2:80, or backend rule_2, 172.17.0.3:80; the rules are maintained by OnServiceUpdate and OnEndpointsUpdate.)
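What those rules implement can be modeled in a few lines: a cluster-IP "portal" picks a backend endpoint at random and rewrites the destination (DNAT). This is a sketch of the behavior, not real netfilter; the addresses are the slide's example values:

```python
import random

# Service VIP -> list of endpoint (ip, port) pairs, as in the rules above.
services = {("10.0.0.116", 8001): [("172.17.0.2", 80), ("172.17.0.3", 80)]}

def dnat(dest_ip, dest_port):
    backends = services.get((dest_ip, dest_port))
    if backends is None:
        return (dest_ip, dest_port)       # not a service VIP: pass through
    return random.choice(backends)        # iptables' random mode

backend = dnat("10.0.0.116", 8001)
print(backend in services[("10.0.0.116", 8001)])  # True
```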

SLIDE 31

Multi-tenant Service

  • The default iptables-based kube-proxy is not tenant-aware
  • Endpoint Pods and the Nodes holding iptables rules are isolated into different networks
  • Hypernetes uses a built-in HAProxy as the Service portal
  • to proxy all Service instances within the same namespace
  • the same OnServiceUpdate and OnEndpointsUpdate process
  • ExternalProvider
  • an OpenStack LB will be created as the Service
  • e.g. curl 58.215.33.98:8078
SLIDE 32

Kubernetes Persistent Volume

(Diagram: the Volume Manager reconciles the desired world; the Cinder volume plugin attaches the volume to a host path, which is then mounted into each Pod at its mountPath.)

  • Get mountedVolume from actualStateOfWorld
  • Unmount volumes in mountedVolume but not in desiredStateOfWorld
  • AttachVolume() if vol in desiredStateOfWorld and not attached
  • MountVolume() if vol in desiredStateOfWorld and not in mountedVolume
  • Verify devices that should be detached/unmounted are detached/unmounted
  • Tips:
  • 1. -v host:path
  • 2. attach vs. mount
  • 3. totally independent from container management
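The reconcile steps above can be written as a minimal, set-based sketch (the real volume manager tracks far more state; names here are illustrative):

```python
# One reconcile pass: compare desired vs. actual state, then emit the
# unmount / attach / mount operations needed to converge.
def reconcile(desired, attached, mounted):
    ops = []
    for vol in sorted(mounted - desired):
        ops.append(("unmount", vol))      # mounted but no longer desired
    for vol in sorted(desired - attached):
        ops.append(("attach", vol))       # AttachVolume()
    for vol in sorted(desired - mounted):
        ops.append(("mount", vol))        # MountVolume()
    return ops

desired = {"cinder-vol-1", "cinder-vol-2"}
attached = {"cinder-vol-1", "old-vol"}
mounted = {"old-vol"}
print(reconcile(desired, attached, mounted))
# old-vol is unmounted, cinder-vol-2 is attached, both desired vols mounted
```

Because the loop only compares states, it is naturally idempotent and, as the slide notes, totally independent from container management.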

SLIDE 33

Persistent Volume with HyperContainer

  • Enhanced Cinder volume plugin
  • Linux container:
  • 1. requires a full OpenStack cluster
  • 2. query Nova to find the node
  • 3. attach the Cinder volume to a host path
  • 4. bind-mount the host path into Pod containers
  • HyperContainer:
  • directly attach block devices to the Pod
  • thanks to the hypervisor-based Pod boundary
  • eliminates the extra time to query Nova

(Diagram: the Volume Manager reconciles the desired world; the enhanced Cinder volume plugin attaches the volume directly to the Pod's mountPath.)

SLIDE 34

PV Example

  • Create a Cinder volume
  • Claim the volume by referencing its volumeID

SLIDE 35

Container Runtime Interface

SLIDE 36

Future of CRI

  • Keep Docker as the default container runtime
  • alternatives: ocid, rktlet, hyperd
  • Frakti: the Remote Container Runtime Kit
  • https://github.com/kubernetes/frakti
  • welcome to try it out, star, and fork
SLIDE 37

“if image becomes non-standard”

  • e.g. the Docker image format becomes somehow Docker-specific
  • Don’t worry, kubelet.imageManager is moving to be runtime-specific
  • but then k8s will probably choose
  • NO DEFAULT runtime
SLIDE 38

Full Topology

(Diagram: the Master runs KeyStone, Neutron, and Cinder, backed by Ceph, and stores the Network, Pod, … objects; each Node runs kubelet, kube-proxy, kubestack, a Neutron L2 Agent, and the Cinder plugin, hosting the Pods.)

SLIDE 39

Summary

  • A new way to build secure and multi-tenant Kubernetes
  • Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone
  • Roadmap
  • Graduate the HyperContainer runtime in k8s upstream
  • Neutron CNI plugin
  • Project URL: https://github.com/hyperhq/hypernetes
  • Tip: https://hyper.sh is totally built on Hypernetes, try it out :)
SLIDE 40

END

Lei (Harry) Zhang @resouer