BPF: Turning Linux into a Microservices-aware Operating System
About the Speaker
Thomas Graf
● Linux kernel developer for ~15 years working on networking and security
● Helped write one of the biggest monoliths ever
● Worked on many Linux components over the years (IP, TCP, routing, netfilter/iptables, tc, Open vSwitch, …)
● Creator of Cilium to leverage BPF in a cloud native and microservices context
● Co-Founder & CTO of the company building Cilium
Agenda
● Evolution of running applications
  ○ From single task processes to microservices
● Problems of the Linux kernel
  ○ The kernel
● What is BPF?
  ○ Turning Linux into a modern, microservices-aware operating system
● Cilium - BPF-based networking security for microservices
  ○ What is Cilium?
  ○ Use Cases & Deep Dive
● Q&A
Evolution: Running applications
● Dark Age: Single tasking
  ○ The simple age.
● Multi tasking
  ○ Split the CPU and memory. Shared libraries, package management, Linux distributions.
● Virtualization
  ○ Ship the OS together with application and run it in a VM for better resource isolation. Virtualized hardware and software defined infrastructure.
● Microservices: Containers
  ○ Back to a shared operating system. Applications directly interact with the host operating system again.
Problems of the Linux Kernel in the age of microservices
Problem #1: Abstractions
The Linux kernel is split into layers to provide strong abstractions.
Pros:
● Strong userspace API compatibility guarantee. A 20-year-old binary still works.
● Majority of Linux source code is not hardware specific.
Cons:
● Every layer pays the cost of the layers above and below.
● Very hard to bypass layers.
[Diagram: layered stack, top to bottom: Process, System Call Interface, Sockets, TCP / UDP / Raw, Netfilter, IPv4 / IPv6, Ethernet, Traffic Shaping, Netdevice / Drivers, HW / Bridge / OVS / ..]
Problem #2: Per subsystem APIs
[Diagram: the layered stack (Process, System Call Interface, Sockets, TCP / UDP / Raw, Netfilter, IPv4 / IPv6, Ethernet, Traffic Shaping, Netdevice / Drivers, HW / Bridge / OVS / ..) annotated with the userspace tools that configure it: seccomp, iptables, tc, tcpdump, ethtool, ip, brctl / ovsctl]
Problem #3: Development Process
The Good:
● Open and transparent process
● Excellent code quality
● Stability
● Available everywhere
● Almost entirely vendor neutral
The Bad:
● Hard to change
● Shouting is involved (getting better)
● Large and complicated codebase
● Upstreaming code is hard, consensus has to be found.
● Upstreaming is time consuming
● Depending on the Linux distribution, merged code can take years to become generally available
● Everybody maintains forks with 100-1000s backports
Problem #4: What is a container?
What the kernel knows about:
● Processes & thread groups
● Cgroups
  ○ Limits and accounting of CPU, memory, network, … Configured by container runtime.
● Namespaces
  ○ Isolation of process, CPU, mount, user, network, IPC, cgroup, UTS (hostname). Configured by container runtime
● IP addresses & port numbers
  ○ Configured by container networking
● System calls made & SELinux context
  ○ Optionally configured by container runtime
What the kernel does not know:
● Containers or Kubernetes pods
  ○ There is no container ID in the kernel
● Exposure requirements
  ○ The kernel no longer knows whether an application should be exposed outside of the host or not.
● API calls made between containers/pods
  ○ Awareness stops at layer 4 (ports).
  ○ While SELinux can control IPC, it can’t control service to service API calls.
● Servicemesh, huh?
What now? Alternatives?
● Give user space access to hardware
  ○ Expose the hardware directly to user space. It will be fine.
  ○ Examples: DPDK, UDMA, ..
● Unikernel
  ○ Linus was wrong. The app should provide its own OS.
  ○ Examples: ClickOS, MirageOS, Rumprun, ...
● Move OS to Userspace
  ○ We don’t need kernel mode for most of the logic. Build on top of a minimal Linux.
  ○ Examples: User mode Linux, gVisor, ...
● Rewrite Everything?
  ○ Total Estimated Cost to Develop Linux (average salary = $75,662.08 /year, overhead = 2.40): $1,372,340,206
What is BPF?
Highly efficient sandboxed virtual machine in the Linux kernel making the Linux kernel programmable at native execution speed.
Jointly maintained by Cilium and Facebook with collaborations from Google, Red Hat, Netflix, Netronome, and many others.
    $ clang -target bpf -emit-llvm -S \
        32-bit-example.c
    $ llc -march=bpf 32-bit-example.ll
    $ cat 32-bit-example.s
    cal:
        r1 = *(u32 *)(r1 + 0)
        r2 = *(u32 *)(r2 + 0)
        r2 += r1
        *(u32 *)(r3 + 0) = r2
        exit
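The `cal` assembly on this slide is consistent with a C function that loads two 32-bit values, adds them, and stores the sum through a third pointer. The following is a plausible reconstruction of 32-bit-example.c, not the original file; the parameter names a, b, and c are assumptions.

```c
/* Plausible source for the `cal` symbol in the listing.
 * In the BPF calling convention the first three arguments arrive in
 * r1, r2 and r3, which matches the two loads and one store shown. */
void cal(unsigned int *a, unsigned int *b, unsigned int *c)
{
    unsigned int sum = *a + *b;  /* r1 = *(u32 *)(r1 + 0); r2 = *(u32 *)(r2 + 0); r2 += r1 */
    *c = sum;                    /* *(u32 *)(r3 + 0) = r2 */
}
```

The same function compiles unchanged as ordinary host C, which makes it easy to check the arithmetic before looking at the BPF output.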
The Linux kernel is event driven
[Diagram: processes enter the kernel from above through system calls and the System Call Interface; hardware (CPU, RAM, MMU, NIC, Disk, Disk, USB) enters from below through interrupts and drivers. In between: 12M lines of source code.]
Run BPF program on event
Attachment points:
● Kernel functions (kprobes)
● Userspace functions (uprobe)
● System calls
● Tracepoints
● Network devices (packet level)
● Sockets (data level)
● Network device (DMA level) [XDP]
● ...
[Diagram: BPF programs attached along the path from processes calling connect() and read(), through the system call layer, file descriptors, sockets, VFS, and TCP/IP (including TCP retransmits), down to the block device and network device in front of the disk and NIC]
BPF Maps
BPF map use cases:
● Hold program state
● Share state between programs
● Share state with user space
● Export metrics & statistics
● Configure programs
Map types:
● Hash tables
● Arrays
● LRU (Least recently used)
● Ring buffer
● Stack trace
● LPM (Longest prefix match)
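To make the LPM (longest prefix match) map type concrete, here is a userspace sketch of the lookup semantics only: among all entries whose prefix covers the key, the one with the longest prefix wins. It is not the kernel's implementation and does not use the BPF map API; struct lpm_entry, lpm_mask, lpm_lookup, and the linear scan are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: an IPv4 prefix in host byte order plus a value. */
struct lpm_entry {
    uint32_t prefix;      /* e.g. 0x0A000000 for 10.0.0.0 */
    uint8_t  prefix_len;  /* 0..32 */
    int      value;
};

/* Netmask for a prefix length. A shift by 32 is undefined in C,
 * so length 0 is handled separately. */
uint32_t lpm_mask(uint8_t prefix_len)
{
    return prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
}

/* Return the entry with the longest prefix that covers addr, or NULL. */
const struct lpm_entry *lpm_lookup(const struct lpm_entry *tbl, size_t n,
                                   uint32_t addr)
{
    const struct lpm_entry *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].prefix_len > 32)
            continue;                       /* skip malformed entries */
        uint32_t mask = lpm_mask(tbl[i].prefix_len);
        if ((addr & mask) != (tbl[i].prefix & mask))
            continue;                       /* prefix does not cover addr */
        if (best == NULL || tbl[i].prefix_len > best->prefix_len)
            best = &tbl[i];                 /* keep the most specific match */
    }
    return best;
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16, a lookup of 10.1.2.3 selects the /16 entry, which is the behaviour that makes this map type a natural fit for CIDR-based policy and routing tables.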
BPF Helpers
BPF helpers:
● Stable kernel API exposed to BPF programs to interact with the kernel
● Includes ability to:
  ○ Get process/cgroup context
  ○ Manipulate network packets and forwarding
  ○ Access BPF maps
  ○ Access socket data
  ○ Send metrics to user space
  ○ ...
Examples:
    bpf_get_prandom_u32()
    bpf_skb_store_bytes()
    bpf_redirect()
    bpf_get_current_pid_tgid()
    bpf_perf_event_output()
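As one concrete case, bpf_get_current_pid_tgid() returns a single 64-bit value with the thread group ID (what user space calls the PID) in the upper 32 bits and the kernel's per-thread PID in the lower 32 bits. The sketch below shows only the unpacking arithmetic in plain C; the function names split_tgid and split_pid are hypothetical, and callers would pass in the helper's return value.

```c
#include <stdint.h>

/* Upper 32 bits: thread group ID (the process ID as seen by user space). */
uint32_t split_tgid(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel PID of the calling thread (the thread ID). */
uint32_t split_pid(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

Packing both IDs into one return value lets a program obtain process and thread context with a single helper call.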
BPF Tail Calls
BPF tail calls:
● Chain logical programs together
● Implement function calls
● Must be within same program type
[Diagram: several BPF programs chained one after another]
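The chaining behaviour can be modelled in ordinary C. This is a userspace analogy, not kernel code: a table of function pointers stands in for a program array, and each stage names the next slot instead of returning to its caller. The names (prog_fn, run_chain, MAX_CHAIN, the three stage functions) and the cap of 8 jumps are assumptions for illustration; the kernel enforces its own limit on chained tail calls.

```c
#include <stddef.h>

#define MAX_CHAIN 8     /* hypothetical cap on the number of jumps */
#define CHAIN_END (-1)  /* returned by a stage that ends the chain */

struct ctx { int value; };

/* A stage works on the shared context and returns the index of the next
 * stage, or CHAIN_END. Control never returns to an earlier stage. */
typedef int (*prog_fn)(struct ctx *);

/* Run stages starting at `start`; returns how many stages executed.
 * An empty or out-of-range slot simply ends the chain. */
int run_chain(prog_fn *progs, size_t n, int start, struct ctx *c)
{
    int idx = start;
    int ran = 0;

    while (ran < MAX_CHAIN && idx >= 0 && (size_t)idx < n && progs[idx] != NULL) {
        idx = progs[idx](c);
        ran++;
    }
    return ran;
}

/* Three example stages, each handing over to the next slot. */
int stage_parse(struct ctx *c)   { c->value += 1;  return 1; }
int stage_policy(struct ctx *c)  { c->value *= 10; return 2; }
int stage_forward(struct ctx *c) { c->value += 5;  return CHAIN_END; }
```

Because stages are looked up by slot at run time, one stage can be swapped out by replacing a single table entry while the rest of the chain stays in place.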
BPF JIT Compiler
JIT Compiler:
● Ensures native execution performance without requiring an understanding of the CPU
● Compiles BPF bytecode to the CPU architecture specific instruction set
Supported architectures:
● x86_64, arm64, ppc64, s390x, mips64, sparc64, arm
[Diagram: generic byte code goes into the JIT, x86_64 code comes out]
BPF Contributors
Top contributors of the total 186 contributors to BPF from January 2016 to November 2018.
380  Daniel Borkmann (Cilium, Maintainer)
161  Alexei Starovoitov (Facebook, Maintainer)
160  Jakub Kicinski (Netronome)
110  John Fastabend (Cilium)
 96  Yonghong Song (Facebook)
 95  Martin KaFai Lau (Facebook)
 94  Jesper Dangaard Brouer (Red Hat)
 74  Quentin Monnet (Netronome)
 45  Roman Gushchin (Facebook)
 45  Andrey Ignatov (Facebook)
BPF Use Cases
[Four groups of use cases, each originally shown next to a company logo]
● L3-L4 Load balancing
● Network security
● Traffic optimization
● Profiling
https://code.fb.com/open-source/linux/

● Replacing iptables with BPF (bpfilter)
● NFV & Load balancing (XDP)
● Profiling & Tracing

● Performance Troubleshooting
● Tracing & Systems Monitoring
● Networking

● QoS & Traffic optimization
● Network Security
● Profiling
Simple Kprobe Example
Example: BPF program using gobpf/bcc
[Code listing not preserved in the slide text]
What is Cilium?
Cilium is open source software for transparently providing and securing the network and API connectivity between application services deployed using Linux container management platforms like Kubernetes, Docker, and Mesos.
At the foundation of Cilium is the new Linux kernel technology BPF, which enables the dynamic insertion of powerful security, visibility, and networking control logic within Linux itself. Besides providing traditional network level security, the flexibility of BPF enables security on API and process level to secure communication within a container or pod.
Project Goals
Approachable BPF
● Make the efficiency and flexibility of BPF available in an approachable way
● Automate program creation and management
● Provide an extendable platform
Security
● Use the additional visibility of BPF to provide security for microservices including:
  ○ API awareness
  ○ Identity based enforcement
  ○ Process level context enforcement
Microservices-aware Linux
● Use the flexibility of BPF to make the Linux kernel aware of cloud native concepts such as containers and APIs.
Performance
● Leverage the execution performance and JIT compiler to provide a highly efficient implementation.
Cilium Use Cases
Container Networking
● Highly efficient and flexible networking
● CNI and CNM plugins
● IPv4, IPv6, NAT46, direct routing, encapsulation
● Multi cluster routing
Microservices Security
● Identity-based L3-L4 network security
● Accelerated API-aware security via Envoy (HTTP, gRPC, Kafka, Cassandra, memcached, ..)
● DNS aware policies
● SSL data visibility via kTLS
Service Load balancing:
● Highly scalable L3-L4 load balancing implementation
● Kubernetes service implementation or API driven.
Servicemesh acceleration:
● Minimize overhead when injecting servicemesh sidecar proxies