
Monitoring Containers with BPF
Jonathan Perry, Flowmill



  3. 3. It is easy to navigate large deployments by looking at neighborhoods.
     Even small deployments can have complex connectivity.

  7. 3. It is easy to navigate large deployments by looking at neighborhoods.
     Neighborhood: all the services up to N hops from the selection
       1. Search
       2. Detected anomalies
       3. Alerts (Slack/PagerDuty etc.)
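
The neighborhood definition above (all services up to N hops from the selection) maps onto a bounded breadth-first search. The following is a minimal sketch, not the speaker's implementation: the service names and edges are made up, and connections are treated as undirected so that both callers and callees of the selection appear.

```python
from collections import deque


def neighborhood(edges, selection, max_hops):
    """Return {service: hop_count} for services within max_hops of selection."""
    # Assumption: connections are treated as undirected.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    hops = {selection: 0}
    queue = deque([selection])
    while queue:
        service = queue.popleft()
        if hops[service] == max_hops:
            continue  # do not expand past the hop limit
        for peer in adjacency.get(service, ()):
            if peer not in hops:
                hops[peer] = hops[service] + 1
                queue.append(peer)
    return hops


# Hypothetical service graph, for illustration only.
edges = [
    ("frontend", "checkout"),
    ("frontend", "cart"),
    ("checkout", "payments"),
    ("payments", "ledger"),
]
one_hop = neighborhood(edges, "checkout", 1)
two_hops = neighborhood(edges, "checkout", 2)
```

With N = 1 only the direct peers of `checkout` are returned; raising N widens the view one hop at a time, which is what keeps a large deployment navigable.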

  8. 4. Connection visibility can point to failure domains: version, instance, zone.
     Got an alert / anomaly. Now what?
     Common causes:
     • New version deploy
     • Overloaded / borked instance
     • Geo / zone failure
     Or, helpful to know if the failure is concentrated on a single:
     • Container spec
     • Process
     • Port
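
One way to check whether a failure is concentrated in a single domain is to group failed connections by each candidate dimension and look at the largest share. The sketch below assumes a hypothetical flow record with `version`, `instance`, `zone` and `failed` fields; none of these names come from the talk, and a real analysis would also normalize by how much traffic each value carries.

```python
from collections import Counter


def failure_concentration(flows, dimension):
    """Return (value, share_of_failures) for the value with the most failures."""
    failures = Counter(f[dimension] for f in flows if f["failed"])
    total = sum(failures.values())
    if total == 0:
        return None  # nothing failed, so nothing to attribute
    value, count = failures.most_common(1)[0]
    return value, count / total


# Hypothetical per-connection records, for illustration only.
flows = [
    {"version": "v2.12a", "instance": "i-1", "zone": "us-west-1a", "failed": True},
    {"version": "v2.12a", "instance": "i-2", "zone": "us-west-1c", "failed": True},
    {"version": "v2.12a", "instance": "i-3", "zone": "us-west-1a", "failed": True},
    {"version": "v2.11", "instance": "i-4", "zone": "us-west-1a", "failed": False},
    {"version": "v2.11", "instance": "i-5", "zone": "us-west-1c", "failed": False},
]
by_dimension = {
    d: failure_concentration(flows, d) for d in ("version", "instance", "zone")
}
```

In this made-up data every failure shares one version while instances and zones are mixed, which would point at a new version deploy rather than a bad instance or zone.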

  17. Agenda / Claims
     1. Visibility into connections between services facilitates SRE/DevOps.
     2. Effective triage requires visibility into how network infrastructure affects services.
     3. It is easy to navigate large deployments by looking at neighborhoods.
     4. Connection visibility can point to failure domains: version, instance, zone.
     5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
     6. Linux CLI provides great visibility without per-application changes.
     7. BPF enables 1-second granularity, low-overhead, full coverage of connections.
     8. BPF can handle encrypted connections (with uprobes).

  18. 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
     Logs, metrics, tracing, service meshes
     Each is great for its own use case!
     • Logs: low barrier, app internals
     • Metrics: dashboards on internals & business metrics
     • Tracing: cross-service examples of bad cases
     • Service mesh: aggregated connectivity, security, circuit breaking
     But…

  19. 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
     Cons:
     • Engineering time: requires per-service work (and maintenance)
     • Performance and cost
     • No infra visibility (drops, RTT)
     • Logs + metrics: service-centric, not connection-centric
     • Tracing: sampling, cost

  28. 5. Logging, metrics, tracing, service meshes are not ideal for connection visibility.
     Service mesh caveats
     ● misconfigured mesh → broken telemetry.
       ○ want telemetry from a different source to debug the mesh
     ● partial deployments & managed services
     ● no transport layer data (packet drops, RTT)
     ● doesn’t solve the analysis part. Data is either
       ○ too aggregated - missing info on failure domains (version, zone, node)
       ○ too detailed (access logs) - still need to process 100k+ events/sec
     ● eBPF user probes can efficiently get data from mesh and transport layer
     [Diagram labels: Service, Envoy, HTTP conn mgr, cluster]

  31. 6. Linux CLI provides great visibility without per-application changes.
     Socket:
       Timestamp  | Source        | Destination  | Ports    | Bytes | Drops | RTT
       1418530010 | 172.31.16.139 | 172.31.16.21 | 20641 22 | 4249  | 2     | 4 ms
     Protocol:
       Method | Endpoint         | Code
       GET    | checkout?q=hrz4N | 200
     K8s:
       IP            | Pod             | Image          | Tag    | Zone
       172.31.16.139 | frontend        | frontend-image | v1.16  | us-west-1c
       172.31.16.21  | checkoutservice | checkout-image | v2.12a | us-west-1a
     Joined:
       Timestamp  | Source                                      | Destination                                 | Ports    | Bytes | Drops | RTT  | Method | Endpoint         | Code
       1418530010 | frontend, frontend-image v1.16, us-west-1c  | checkout, checkout-image v2.12a, us-west-1a | 20641 22 | 4249  | 2     | 4 ms | GET    | checkout?q=hrz4N | 200
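
The join shown on this slide can be sketched as a lookup from IP address to pod metadata, merged with the socket and protocol rows. The field names and dictionary layout below are assumptions chosen for illustration; only the values are taken from the slide.

```python
def join_flow(socket_row, protocol_row, pods_by_ip):
    """Merge socket, protocol, and Kubernetes metadata into one flow record."""
    joined = dict(socket_row)
    for side in ("source", "destination"):
        ip = socket_row[side]
        # Replace the raw IP with pod metadata; keep the IP if no pod is known.
        joined[side] = pods_by_ip.get(ip, {"ip": ip})
    joined.update(protocol_row)
    return joined


# Values from the slide; the key names are hypothetical.
socket_row = {
    "timestamp": 1418530010,
    "source": "172.31.16.139",
    "destination": "172.31.16.21",
    "ports": "20641 22",
    "bytes": 4249,
    "drops": 2,
    "rtt_ms": 4,
}
protocol_row = {"method": "GET", "endpoint": "checkout?q=hrz4N", "code": 200}
pods_by_ip = {
    "172.31.16.139": {"pod": "frontend", "image": "frontend-image",
                      "tag": "v1.16", "zone": "us-west-1c"},
    "172.31.16.21": {"pod": "checkoutservice", "image": "checkout-image",
                     "tag": "v2.12a", "zone": "us-west-1a"},
}
joined = join_flow(socket_row, protocol_row, pods_by_ip)
```

Falling back to the bare IP for unknown endpoints keeps traffic to managed services or external hosts in the output instead of silently dropping it.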

  42. 6. Linux CLI provides great visibility without per-application changes.
     Getting Flow Data
     [Diagram: A → X, iptables, A → B; (A,X) ~ (A,B)]

         $ kubectl describe pod $POD
         Name:           A
         Namespace:      staging
         ...
         Status:         Running
         IP:             100.101.198.137
         Controlled By:  ReplicaSet/A

         # PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
         # nsenter -t $PID -n ss -ti
         ESTAB 0 0 100.101.198.137:34940 100.65.61.118:8000        (A → X)
             cubic wscale:9,9 rto:204 rtt:0.003/0 mss:1448 cwnd:19 ssthresh:19
             bytes_acked:2525112 segs_out:15664 segs_in:15578 data_segs_out:15662
             send 73365.3Mbps lastsnd:384 lastrcv:10265960 lastack:384
             rcv_space:29200 minrtt:0.002

         # conntrack -L
         tcp 6 86399 ESTABLISHED
             src=100.101.198.137 dst=100.65.61.118 sport=34940 dport=8000      (A → X)
             src=100.101.198.147 dst=100.101.198.137 sport=8000 dport=34940    (B → A)
             [ASSURED] mark=0 use=1
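
The conntrack entry on this slide carries two tuples: the original direction (A → X) and the reply direction (B → A). A rough sketch of recovering the real backend B from such a line follows; the function names are hypothetical, and the parser assumes the TCP entry layout shown on the slide rather than every format `conntrack -L` can print.

```python
import re


def parse_conntrack_line(line):
    """Split one conntrack TCP entry into its original and reply tuples."""
    pairs = re.findall(r"\b(src|dst|sport|dport)=(\S+)", line)
    if len(pairs) != 8:
        raise ValueError("expected two 4-field tuples")
    return dict(pairs[:4]), dict(pairs[4:])


def resolve_backend(reply):
    """The reply tuple's source is the endpoint that actually answered (B)."""
    return reply["src"], int(reply["sport"])


# The entry from the slide, joined back into a single line.
line = (
    "tcp 6 86399 ESTABLISHED src=100.101.198.137 dst=100.65.61.118 "
    "sport=34940 dport=8000 src=100.101.198.147 dst=100.101.198.137 "
    "sport=8000 dport=34940 [ASSURED] mark=0 use=1"
)
original, reply = parse_conntrack_line(line)
backend = resolve_backend(reply)
```

Here the socket seen inside the pod targets 100.65.61.118 (X), while the reply tuple shows that 100.101.198.147 (B) answered, which is the (A,X) ~ (A,B) mapping from the diagram.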

  47. 6. Linux CLI provides great visibility without per-application changes.
     CLI tools have disadvantages
     • Performance:
       ○ iterates over all sockets
       ○ built for CLI use (printfs)
     • Coverage: Linux CLI tools are polling based
       [Timeline: polls at regular intervals; a socket that lives between two polls]
       → Misses events between polls
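
The coverage gap can be illustrated with a small simulation: a socket is only observed if it happens to be open at the instant of a poll. The lifetimes and the 5-second poll interval below are invented for the example.

```python
def sockets_seen_by_polling(sockets, poll_times):
    """Return the ids of sockets that are open during one or more polls."""
    seen = set()
    for sock_id, opened, closed in sockets:
        if any(opened <= t < closed for t in poll_times):
            seen.add(sock_id)
    return seen


# Hypothetical socket lifetimes in seconds: (id, opened_at, closed_at).
sockets = [
    ("long-lived", 0.0, 30.0),
    ("short-a", 1.2, 1.7),  # opens and closes between two polls
    ("short-b", 4.9, 5.1),  # happens to straddle the poll at t=5
]
poll_times = [0.0, 5.0, 10.0]

seen = sockets_seen_by_polling(sockets, poll_times)
missed = {sock_id for sock_id, _, _ in sockets} - seen
```

In this example `short-a` never appears in any poll, so a polling tool would report nothing about it, while an event-driven probe would see both its open and its close.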

  49. Enter eBPF
     • Linux bpf() system call since 3.18
     • Run code on kernel events
     • Only changes, more data
     • Safe: In-kernel verifier, read-only
     • Fast: JIT-compiled
     → 100% coverage + no app changes + low overhead ftw!
     [Image: unofficial BPF mascot by Deirdré Straughan]

  53. 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.
     Using eBPF
     tcptop:
     ● instruments tcp_sendmsg and tcp_cleanup_rbuf
     ● need to be careful of races:

         # IPv4: build dict of all seen keys
         ipv4_throughput = defaultdict(lambda: [0, 0])
         for k, v in ipv4_send_bytes.items():
             key = get_ipv4_session_key(k)
             ipv4_throughput[key][0] = v.value
         ipv4_send_bytes.clear()

       As the for loop is running, the kernel continues with updates; clear() throws those out.
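
One possible way to avoid losing updates in the race above is to never clear the map and instead report the difference between successive reads of cumulative counters. This is a sketch of that idea only, not what tcptop or the speaker's agent does: a plain dict stands in for the BPF map, and `CumulativeReader` is a hypothetical helper.

```python
class CumulativeReader:
    """Turn ever-growing per-key counters into per-interval deltas."""

    def __init__(self):
        self._previous = {}

    def deltas(self, counters):
        result = {}
        for key, total in counters.items():
            delta = total - self._previous.get(key, 0)
            self._previous[key] = total
            if delta:
                result[key] = delta
        return result


# A plain dict stands in for the BPF map; keys and values are made up.
send_bytes = {"conn-1": 100}
reader = CumulativeReader()

first = reader.deltas(dict(send_bytes))   # user space reads a snapshot
send_bytes["conn-1"] += 50                # kernel-side update after the read
second = reader.deltas(dict(send_bytes))  # picked up on the next interval
```

Because nothing is cleared, bytes counted after the first read show up in the next interval instead of being discarded. The trade-off is that entries for closed connections have to be removed by some other mechanism, otherwise the map keeps growing.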

  54. 7. BPF enables 1-second granularity, low-overhead, full coverage of connections.
     System architecture
     [Diagram labels: Flow Collection; Agent; Kubernetes, ECS, Docker, Linux; Containers, Processes, Socket, NAT]
