Istio: A modern service mesh
Louis Ryan, Principal Engineer @ Google (@louiscryan)
My Google Career
[Diagram: successive generations of reverse proxies, API proxies and servers (HTTP, GData Library, Stubby, HTTP2/gRPC, API Proxy v2, control plane, local gRPC server), framed by two themes: Centralization and Performance & Isolation.]
Cloud → Internal & External Convergence
● Network distance & bandwidth ● Protocols ● Isolation & Reliability ● Security concerns
[Diagram: a reverse proxy and control plane (HTTP2/gRPC) managing an API Proxy v2 that runs alongside the local Stubby/gRPC server.] Sidecar!
Decoupling → Velocity
● Operators & Developers ● Code & Networking ● Network Topology & Security ● Modernization & Architecture
What is a ‘Service Mesh’?
A network for services, not bytes
● Observability ● Resiliency ● Traffic Control ● Security ● Policy Enforcement
● Zero code change. FREE!
Weaving the mesh: Sidecars
[Diagram: external services on the Internet, Service A (svcA) and Service B (svcB), each service paired with a sidecar proxy; traffic between them is HTTP/1.1, HTTP/2, gRPC or TCP, with or without TLS.]
Outbound features: ❖ Service authentication ❖ Load balancing ❖ Retry and circuit breaker ❖ Fine-grained routing ❖ Telemetry ❖ Request tracing ❖ Fault injection
Inbound features: ❖ Service authentication ❖ Authorization ❖ Rate limits ❖ Load shedding ❖ Telemetry ❖ Request tracing ❖ Fault injection
Istio: Putting it all together
[Diagram: a control plane API (Pilot, Mixer, Istio-Auth) above two pods, Service A (svcA) and Service B (svcB), each with an Envoy sidecar. Pilot sends discovery & config data to the Envoys, Istio-Auth sends TLS certs to the Envoys, and the Envoys call Mixer for policy checks and telemetry as part of the control flow during request processing.]
Traffic is transparently intercepted and proxied. The app is unaware of Envoy’s presence.
Our sidecar of choice: Envoy
● A C++-based L4/L7 proxy ● Low memory footprint ● Battle-tested @ Lyft: ○ 100+ services ○ 10,000+ VMs ○ 2M req/s
Goodies: ❖ API-driven config updates → no reloads ❖ Zone-aware load balancing w/ failover ❖ Traffic routing and splitting ❖ Health checks, circuit breakers, timeouts, retry budgets, fault injection, … ❖ HTTP/2 & gRPC ❖ Transparent proxying ❖ Designed for observability
Plus an awesome team willing to work with the community!
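Zone-aware load balancing with failover means a proxy prefers healthy endpoints in its own zone and spills over to other zones only when none are healthy locally. The Python sketch below illustrates just that idea; it is not Envoy's algorithm (Envoy's zone-aware routing is more involved), and the `Endpoint` type, `ZoneAwareBalancer` class and zone names are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One upstream host as the proxy sees it (hypothetical type)."""
    address: str
    zone: str
    healthy: bool = True


class ZoneAwareBalancer:
    """Round-robin over healthy endpoints, preferring the caller's own zone."""

    def __init__(self, local_zone, endpoints):
        self.local_zone = local_zone
        self.endpoints = list(endpoints)
        self._counter = itertools.count()

    def pick(self):
        healthy = [e for e in self.endpoints if e.healthy]
        if not healthy:
            raise RuntimeError("no healthy endpoints")
        local = [e for e in healthy if e.zone == self.local_zone]
        # Fail over to other zones only when nothing in the local zone is healthy.
        candidates = local or healthy
        return candidates[next(self._counter) % len(candidates)]
```

In a mesh the `healthy` flags would come from health checks or outlier detection rather than being set by hand.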
Modeling the Service Mesh
1. Environment-specific topology extraction (Kubernetes, Consul, Eureka, custom) through platform adapters.
2. Topology is mapped to a platform-agnostic model (Pilot's abstract model).
3. Additional rules from the Rules API are layered onto the model, e.g. retries, traffic splits, etc.
4. Configuration (service discovery & traffic rules) is pushed to Envoy through the Envoy API and applied without restarts.
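The four steps above can be sketched as a small pipeline: platform-specific adapters translate their own service records into one platform-agnostic model, and operator rules are layered on top to produce per-service proxy configuration. Everything here (`ServiceInstance`, the adapter functions, the simplified record shapes and the output dict layout) is a hypothetical illustration of the shape of what Pilot does, not Pilot's or Envoy's actual API; pushing the result to the proxies (step 4) is out of scope.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    """Platform-agnostic model of one service endpoint (hypothetical)."""
    service: str
    address: str
    port: int
    labels: dict = field(default_factory=dict)


def kubernetes_adapter(records):
    """Steps 1-2: map simplified Kubernetes-style endpoint records onto the model."""
    return [
        ServiceInstance(
            service=f"{r['name']}.{r['namespace']}.svc.cluster.local",
            address=r["podIP"],
            port=r["port"],
            labels=dict(r.get("labels", {})),
        )
        for r in records
    ]


def _tags_to_labels(tags):
    """Treat 'key=value' tags as labels; other tags are ignored (an assumption)."""
    labels = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:
            labels[key] = value
    return labels


def consul_adapter(entries):
    """Steps 1-2: map Consul catalog-style entries onto the same model."""
    return [
        ServiceInstance(
            service=e["ServiceName"],
            address=e["Address"],
            port=e["ServicePort"],
            labels=_tags_to_labels(e.get("ServiceTags", [])),
        )
        for e in entries
    ]


def build_proxy_config(instances, rules):
    """Step 3: group endpoints per service and layer operator rules on top."""
    endpoints = defaultdict(list)
    for inst in instances:
        endpoints[inst.service].append(f"{inst.address}:{inst.port}")
    return {
        service: {"endpoints": sorted(addrs), **rules.get(service, {})}
        for service, addrs in endpoints.items()
    }
```

Because both adapters emit the same `ServiceInstance` type, `build_proxy_config` never needs to know which platform a service came from, which is the point of the abstract model.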
Visibility
Monitoring & tracing should not be an afterthought in the infrastructure.
Goals: ● Metrics without instrumenting apps ● Consistent metrics across the fleet ● Trace the flow of requests across services ● Portable across metric backend providers
[Screenshots: Istio Grafana dashboard w/ Prometheus backend; Istio Zipkin tracing dashboard.]
Visibility: Metrics
● Mixer collects metrics emitted by Envoys ● Adapters in the Mixer normalize and forward them to monitoring backends ● The metrics backend can be swapped at runtime
[Diagram: requests and responses pass through Envoy to the service; Envoy sends traces to Zipkin and calls Mixer with Check(attributes) and Report([]attributes); Mixer applies operator-supplied config and adapters to reach backends (Prometheus, Stackdriver, StatsD) and GUIs (Grafana, Weave Scope, ServiceGraph).]
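The Report/adapter flow above amounts to a fan-out: each Report carries a batch of attribute bags, and Mixer hands every bag to whichever adapters the operator has configured, so a backend can be swapped without touching Envoy or the application. A minimal Python sketch, assuming a hypothetical `Adapter` interface and an in-memory counter standing in for a metrics backend; Mixer's real adapter API is different.

```python
from typing import Dict, List, Protocol


class Adapter(Protocol):
    """Hypothetical adapter interface: consumes one attribute bag per call."""

    def handle_report(self, attributes: Dict[str, object]) -> None: ...


class InMemoryCounter:
    """Stand-in for a metrics backend: counts requests per target service."""

    def __init__(self):
        self.counts: Dict[str, int] = {}

    def handle_report(self, attributes):
        svc = str(attributes.get("target.service", "unknown"))
        self.counts[svc] = self.counts.get(svc, 0) + 1


class Mixer:
    def __init__(self):
        self._adapters: List[Adapter] = []

    def configure(self, adapters):
        # Operator-supplied config: replacing the list swaps backends at runtime.
        self._adapters = list(adapters)

    def report(self, batch):
        # Report([]attributes): every attribute bag goes to every configured adapter.
        for attributes in batch:
            for adapter in self._adapters:
                adapter.handle_report(attributes)
```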
Visibility: Tracing
● Applications do not have to deal with generating spans or correlating causality
● Envoys generate spans ○ Applications need to forward context headers on outbound calls
● Envoy sends traces to Mixer
● Adapters at Mixer send traces to the respective backends
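Envoy cannot tell which outbound calls an application makes on behalf of which inbound request, so the application has to copy the trace context across. The Python sketch below does that for a plain dict of headers; the header list is the set Istio's documentation has asked applications to propagate (`x-request-id`, the Zipkin B3 `x-b3-*` headers and `x-ot-span-context`), but treat it as an assumption to check against the Istio version you run. `propagate_trace_headers` is a hypothetical helper.

```python
# Trace-context headers listed in Istio's docs (verify for your Istio version).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def propagate_trace_headers(incoming, outgoing=None):
    """Copy trace-context headers from an inbound request onto an outbound call.

    Header names are matched case-insensitively, as HTTP header names are;
    all other inbound headers are deliberately not forwarded.
    """
    outgoing = dict(outgoing or {})
    lowered = {name.lower(): value for name, value in incoming.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            outgoing[name] = lowered[name]
    return outgoing
```

The returned dict would then be used as the headers of the outbound HTTP call made while serving the inbound request.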
Resiliency
Istio adds fault tolerance to your application without any changes to code.
Resilience features: ❖ Timeouts ❖ Retries with timeout budget ❖ Circuit breakers ❖ Health checks ❖ AZ-aware load balancing w/ automatic failover ❖ Control connection pool size and request load ❖ Systematic fault injection
    // Circuit breakers
    destination: serviceB.example.cluster.local
    policy:
    - tags:
        version: v1
      circuitBreaker:
        simpleCb:
          maxConnections: 100
          httpMaxRequests: 1000
          httpMaxRequestsPerConnection: 10
          httpConsecutiveErrors: 7
          sleepWindow: 15m
          httpDetectionInterval: 5m
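The `httpConsecutiveErrors` and `sleepWindow` settings in the circuit-breaker policy describe consecutive-error ejection: after a run of errors the host is taken out of rotation for a while, then readmitted. The Python sketch below models only that behavior, with defaults mirroring the policy's 7 errors and 15 minutes; it is a simplified illustration rather than Envoy's outlier detection (which, among other things, lengthens the ejection on repeated ejections and evaluates on an interval that this sketch ignores), and the class and method names are hypothetical.

```python
import time


class ConsecutiveErrorBreaker:
    """Eject a host after N consecutive errors; readmit it after a sleep window."""

    def __init__(self, consecutive_errors=7, sleep_window_s=15 * 60, clock=time.monotonic):
        self.consecutive_errors = consecutive_errors
        self.sleep_window_s = sleep_window_s
        self.clock = clock
        self._errors = 0
        self._ejected_until = None

    def allow_request(self):
        if self._ejected_until is None:
            return True
        if self.clock() >= self._ejected_until:
            # Sleep window elapsed: readmit the host and start counting afresh.
            self._ejected_until = None
            self._errors = 0
            return True
        return False

    def record(self, success):
        if success:
            self._errors = 0  # any success breaks the run of errors
            return
        self._errors += 1
        if self._errors >= self.consecutive_errors:
            self._ejected_until = self.clock() + self.sleep_window_s
```

Injecting `clock` keeps the breaker testable without real sleeps.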
Traffic Splitting
[Diagram: Service A (svcA) calls http://serviceB.example; Pilot, fed by the Rules API, pushes traffic routing rules to the Envoys; 99% of traffic goes to serviceB.example.cluster.local pods labeled version: v1.5, env: us-prod, and 1% to pods labeled version: v2.0-alpha, env: us-staging.]
    // A simple traffic splitting rule
    destination: serviceB.example.cluster.local
    match:
      source: serviceA.example.cluster.local
    route:
    - tags:
        version: v1.5
        env: us-prod
      weight: 99
    - tags:
        version: v2.0-alpha
        env: us-staging
      weight: 1
Traffic control is decoupled from infrastructure scaling.
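Conceptually, a weighted route like the 99/1 split makes each request pick a backend version with probability proportional to its weight, independent of how many pods each version has. A small Python sketch of that selection, assuming integer weights as in the rule; `ROUTES` and `pick_route` are hypothetical, and Envoy's weighted-cluster routing is implemented differently.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# The two weighted routes from the rule, as (pod label selector, weight).
ROUTES = [
    ({"version": "v1.5", "env": "us-prod"}, 99),
    ({"version": "v2.0-alpha", "env": "us-staging"}, 1),
]


def pick_route(routes, rng=random):
    """Choose one route's tags with probability weight / total_weight."""
    cumulative = list(accumulate(weight for _, weight in routes))  # e.g. [99, 100]
    point = rng.randrange(cumulative[-1])  # uniform integer in [0, total)
    return routes[bisect_right(cumulative, point)][0]
```

Per-request random choice matches a plain weighted split; keeping a given user on one version would need a different scheme, such as hashing a request attribute.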
Traffic Steering
[Diagram: Service A (svcA) calls Service B (svcB); requests with User-agent: *Android* go to Service B version: v1 (Pods 1-3), while requests with User-agent: *iPhone* go to Pod 4, svcB’ version: canary.]
    // Content-based traffic steering rule
    destination: serviceB.example.cluster.local
    match:
      httpHeaders:
        user-agent:
          regex: "^(.*?;)?(iPhone)(;.*)?$"
    precedence: 2
    route:
    - tags:
        version: canary
Content-based traffic steering
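Content-based steering reduces to matching request headers against each rule and taking the highest-precedence match, falling back to a default route. The Python sketch below applies the slide's `user-agent` regex with `re.fullmatch`; the `RULES` table, `DEFAULT_ROUTE` and `route_for` helper are hypothetical. Note that this particular pattern matches only when `iPhone` is a whole `;`-separated segment of the header value (for example `foo;iPhone;bar`).

```python
import re

# Rules for serviceB as (precedence, header name, regex, route tags).
RULES = [
    (2, "user-agent", r"^(.*?;)?(iPhone)(;.*)?$", {"version": "canary"}),
]
DEFAULT_ROUTE = {"version": "v1"}


def route_for(headers, rules=RULES, default=DEFAULT_ROUTE):
    """Return the route tags of the first matching rule, highest precedence first."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for _, header, pattern, tags in sorted(rules, key=lambda r: r[0], reverse=True):
        value = lowered.get(header)
        if value is not None and re.fullmatch(pattern, value):
            return tags
    return default
```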
Securing Services
● Encryption by default ● Verifiable identity ● Secure naming / addressing ● Revocation
Problem: Strong Service Security at Scale
Concerns: ● Insiders ● Hijacked services ● Microservice attack surface ● Workload mobility ● Brittle fine-grained models ● Securing resources, not just endpoints
Wants: ● Workload mobility ● Remote admin & development ● Shared & 3rd-party services ● User & service identity ● Lower costs ● Audit & compliance
Traditional perimeter security models are insufficient.
Istio: Security at Scale (spiffe.io)
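Verifiable identity and secure naming can be illustrated with SPIFFE IDs, the URI form Istio uses for workload identities in the certificates it issues (`spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`). The Python sketch below parses such an ID and checks it against the identities allowed to serve the service name being dialed. The `SECURE_NAMING` table and both helpers are hypothetical; a real check would read the ID from the peer certificate's URI SAN after TLS verification, and revocation is not modeled.

```python
from urllib.parse import urlparse

# Hypothetical secure-naming table: identities authorized to serve each service name.
SECURE_NAMING = {
    "serviceB.example.cluster.local": {"spiffe://cluster.local/ns/default/sa/service-b"},
}


def parse_spiffe_id(uri):
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> into parts."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "spiffe" or len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"not an Istio-style SPIFFE ID: {uri!r}")
    return {"trust_domain": parsed.netloc, "namespace": parts[1], "service_account": parts[3]}


def peer_may_serve(service_name, peer_spiffe_id):
    """Secure naming: accept the peer only if its identity is authorized for this name."""
    parse_spiffe_id(peer_spiffe_id)  # reject malformed identities outright
    return peer_spiffe_id in SECURE_NAMING.get(service_name, set())
```

The check ties the name a client dialed to the identity that answered, so a hijacked address alone is not enough to impersonate a service.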
What’s Mixer For?
● Nexus for policy evaluation and telemetry reporting ○ Precondition checking ○ Quotas & rate limiting
● Primary point of extensibility
● Enabler for platform mobility
● Operator-focused configuration model
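Quota and rate-limit checks are per-request decisions made from the request's attributes. As one illustration, here is a token-bucket limiter keyed by the `source.name` attribute; the token bucket is a common technique swapped in for this sketch, not necessarily how Mixer's quota adapters work, and `TokenBucketQuota` and `check` are hypothetical names.

```python
import time


class TokenBucketQuota:
    """Per-key token bucket: refills at `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}  # key -> (tokens, last_refill_time)

    def allocate(self, key, amount=1):
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= amount:
            self._buckets[key] = (tokens - amount, now)
            return True
        self._buckets[key] = (tokens, now)
        return False


def check(attributes, quota):
    """Precondition check: grant the request only if its source still has quota."""
    return quota.allocate(attributes.get("source.name", "unknown"))
```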
Attributes: The behavioral vocabulary
    target.service = "playlist.svc.cluster.local"
    request.size = 345
    request.time = 2017-04-12T12:34:56Z
    source.ip = 192.168.10.1
    source.name = "music-fe.serving.cluster.local"
    source.user = "admin@musicstore.cluster.local"
    api.operation = "GetPlaylist"
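Because every Check and Report is expressed as a bag of attributes like these, a precondition is simply a predicate over that bag. The Python sketch below holds the example attributes in a dict with concrete types and evaluates one hypothetical rule (only `musicstore` users may call `GetPlaylist`, and only with small requests); the rule, the size limit and `precondition_ok` are illustrative assumptions, not Mixer's configuration language.

```python
import ipaddress
from datetime import datetime, timezone

# The example attributes, with values given concrete types.
attributes = {
    "target.service": "playlist.svc.cluster.local",
    "request.size": 345,
    "request.time": datetime(2017, 4, 12, 12, 34, 56, tzinfo=timezone.utc),
    "source.ip": ipaddress.ip_address("192.168.10.1"),
    "source.name": "music-fe.serving.cluster.local",
    "source.user": "admin@musicstore.cluster.local",
    "api.operation": "GetPlaylist",
}


def precondition_ok(attrs, max_request_size=1024):
    """Hypothetical rule: GetPlaylist only for musicstore users with small requests."""
    if attrs.get("api.operation") != "GetPlaylist":
        return True  # this rule does not govern other operations
    user = attrs.get("source.user", "")
    return user.endswith("@musicstore.cluster.local") and attrs.get("request.size", 0) <= max_request_size
```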
Roadmap
● Production readiness
● Multi-cloud & multi-environment
● Networking: extension models, UDP, QUIC, performance, ...
● Moar integrations: ACLs, telemetry, audit, policy, ...
● Security: HSM, cert & key stores, federation, ...
● API management
Thanks! Phew