RabbitMQ or Qpid Dispatch Router: Pushing OpenStack to the Edge
Ali Sanhaji, Javier Rojas Balderrama, Matthieu Simonin
OpenStack Summit | Berlin 2018
Who’s here
● Ali Sanhaji, research engineer at Orange, France
● Javier Rojas Balderrama, research engineer at Inria, France
● Matthieu Simonin*, research engineer at Inria, France
Agenda
- Bringing OpenStack to the edge
- RabbitMQ and Qpid Dispatch Router for OpenStack over a WAN
- Performance evaluation
- Conclusions and next steps
Challenges at the edge (diagram: edge sites, local sites, regional sites, core network, DC1, DC2)
● Scalability
● Locality
● Placement
● Resiliency
● ...
OpenStack to the edge
For a telco like Orange, pushing OpenStack to the edge is key
● How to deploy OpenStack in small edge sites (control plane + compute nodes)?
  ○ Costly and too many control planes to manage and synchronize
  ○ ⇒ Have a centralized control plane (APIs) and remote compute nodes
● OpenStack scalability (stateless processes)
● OpenStack over a WAN
Deployment under consideration
Edge deployment:
- Centralized control services in the core: Keystone, Horizon, Nova (control), Glance, Neutron (control)
- Remote edge compute nodes across the WAN: Nova (agent), Neutron (agent)
Communication between edge and core:
- RPC traffic (periodic tasks, control traffic)
- REST API calls (e.g., between Nova and Glance)
The message bus in OpenStack
● One critical component in OpenStack is the message bus used for interprocess communication (e.g., nova-api, nova-conductor, nova-scheduler, nova-compute, neutron-server, neutron agents)
● Used by processes to send various RPCs:
  ○ call: request from client to server, client waiting for response
  ○ cast: request from client to server, no response (direct notification)
  ○ fanout: request from client to multiple servers, no response (grouped notification)
(diagram: call, cast and fanout message flows between client and servers)
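To make these three patterns concrete, here is a minimal oslo.messaging sketch (not taken from the deployment above): the transport URL, topic and server names are made-up assumptions, and a reachable RabbitMQ or QDR endpoint is required for it to run.

    # Minimal oslo.messaging sketch of the three RPC patterns (call, cast, fanout).
    # BUS_URL, the topic and the server name are illustrative assumptions.
    from oslo_config import cfg
    import oslo_messaging as messaging

    BUS_URL = 'rabbit://openstack:secret@core-node:5672/'  # or amqp://... for QDR

    transport = messaging.get_rpc_transport(cfg.CONF, url=BUS_URL)
    target = messaging.Target(topic='demo_topic', server='edge-1')


    class Endpoint(object):
        def ping(self, ctxt, payload):
            return 'pong: %s' % payload


    # RPC server side (in OpenStack this would be, e.g., nova-compute).
    server = messaging.get_rpc_server(transport, target, [Endpoint()],
                                      executor='threading')
    server.start()

    # RPC client side (in OpenStack this would be, e.g., nova-conductor).
    client = messaging.RPCClient(transport, target)

    # call: the client blocks until the server sends back a response.
    print(client.call({}, 'ping', payload='hello'))

    # cast: fire and forget, no response expected (direct notification).
    client.cast({}, 'ping', payload='hello')

    # fanout: the message is delivered to every server listening on the topic.
    client.prepare(fanout=True).cast({}, 'ping', payload='hello')

    server.stop()
    server.wait()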
The message bus in OpenStack
● Processes use the oslo.messaging library to send RPCs
● It supports multiple underlying messaging implementations:
  ○ RabbitMQ (AMQP 0.9.1), deployed as a broker cluster
  ○ Qpid Dispatch Router (AMQP 1.0), deployed as a router topology
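The driver is chosen through the transport URL handed to oslo.messaging; a hedged illustration (host name and credentials are invented) is shown below, with the equivalent service configuration given in comments.

    # Selecting the messaging driver through the transport URL (illustrative values).
    from oslo_config import cfg
    import oslo_messaging as messaging

    # AMQP 0.9.1 broker driver (RabbitMQ); in a service's configuration file this
    # would be: [DEFAULT] transport_url = rabbit://openstack:secret@core-node:5672/
    rabbit_transport = messaging.get_rpc_transport(
        cfg.CONF, url='rabbit://openstack:secret@core-node:5672/')

    # AMQP 1.0 driver (used with Qpid Dispatch Router); configuration equivalent:
    # [DEFAULT] transport_url = amqp://openstack:secret@core-node:5672/
    qdr_transport = messaging.get_rpc_transport(
        cfg.CONF, url='amqp://openstack:secret@core-node:5672/')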
The message bus over a WAN (diagram: a broker cluster in the central site, with RPC clients and servers spread over the central, regional and edge sites)
The message bus over a WAN (diagram: the same RPC clients and servers spread over the central, regional and edge sites, without the central broker cluster)
Goal
Evaluate the performance of RabbitMQ and Qpid Dispatch Router over a WAN
○ How do they withstand WAN constraints (packet loss, latency, dropouts)?
○ Does a router fit better in a decentralized environment?
○ Are OpenStack operations still robust without a broker retaining messages? Is a broker safer than a router?
○ How do RPC communications (RabbitMQ and QDR) behave over a WAN?
What could go wrong in a WAN?
Examples of two possible situations:
● Latency/loss between client and bus (e.g., nova-conductor sends a VM boot request to nova-compute)
● Latency/loss between server and bus (e.g., nova-compute sends a state update to nova-conductor)
What could go wrong in a WAN?
In case of latency:
● RPC calls: the sender blocks for 2× the latency
● RPC casts (fire-and-forget semantics):
  ○ Correct semantics with the QDR driver
  ○ Incorrect semantics with the RabbitMQ driver: as with calls, the sender waits for 2× the latency (broker acks), but with a higher guarantee of message delivery
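A rough way to observe this from the client side is simply to time both operations; the sketch below reuses the made-up topic and bus address from the earlier example and assumes an RPC server is listening.

    # Timing a call vs a cast from the client side (illustrative sketch).
    # With the rabbit driver the cast also waits for broker acknowledgements;
    # with the amqp (QDR) driver it returns without waiting for the server.
    import time

    from oslo_config import cfg
    import oslo_messaging as messaging

    BUS_URL = 'rabbit://openstack:secret@core-node:5672/'  # or amqp://... for QDR

    transport = messaging.get_rpc_transport(cfg.CONF, url=BUS_URL)
    client = messaging.RPCClient(
        transport, messaging.Target(topic='demo_topic', server='edge-1'))

    t0 = time.monotonic()
    client.call({}, 'ping', payload='x')  # blocks for ~2x the one-way latency
    print('call took %.3f s' % (time.monotonic() - t0))

    t0 = time.monotonic()
    client.cast({}, 'ping', payload='x')  # ideally fire and forget
    print('cast took %.3f s' % (time.monotonic() - t0))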
Experiments
Context
○ Test plan of massively distributed RPCs: https://docs.openstack.org/performance-docs/latest/test_plans/massively_distribute_rpc/plan.html
○ Two categories of experiments:
  1. Synthetic (RabbitMQ/QDR, decentralized configuration)
  2. Operational (with OpenStack and a centralized bus)
    ● Network dropout
    ● Latency and loss
Tools
○ EnOS for OpenStack deployment (virtualization, bare metal): https://github.com/BeyondTheClouds/enos
○ Grid’5000, a dedicated testbed for experiment-driven research
Synthetic experiment recap (OpenStack Summit Vancouver 2018 presentation)
● Evaluation of the implemented patterns of RPC messages in OpenStack
● Broker and router scalability are similar, but the router is lightweight and achieves low-latency message delivery, especially under high load
● Routers offer locality of messages in decentralized deployments
● Decentralization needs to be applied to APIs and the database
“OpenStack internal messaging at the edge: in-depth evaluation”
www.openstack.org/summit/vancouver-2018/summit-schedule/events/21007/openstack-internal-messaging-at-the-edge-in-depth-evaluation
(diagram of the synthetic experiment scaling) Detailed report: https://hal.inria.fr/hal-01891567
Operational experiments
Software:
- OpenStack stable/queens
- Optimised Kolla-based deployment
- RabbitMQ v3.7.8
- Qpid Dispatch Router v1.3.0
Infrastructure:
- Hardware: Dell PowerEdge C6420 × 20 (32 cores, 193 GB RAM each)
- Virtualized deployment:
  - Core nodes (× 3 + 1): Keystone, Nova (control), Glance, Neutron (control), bus (RMQ/QDR); 32 cores, 64 GB RAM
  - Edge nodes (× 100/400), reached over the WAN: Nova (agent), Neutron (agent); 2 cores, 4 GB RAM
Network dropout
Configuration
● iptables on the core nodes (controller and network nodes)
● cron to schedule dropouts
  ○ Frequency: [5m, 10m]
  ○ Duration: [30s, 60s, 120s]
● Rally
  ○ Runner: constant_for_duration
    ■ Concurrency: 5
    ■ Duration: 30m
● OpenStack with 100 computes
● Full deployment for each combination (set of parameters, bus)
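The slides do not show the exact rules or cron entries; the following sketch only illustrates how such dropouts could be emulated, assuming the bus listens on TCP 5672 and the script runs as root on a core node (a loop stands in for cron).

    # Emulating periodic network dropouts toward the message bus (sketch only).
    # Assumptions: the bus listens on TCP 5672, run as root on a core node.
    import subprocess
    import time

    BUS_PORT = '5672'     # default RabbitMQ / AMQP 1.0 listener port (assumption)
    FREQUENCY_S = 5 * 60  # time between dropouts (slides: 5 or 10 minutes)
    DURATION_S = 30       # dropout length (slides: 30, 60 or 120 seconds)


    def set_drop(enabled):
        # Add or remove a rule dropping every packet addressed to the bus port.
        action = '-A' if enabled else '-D'
        subprocess.check_call(
            ['iptables', action, 'INPUT', '-p', 'tcp',
             '--dport', BUS_PORT, '-j', 'DROP'])


    while True:
        time.sleep(FREQUENCY_S)
        set_drop(True)        # start of the dropout: edge/core bus traffic is cut
        time.sleep(DURATION_S)
        set_drop(False)       # end of the dropout: traffic flows again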
Network dropout: boot_and_delete_servers (results plots)
Latency and Loss
Configuration
● Parameters
  ○ Latency: [0, 5, 20, 40, 80, 120, 200] ms
  ○ Loss: [0, 0.1, 0.2, 0.4, 0.8, 1.0, 2.0] %
● Rally
  ○ Runner: constant
    ■ Concurrency: 5
    ■ Iterations: 100
● OpenStack
  ○ Computes: [100, 400]
● Full deployment for each combination (set of parameters, bus)
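The slides do not name the tool used to inject latency and loss; tc/netem is a common choice, so the sketch below is only an illustration, assuming root access and that eth0 is the WAN-facing interface.

    # Injecting WAN-like latency and loss with tc/netem (illustrative sketch).
    # Assumptions: run as root, eth0 carries the edge/core traffic.
    import subprocess

    IFACE = 'eth0'    # WAN-facing interface (assumption)
    LATENCY_MS = 20   # one value from the latency test matrix
    LOSS_PCT = 0.4    # one value from the loss test matrix

    subprocess.check_call(
        ['tc', 'qdisc', 'add', 'dev', IFACE, 'root', 'netem',
         'delay', '%dms' % LATENCY_MS, 'loss', '%.1f%%' % LOSS_PCT])

    # Undo with: tc qdisc del dev eth0 root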
Latency and loss results (plots): 100 computes
Latency and loss results (plots): 400 computes
Latency and loss results (plots): 100 computes
Timeline behind the scenes of the Rally benchmarks (multicast / 400 computes)
● boot_server_and_attach_interface
● create_and_delete_network
● create_and_delete_port
● create_and_delete_router
● create_and_delete_security_groups
● create_and_delete_subnet
● set_and_clear_gateway
anycast queues
fanout queues
Conclusions
Summary
● Faced with WAN latency and loss, the router (no message retention) is as effective at delivering messages as the broker (message retention)
● The router is less resilient in the case of network dropouts
● QDR consumes far fewer resources than RMQ
What’s next
● Bring QDR closer to edge sites and to compute nodes in order to leverage its routing capabilities
● Larger scale of compute nodes
● Make the OpenStack control plane even more decentralized where possible (e.g., the database)
ali.sanhaji@orange.com javier.rojas-balderrama@inria.fr matthieu.simonin@inria.fr https://beyondtheclouds.github.io