Neutron L3 Agent HA Or: How I Learned to Stop Worrying and Love the API



  1. Neutron L3 Agent HA Or: How I Learned to Stop Worrying and Love the API Kevin Bringard // OpenStack Juno Summit // May 2014

  2. • There is no “one right way”
     • The goal is to move L3 resources to a new L2 resource as quickly and seamlessly as possible
     • This is a really difficult, but important, problem to solve

  3. Layer 3: Internet Happens

  4. [Diagram: a core router connected to three L3 agents, which host router1 through router6 serving VM1 through VM7]

  5. [Diagram: the same topology with only two L3 agents remaining; router1 through router6 and VM1 through VM7 are still shown]

  6. Layer 2: The ARPing is the hardest part

  7. • One L3 resource may only be tied to one L2 resource at a time
     • Many technologies exist to partially work around this:
       • HSRP
       • VRRP
       • CARP
     • Work is being done to implement VRRP-like functionality in Juno
       • https://blueprints.launchpad.net/neutron/+spec/l3-high-availability
     • Nothing is currently integrated into OpenStack
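VRRP, one of the protocols listed above, enforces the "one owner at a time" rule by electing a single master for each virtual IP. The sketch below models only the election rule from RFC 5798 (highest priority wins; ties go to the highest primary IP address). The VrrpPeer type, the agent names, and the addresses are hypothetical; advertisement timers and preemption are left out, and this is not how Neutron or keepalived implement it.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class VrrpPeer:
    """One router (hypothetically, one L3 agent) contending for a virtual IP."""
    name: str
    priority: int            # 1-254 for backups; 255 is reserved for the address owner
    primary_ip: IPv4Address  # used only to break priority ties


def elect_master(peers):
    """Return the peer that should own the virtual IP, or None if nobody can."""
    # Priority 0 means a master is giving up the role, so it never wins.
    candidates = [p for p in peers if p.priority > 0]
    if not candidates:
        return None
    # Highest priority wins; on a tie, the highest primary IP address wins.
    return max(candidates, key=lambda p: (p.priority, int(p.primary_ip)))


peers = [
    VrrpPeer("agent-a", 100, IPv4Address("10.0.0.2")),
    VrrpPeer("agent-b", 100, IPv4Address("10.0.0.3")),
    VrrpPeer("agent-c", 90, IPv4Address("10.0.0.4")),
]
print(elect_master(peers).name)  # agent-b: ties with agent-a, higher primary IP
```

If agent-b disappears, re-running the election hands the address to agent-a, which is the single-owner handoff the slides describe.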

  8. Pacemaker: http://docs.openstack.org/high-availability-guide/content/_highly_available_neutron_l3_agent.html

  9. • False positives: caused more downtime than actual outages
     • Split-brain possibilities
     • Assumes control of L3 agent start/stop functions
     • Limited Horizontal Scale
     • More difficult to run multiple Active L3 agents
     • Failover requires starting and stopping entire services
     • Active/Passive Model Requires More Hardware
     • Works on a “per agent” level
     • Akin to RAID1

  10. [Diagram: a core router connected to three L3 agents, which host router1 through router6 serving VM1 through VM7]

  11. [Diagram: the same topology with only two L3 agents remaining; router1 through router6 and VM1 through VM7 are still shown]

  12. [Diagram: the topology again shows three L3 agents hosting router1 through router6, serving VM1 through VM7]

  13. Neutron HA Tool: https://raw.githubusercontent.com/stackforge/cookbook-openstack-network/master/files/default/neutron-ha-tool.py

  14. • API Driven
        • Uses native API calls to perform all functions
        • Can be run externally to the infrastructure, or cross-site
        • Supports any operation the neutron client libraries support
      • Easily Extendable
        • Written in Python
        • Leverages standard OpenStack libraries
      • Works on a “per resource” level
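The "per resource" behaviour can be illustrated with a small planning function. This is a minimal sketch, not code from neutron-ha-tool.py: plain dictionaries stand in for API responses, and plan_router_migration, its least-loaded placement rule, and the router assignments in the example are hypothetical. In a real tool, each planned move would presumably become a remove_router_from_l3_agent call followed by an add_router_to_l3_agent call in python-neutronclient.

```python
def plan_router_migration(agents, router_hosting):
    """Plan moving routers off dead L3 agents onto the least-loaded live ones.

    agents:         dict of agent_id -> True if the agent reports alive
    router_hosting: dict of router_id -> agent_id currently hosting it
    Returns a list of (router_id, from_agent, to_agent) tuples; routers on
    healthy agents are never touched.
    """
    live = sorted(a for a, alive in agents.items() if alive)
    if not live:
        raise RuntimeError("no live L3 agents to migrate routers to")

    # Current load of each live agent, counted in routers.
    load = {a: 0 for a in live}
    for agent in router_hosting.values():
        if agent in load:
            load[agent] += 1

    moves = []
    for router in sorted(router_hosting):
        source = router_hosting[router]
        if agents.get(source, False):
            continue  # hosting agent is alive; leave this router where it is
        # Least-loaded live agent wins; ties go to the lowest agent id.
        target = min(live, key=lambda a: (load[a], a))
        load[target] += 1
        moves.append((router, source, target))
    return moves


agents = {"agent1": True, "agent2": False, "agent3": True}
hosting = {
    "router1": "agent1", "router5": "agent1",
    "router3": "agent2", "router4": "agent2",
    "router2": "agent3", "router6": "agent3",
}
for move in plan_router_migration(agents, hosting):
    print(move)
# ('router3', 'agent2', 'agent1')
# ('router4', 'agent2', 'agent3')
```

Only the two routers on the failed agent move, which is the contrast with Pacemaker's per-agent failover.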

  15. [Diagram: a core router connected to three L3 agents, which host router1 through router6 serving VM1 through VM7]

  16. [Diagram: the same topology with only two L3 agents remaining; router1 through router6 and VM1 through VM7 are still shown]

  17. [Diagram: two L3 agents remain, with router1 through router6 redistributed across them (router3 and router4 have moved), serving VM1 through VM7]

  18. • Only routers/IPs on the affected L3 agent are impacted
      • Recovery time depends on the number of routers that need to be migrated and the number of IPs on each router
      • Migration happens quickly, but every IP on the routers must re-ARP to the upstream switch
      • Metadata proxies migrate with the routers
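Re-ARPing means announcing each migrated IP from its new location, typically with a gratuitous ARP. The sketch below only builds such a frame with the standard library so the field layout is visible; sending it would need a raw AF_PACKET socket and root privileges on Linux, and in practice a utility such as arping does this. The MAC and IP in the example are placeholders (192.0.2.0/24 is a documentation range).

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETH_P_ARP = 0x0806   # EtherType for ARP
ARP_REQUEST = 1      # ARP opcode for a request


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def gratuitous_arp_frame(mac, ip):
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request for ip."""
    src_mac = mac_to_bytes(mac)
    ip_bytes = socket.inet_aton(ip)  # IPv4 only
    ethernet_header = BROADCAST_MAC + src_mac + struct.pack("!H", ETH_P_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        ARP_REQUEST,  # operation
        src_mac,      # sender hardware address: the router port's MAC
        ip_bytes,     # sender protocol address: the IP being announced
        b"\x00" * 6,  # target hardware address: unused in a request
        ip_bytes,     # target protocol address: same IP, which makes it gratuitous
    )
    # 14-byte Ethernet header + 28-byte ARP payload; short frames are normally
    # padded to the 60-byte minimum before transmission.
    return ethernet_header + arp_payload


frame = gratuitous_arp_frame("fa:16:3e:00:00:01", "192.0.2.10")
print(len(frame))  # 42
```

One such frame is needed per IP on every migrated router, which is why recovery time scales with the number of IPs.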

  19. OK, so what’s the catch?

  20. • Not seamless
        • The ARP processes happen in parallel, but generally take 60-90 seconds for all IPs to complete
        • Various *aaS offerings further complicate things
      • Currently only accounts for services controlled by the “l3-agent”
      • No coordination between HA tools
        • How do you HA the HA?
      • Currently not daemonized; runs from cron
        • Adds 60 seconds to total recovery time
        • Jitter protection further increases total recovery time
      • No mechanism to ensure resources actually come up and work
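The cron and jitter-protection costs above are additive. This sketch assumes that jitter protection means requiring several consecutive failed checks before acting; the function names and every number in the example are hypothetical, except the 90-second ARP figure, which is the upper end of the range quoted on the slide.

```python
def agent_confirmed_dead(check_results, required_failures):
    """Treat an agent as dead only if its most recent checks all failed.

    check_results:     booleans in chronological order (True = agent looked alive).
    required_failures: how many consecutive trailing failures are needed.
    """
    trailing_failures = 0
    for alive in reversed(check_results):
        if alive:
            break
        trailing_failures += 1
    return trailing_failures >= required_failures


def worst_case_recovery_seconds(cron_interval, required_failures, retry_interval,
                                migration, arp_convergence):
    """Rough worst-case outage, in seconds, after an L3 agent fails.

    cron_interval:   the failure may occur just after a run, so up to one
                     full interval passes before the next check starts.
    required_failures, retry_interval: jitter protection re-checks the agent
                     (required_failures - 1) more times before acting.
    migration:       time to move the routers through the API.
    arp_convergence: time for every migrated IP to re-ARP upstream.
    """
    jitter_delay = (required_failures - 1) * retry_interval
    return cron_interval + jitter_delay + migration + arp_convergence


print(agent_confirmed_dead([True, False, False, False], 3))  # True
print(agent_confirmed_dead([False, False, True], 3))         # False
print(worst_case_recovery_seconds(60, 3, 10, 15, 90))        # 185
```

Raising required_failures trades fewer false positives for a longer outage, the same tension the Pacemaker slide describes.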

  21. What about DHCP?

  22. • Multiple DHCP agents may be run Active/Active
        • The number of DHCP agents per network may be set in neutron.conf (dhcp_agents_per_network)
        • Each agent requires an IP in the tenant’s subnet
      • DHCP requests are broadcast
        • All agents have the same lease file
        • The VM binds to the first agent that replies
      • Any DHCP agent may reply to a DNS request and resolve all known leases
        • By default, each DHCP agent hands out a list of every agent as available resolvers
      • The HA tool has an option to replicate DHCP to all agents
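Replicating DHCP to all agents amounts to making sure every live DHCP agent is scheduled to every network. A minimal sketch, assuming dictionaries stand in for API data; plan_dhcp_replication and the agent and network names are hypothetical, and a real tool would presumably apply each pair with python-neutronclient's add_network_to_dhcp_agent. Each extra agent also consumes another IP in the tenant's subnet.

```python
def plan_dhcp_replication(live_agents, network_hosting):
    """List the (network_id, agent_id) pairs needed so every live DHCP agent
    serves every network.

    live_agents:     iterable of DHCP agent ids that currently report alive
    network_hosting: dict of network_id -> set of agent ids already serving it
    """
    live = sorted(set(live_agents))
    additions = []
    for network in sorted(network_hosting):
        already = network_hosting[network]
        for agent in live:
            if agent not in already:
                additions.append((network, agent))
    return additions


live = ["dhcp-a", "dhcp-b"]
hosting = {"net1": {"dhcp-a"}, "net2": {"dhcp-a", "dhcp-b"}, "net3": set()}
print(plan_dhcp_replication(live, hosting))
# [('net1', 'dhcp-b'), ('net3', 'dhcp-a'), ('net3', 'dhcp-b')]
```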

  23. Moving Forward
      • VRRP-like functionality
        • Specify number of Active L3 agents per subnet
        • Leverage conntrackd/keepalived
      • Point of diminishing returns for HA tool?
      • The beauty of open source:
        • There is no “one right way”
        • Think outside the box
        • Do cool things

  24. Questions?
