Neutron L3 Agent HA
Or: How I Learned to Stop Worrying and Love the API
Kevin Bringard // OpenStack Juno Summit // May 2014
• There is no “one right way”
• The goal is to move L3 resources to a new L2 resource as quickly and seamlessly as possible
• This is a really difficult, but important, problem to solve
Layer 3 Internet Happens
[Diagram: Core Router, three L3 agents, router1-router6, VM1-VM7]
[Diagram: the same topology with only two L3 agents remaining]
Layer 2 The ARPing is the hardest part
• One L3 resource may only be tied to one L2 resource at a time
• Many technologies exist to sort of work around this
  • HSRP
  • VRRP
  • CARP
• Work is being done to implement VRRP-like functionality in Juno
  • https://blueprints.launchpad.net/neutron/+spec/l3-high-availability
• Nothing is currently integrated into OpenStack
Pacemaker
http://docs.openstack.org/high-availability-guide/content/_highly_available_neutron_l3_agent.html
• False positives caused more downtime than actual outages
• Split brain possibilities
• Assumes control of L3 agent start/stop functions
• Limited Horizontal Scale
  • More difficult to run multiple Active L3 agents
  • Failover requires starting and stopping entire services
• Active/Passive Model Requires More Hardware
• Works on a “per agent” level
  • Akin to RAID1
[Diagram: Core Router, three L3 agents, router1-router6, VM1-VM7]
[Diagram: the same topology with only two L3 agents remaining]
[Diagram: the same topology with three L3 agents again]
Neutron HA Tool
https://raw.githubusercontent.com/stackforge/cookbook-openstack-network/master/files/default/neutron-ha-tool.py
• API Driven
  • Uses native API calls to perform all functions
  • Can be run externally from infrastructure or cross site
  • Supports any operation the neutron client libraries support
• Easily Extendable
  • Written in Python
  • Leverages standard OpenStack libraries
• Works on a “per resource” level
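The "per resource" idea can be sketched in a few lines. This is a minimal illustration, not the code of neutron-ha-tool.py itself: it assumes `client` behaves like `neutronclient.v2_0.client.Client` (only `list_agents`, `list_routers_on_l3_agent`, `remove_router_from_l3_agent` and `add_router_to_l3_agent` are used), and the function name and the least-loaded target selection are hypothetical choices made for the example.

```python
def migrate_routers_off_dead_agents(client):
    """Move routers from dead L3 agents to the least-loaded live agent.

    `client` is assumed to behave like neutronclient.v2_0.client.Client.
    Returns a list of (router_id, source_agent_id, target_agent_id) tuples.
    """
    agents = client.list_agents(agent_type="L3 agent")["agents"]
    dead = [a for a in agents if not a["alive"]]
    live = [a for a in agents if a["alive"] and a["admin_state_up"]]
    if not live:
        raise RuntimeError("no live L3 agents available as migration targets")

    # Count the routers on each live agent so migrations stay balanced.
    # (Hypothetical policy: always pick the agent hosting the fewest routers.)
    load = {
        a["id"]: len(client.list_routers_on_l3_agent(a["id"])["routers"])
        for a in live
    }

    moved = []
    for agent in dead:
        routers = client.list_routers_on_l3_agent(agent["id"])["routers"]
        for router in routers:
            target = min(load, key=load.get)
            # Each router is moved individually; live agents are never touched.
            client.remove_router_from_l3_agent(agent["id"], router["id"])
            client.add_router_to_l3_agent(target, {"router_id": router["id"]})
            load[target] += 1
            moved.append((router["id"], agent["id"], target))
    return moved
```

Because the function only needs an object exposing those four calls, it can run from any host that can reach the Neutron API, which is the point of the "run externally" bullet.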
[Diagram: Core Router, three L3 agents, router1-router6, VM1-VM7]
[Diagram: the same topology with only two L3 agents remaining]
[Diagram: two L3 agents, with router1-router6 redistributed across them]
• Only routers/IPs on the affected L3 agent are impacted
• Recovery time depends on the number of routers which need to be migrated and the number of IPs on each router
• Migration happens quickly, but every IP on the routers must re-ARP to the upstream switch
• Metadata proxies migrate with the routers
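To make the re-ARP step concrete: one common way to announce that an IP now lives behind a different MAC is a gratuitous ARP, a broadcast ARP request whose sender and target IP are the same. The sketch below only builds such a frame to show its layout; it is an illustration under that assumption, not a description of what the L3 agent sends, and the function name is hypothetical.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    eth_header = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeroed) and target IP == sender IP.
    arp += hw + addr + b"\x00" * 6 + addr
    return eth_header + arp
```

One such announcement is needed per IP address, which is why recovery time grows with the number of IPs on each migrated router.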
OK, so what’s the catch?
• Not seamless
  • The ARP processes happen in parallel, but generally take 60-90 seconds for all IPs to complete
• Various *aaS offerings further complicate things
  • Currently only accounts for “l3-agent” controlled services
• No coordination between HA tools
  • How do you HA the HA?
• Currently not daemonized, runs from cron
  • Add 60 seconds to total recovery time
  • Jitter protection adds additional total recovery time
• No mechanism by which to ensure resources actually come up/work
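The timing bullets above combine into a rough recovery window. The sketch below is simple arithmetic under stated assumptions: the 60 second cron interval and the 60-90 second ARP range come from the slide, the jitter delay is left as a parameter because no figure is given, and the migration calls themselves are treated as negligible. The function name is hypothetical.

```python
def recovery_window(cron_interval=60, arp_range=(60, 90), jitter_delay=0):
    """Return (best_case, worst_case) recovery time in seconds.

    Best case: the agent fails just before a cron run, so there is no wait.
    Worst case: the agent fails just after a run, so a full interval passes.
    """
    best = jitter_delay + arp_range[0]
    worst = cron_interval + jitter_delay + arp_range[1]
    return best, worst
```

With the defaults this gives a window of 60 to 150 seconds; any jitter protection delay shifts both ends by the same amount.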
What about DHCP?
• Multiple DHCP agents may be run Active/Active
  • The number of DHCP agents per network may be specified in your Neutron config file
  • Each agent requires an IP in the tenant’s subnet
• DHCP requests are broadcast
  • All agents have the same lease file
  • The first one to reply binds to the VM
• Any DHCP agent may reply to a DNS request and resolve all known leases
  • By default, each DHCP agent hands out a list of every agent as available resolvers
• HA tool has an option to replicate DHCP to all agents
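Replicating DHCP to all agents can be expressed with the same API-driven approach. This is a hedged sketch, not the tool's actual implementation: it assumes `client` behaves like `neutronclient.v2_0.client.Client` (using `list_agents`, `list_networks`, `list_dhcp_agent_hosting_networks` and `add_network_to_dhcp_agent`), and the function name is hypothetical.

```python
def replicate_dhcp_networks(client):
    """Schedule every network onto every live DHCP agent.

    `client` is assumed to behave like neutronclient.v2_0.client.Client.
    Returns a list of (network_id, agent_id) pairs that were added.
    """
    agents = client.list_agents(agent_type="DHCP agent")["agents"]
    live_ids = [a["id"] for a in agents if a["alive"] and a["admin_state_up"]]

    added = []
    for network in client.list_networks()["networks"]:
        hosting = client.list_dhcp_agent_hosting_networks(network["id"])["agents"]
        hosting_ids = {a["id"] for a in hosting}
        for agent_id in live_ids:
            # Only add the network where it is not already being served.
            if agent_id not in hosting_ids:
                client.add_network_to_dhcp_agent(
                    agent_id, {"network_id": network["id"]}
                )
                added.append((network["id"], agent_id))
    return added
```

Keep in mind that each additional agent consumes an IP in the tenant's subnet, so full replication has an address cost on small subnets.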
Moving Forward
• VRRP-like functionality
  • Specify number of Active L3 agents per subnet
  • Leverage conntrackd/keepalived
• Point of diminishing returns for HA tool?
• The beauty of open source:
  • There is no “one right way”
  • Think outside the box
  • Do cool things
Questions?