Project | LXD |
Status | Implemented |
Author(s) | @stgraber |
Approver(s) | @stgraber @tomp |
Release | 4.18 |
Internal ID | LX004 |
Abstract
Add the ability for LXD to advertise the subnets in use for its networks and instances to external routers over BGP.
Rationale
When running production environments on LXD, especially on LXD clusters, it is increasingly common to want to route external addresses directly to specific instances, or even to have an entire LXD network backed by external (non-NATed) addresses. The most common case is a mixed setup where IPv4 uses an RFC 1918 subnet with external addresses routed directly on an as-needed basis, while IPv6 connectivity directly uses an external subnet with additional (non-EUI64) addresses routed as needed.
With the recent addition of OVN support, non-admin cluster users can now create such networks themselves, and can use external addresses through LXD’s support for delegating subnets through projects.
All of that works great today but relies on the network or system administrator having put routing in place so that all of those external addresses and subnets get routed to the right LXD server or to the right OVN gateway. This manual step gets in the way of letting regular users create OVN networks on demand and route any of the addresses or subnets that are allowed in their projects.
With the addition of native BGP support to LXD, the administrator will be able to set up BGP peers on the relevant external LXD networks, and LXD will then take care of advertising all relevant routes and next hops directly to the routers.
The concept was proven and is already in use in production environments through an external tool, stgraber/lxd-bgp on GitHub: a tiny BGP server in Go exposing LXD external routes.
Specification
Design
With this change, LXD will optionally act as a BGP router, though in practice it will only ever advertise routes and won’t do anything with any received prefixes.
At the global level, there will be a configuration option to configure a listen address and port for LXD’s built-in BGP server. Then at the network level, there will be configuration for the relevant peers to notify as new addresses and subnets get used in LXD.
To keep things simple initially, LXD will announce both IPv4 and IPv6 prefixes over a single session per peer, whichever IP protocol that session runs over (preferably IPv6 if dual-stack).
Scenarios
Bridged network with IPv4 and/or IPv6 subnet using external addresses
This is probably the simplest environment where LXD operates a traditional managed LXD bridge and where the IPv4 and/or IPv6 subnets are not NATed. LXD will advertise the relevant subnets to the router which will then route them to the correct host.
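As a rough illustration of the bridged scenario, the sketch below derives the subnets that would be candidates for advertisement from a network's configuration. It assumes the existing `ipv4.address`, `ipv6.address`, `ipv4.nat` and `ipv6.nat` network keys are the inputs; the function name `advertised_subnets` and the sample values are hypothetical and are not LXD's actual implementation.

```python
import ipaddress


def advertised_subnets(network_config):
    """Return the non-NATed subnets of a bridged network (illustrative only)."""
    prefixes = []
    for family in ("ipv4", "ipv6"):
        address = network_config.get(f"{family}.address", "none")
        nat = network_config.get(f"{family}.nat", "false") == "true"
        # Skip families that are disabled, not yet resolved, or NATed.
        if address in ("none", "auto") or nat:
            continue
        # "192.0.2.1/24" is the bridge address; its network is the subnet.
        prefixes.append(ipaddress.ip_interface(address).network)
    return prefixes


# Hypothetical mixed setup: private NATed IPv4, external IPv6.
example_config = {
    "ipv4.address": "10.0.0.1/24",
    "ipv4.nat": "true",
    "ipv6.address": "2001:db8:1::1/64",
    "ipv6.nat": "false",
}
subnets = advertised_subnets(example_config)
```

Here only the IPv6 subnet is returned, since the IPv4 subnet is NATed and therefore has no reason to be routed externally.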
OVN network with IPv4 and/or IPv6 subnet using external addresses
In this environment, we’ll need the uplink network for that OVN network to have BGP peers configured. When that’s the case, the IPv4 and/or IPv6 subnets will be advertised with a next-hop set to the OVN gateway.
In a cluster, all cluster members will be advertising the same route as OVN is distributed, so high availability of the route is expected.
External addresses/subnets routed to a specific instance on a bridged network
In this case, the instances will likely be running on private addressing with LXD’s dnsmasq in charge of assigning addresses. LXD will look for ipv4.routes.external and/or ipv6.routes.external to know what needs to be routed to the instance, will then set up a suitable route on the host (same as ipv4.routes or ipv6.routes) and finally will advertise a route to the host over BGP.
External addresses/subnets routed to a specific instance on an OVN network
In this setup, the instance’s OVN network may or may not be using external addressing itself, but the instance’s NIC device has an ipv4.routes.external or ipv6.routes.external config key set.
In such a setup, LXD will advertise the external routes configured on the instance NIC. In a cluster, only the member hosting the instance will advertise the route: if that host becomes unavailable, so does the instance.
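The two OVN scenarios differ in which cluster members announce a route. A minimal sketch of that rule follows; the function name, the `scope` values and the member names are hypothetical and only restate the behavior described above.

```python
def advertising_members(scope, cluster_members, instance_member=None):
    """Return the cluster members expected to advertise a route (sketch).

    scope is "network" for an OVN network subnet, or "instance" for an
    external route set on an instance NIC.
    """
    if scope == "network":
        # OVN is distributed, so every member announces the same subnet.
        return set(cluster_members)
    if scope == "instance":
        if instance_member not in cluster_members:
            raise ValueError("instance_member must be a cluster member")
        # Only the member hosting the instance announces its routes.
        return {instance_member}
    raise ValueError(f"unknown scope: {scope!r}")


members = ["server01", "server02", "server03"]
network_advertisers = advertising_members("network", members)
instance_advertisers = advertising_members("instance", members, "server02")
```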
Integration with instance lifecycle
LXD will begin advertising instance-specific routes shortly after completing the instance startup sequence and will withdraw the advertisement shortly prior to shutting down the instance.
For networks, advertisements will be kept active so long as the network exists.
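The lifecycle rules above can be pictured as a small table of prefixes keyed by their owner. The sketch below is an assumption about how such bookkeeping could look; the class, its methods and the owner naming scheme are hypothetical.

```python
class RouteAdvertiser:
    """Tracks which prefixes are currently advertised, keyed by owner."""

    def __init__(self):
        self._by_owner = {}

    def announce(self, owner, prefixes):
        # Called once a network exists or an instance has finished starting.
        self._by_owner.setdefault(owner, set()).update(prefixes)

    def withdraw(self, owner):
        # Called just before an instance stops or when a network is deleted.
        self._by_owner.pop(owner, None)

    def advertised(self):
        return sorted({p for prefixes in self._by_owner.values() for p in prefixes})


advertiser = RouteAdvertiser()
advertiser.announce("network/ovn0", ["198.51.100.0/24"])
advertiser.announce("instance/c1", ["203.0.113.10/32"])
while_running = advertiser.advertised()
advertiser.withdraw("instance/c1")  # instance is about to shut down
after_stop = advertiser.advertised()
```

The network prefix stays advertised after the instance stops, matching the rule that network advertisements last as long as the network exists.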
Behavior on LXD restart/update
As LXD will be the BGP router, under normal circumstances LXD exiting would cause all routes to be dropped by the upstream routers.
To prevent this, LXD will make use of BGP’s graceful restart feature, allowing a few minutes of downtime before the routes expire when LXD is shut down for a refresh or update.
When a full shutdown is requested (lxd shutdown or SIGPWR), LXD will instead withdraw all advertisements prior to shutdown.
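Seen from the upstream router, the two stop paths differ as in the sketch below. The 300-second restart time is an assumed value standing in for "a few minutes"; the function and parameter names are hypothetical.

```python
ASSUMED_RESTART_TIME = 300  # seconds; illustrative stand-in for "a few minutes"


def peer_keeps_routes(seconds_since_session_loss, withdrawn,
                      restart_time=ASSUMED_RESTART_TIME):
    """Model whether an upstream router still holds LXD's routes (sketch)."""
    if withdrawn:
        # Full shutdown: LXD withdrew its advertisements before exiting.
        return False
    # Restart or update: graceful restart retains routes until the timer expires.
    return seconds_since_session_loss < restart_time


during_update = peer_keeps_routes(60, withdrawn=False)
after_long_outage = peer_keeps_routes(600, withdrawn=False)
after_full_shutdown = peer_keeps_routes(1, withdrawn=True)
```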
API changes
No REST API changes are expected for this. However, LXD will grow an additional listening port (typically tcp/179) when BGP is enabled, and there will be a few additional configuration keys and tweaks to existing configuration.
New global configuration keys:
core.bgp_address (local, disabled by default, takes <ip>:<port>)
core.bgp_asn (global, empty by default, takes the local ASN)
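To make the value formats concrete, here is a hedged sketch of how an `<ip>:<port>` listen address and an ASN could be validated. The helper names are hypothetical. Falling back to port 179 and requiring brackets around IPv6 literals are assumptions of this sketch, not behavior stated in this spec.

```python
import ipaddress
from urllib.parse import urlsplit

BGP_DEFAULT_PORT = 179


def parse_listen_address(value):
    """Split "<ip>:<port>" into (ip, port); IPv6 must be written as "[ip]:port"."""
    parts = urlsplit("//" + value)
    if parts.hostname is None:
        raise ValueError(f"missing IP address in {value!r}")
    ip = ipaddress.ip_address(parts.hostname)  # raises ValueError if invalid
    port = parts.port if parts.port is not None else BGP_DEFAULT_PORT
    return ip, port


def parse_asn(value):
    """Accept a 2-byte or 4-byte ASN given as a decimal string."""
    asn = int(value)
    if not 1 <= asn <= 4294967295:
        raise ValueError(f"ASN out of range: {asn}")
    return asn


v4 = parse_listen_address("192.0.2.10:179")
v6 = parse_listen_address("[2001:db8::10]:1179")
local_asn = parse_asn("65000")
```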
Network-specific configuration keys:
bgp.peers.<name>.address (global, peer <ip>:<port>)
bgp.peers.<name>.asn (global, peer ASN)
bgp.peers.<name>.password (global, peer password, optional)
bgp.ipv4.nexthop (local, for bridged networks, override next hop)
bgp.ipv6.nexthop (local, for bridged networks, override next hop)
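Because peers are expressed as flat `bgp.peers.<name>.<field>` keys, a consumer has to group them by peer name. The sketch below shows one way to do that; the function name and the sample configuration values are hypothetical, and it assumes peer names contain no dots.

```python
from collections import defaultdict

PEER_FIELDS = ("address", "asn", "password")


def parse_bgp_peers(network_config):
    """Group bgp.peers.<name>.<field> keys of a network config by peer name."""
    peers = defaultdict(dict)
    for key, value in network_config.items():
        parts = key.split(".")
        # Expect exactly: bgp . peers . <name> . <field>
        if len(parts) == 4 and parts[:2] == ["bgp", "peers"] and parts[3] in PEER_FIELDS:
            peers[parts[2]][parts[3]] = value
    return dict(peers)


# Hypothetical uplink network configuration with two peers.
example_config = {
    "ipv4.address": "192.0.2.1/24",
    "bgp.peers.router1.address": "192.0.2.254:179",
    "bgp.peers.router1.asn": "65001",
    "bgp.peers.router1.password": "example",
    "bgp.peers.router2.address": "[2001:db8::1]:179",
    "bgp.peers.router2.asn": "65002",
    "bgp.ipv4.nexthop": "192.0.2.1",
}
peers = parse_bgp_peers(example_config)
```

Keys that are not peer definitions, such as `bgp.ipv4.nexthop`, are ignored by this grouping and would be handled separately.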
NIC-specific configuration keys:
ipv4.routes.external (now supported on bridged interfaces)
ipv6.routes.external (now supported on bridged interfaces)
CLI changes
No CLI changes, only affects config keys.
Database changes
No schema change, only affects config keys.
Upgrade handling
No special handling needed as the feature did not previously exist.
Some constraints will be relaxed to allow the use of ipv4.routes.external and ipv6.routes.external on regular bridges, but this won’t change behavior on upgrade.
Further information
Prototype: stgraber/lxd-bgp on GitHub (a tiny BGP server in Go exposing LXD external routes)
This will also enable proper anycast setups through the use of ECMP routes.
When ipv4.routes.anycast and/or ipv6.routes.anycast are configured, multiple instances on OVN networks will be able to advertise the exact same address or subnet.
This then results in multiple routes on the upstream router with equal weight. The L3 information then gets hashed and traffic gets balanced between the instances.
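The hashing step can be illustrated with a toy model of ECMP next-hop selection. Real routers use vendor-specific hash functions and inputs; the SHA-256 hash over source and destination addresses below is purely an assumption for illustration, and `pick_next_hop` is a hypothetical name.

```python
import hashlib
import ipaddress


def pick_next_hop(src, dst, next_hops):
    """Choose one of several equal-cost next hops from the flow's L3 addresses."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    digest = hashlib.sha256(key).digest()
    # The same (src, dst) pair always maps to the same next hop.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Two hosts advertising the same anycast address 203.0.113.10.
hops = ["192.0.2.11", "192.0.2.12"]
first = pick_next_hop("198.51.100.7", "203.0.113.10", hops)
again = pick_next_hop("198.51.100.7", "203.0.113.10", hops)
```

A given client keeps reaching the same instance while the set of next hops is stable, and different clients are spread across the instances advertising the address.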