Add the ability for LXD to advertise the subnets in use for its networks and instances to external routers over BGP.
When running production environments on LXD, especially on LXD clusters, it’s increasingly common to want to route external addresses directly to specific instances, or even to have an entire LXD network backed by external (non-NATed) addresses. The most common case is a mixed setup where IPv4 uses an RFC 1918 subnet and external addresses are routed directly on an as-needed basis, while IPv6 connectivity uses an external subnet directly, with additional (non-EUI64) addresses routed as needed.
Then with the recent addition of OVN support, it’s now possible to have non-admin cluster users self-create such networks, including letting them use external addresses through LXD’s support for delegating subnets through projects.
All of that works great today but relies on the network or system administrator having put routing in place so that all of those external addresses and subnets get routed to the right LXD server or to the right OVN gateway. This manual step gets in the way of letting regular users create OVN networks on demand and route any of the addresses or subnets that are allowed in their projects.
With the addition of native BGP support to LXD, the administrator will be able to set up BGP peers on the relevant external LXD networks and LXD will then take care of advertising all relevant routes and next hops directly to the routers.
The concept was proven and is already in use in production environments through an external tool, stgraber/lxd-bgp on GitHub, a tiny BGP server in Go exposing LXD external routes.
With this change, LXD will become an optional BGP router, though in practice it will only ever advertise routes and won’t do anything with any received prefixes.
At the global level, there will be a configuration option to configure a listen address and port for LXD’s built-in BGP server. Then at the network level, there will be configuration for the relevant peers to notify as new addresses and subnets get used in LXD.
To keep things simple initially, LXD will be announcing IPv4 and IPv6 prefixes over either protocol, requiring only a single session per peer (preferably IPv6 if dual-stack).
This is probably the simplest environment: LXD operates a traditional managed LXD bridge, and the IPv4 and/or IPv6 subnets are not NATed. LXD will advertise the relevant subnets to the router, which will then route them to the correct host.
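As a rough illustration of the bridged case, the sketch below picks the subnets that would be advertised from a network's configuration. It assumes the existing bridge keys `ipv4.address`, `ipv4.nat`, `ipv6.address` and `ipv6.nat`; the function name and the selection logic are hypothetical and are not taken from LXD's implementation.

```python
import ipaddress


def subnets_to_advertise(network_config):
    """Return the subnets of a bridged network that are not NATed.

    Hypothetical helper: the selection rule is an assumption made for
    illustration, not LXD's actual logic.
    """
    subnets = []
    for family in ("ipv4", "ipv6"):
        address = network_config.get(f"{family}.address", "none")
        nat = network_config.get(f"{family}.nat", "false")
        if address in ("", "none", "auto"):
            continue  # no static subnet known for this family
        if nat == "true":
            continue  # NATed subnets are not reachable from outside
        # "10.10.0.1/24" (gateway address) -> "10.10.0.0/24" (subnet)
        subnets.append(str(ipaddress.ip_interface(address).network))
    return subnets


example = {
    "ipv4.address": "10.10.0.1/24",
    "ipv4.nat": "true",
    "ipv6.address": "2001:db8:0:1::1/64",
    "ipv6.nat": "false",
}
print(subnets_to_advertise(example))
```

With this example configuration only the IPv6 subnet is selected, since the IPv4 side is NATed.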
For an OVN network using external subnets, the uplink network of that OVN network needs to have BGP peers configured. When that’s the case, the IPv4 and/or IPv6 subnets will be advertised with a next-hop set to the OVN gateway.
In a cluster, all cluster members will be advertising the same route as OVN is distributed, so high availability of the route is expected.
In this case, where the instance sits on a bridged network, the instances will likely be running on private addressing with LXD’s dnsmasq in charge of assigning addresses. LXD will look for ipv6.routes.external to know what needs to be routed to the instance, will then set up a suitable route on the host (same as ipv6.routes) and finally will advertise a route to the host over BGP.
In this setup, the instance’s OVN network may or may not be using external addressing itself, but the instance’s nic device has an ipv6.routes.external config key set.
In such a setup, LXD will be advertising a route for the relevant external routes on the instance nic. In a cluster, only the host of the instance will be advertising the route, since if the host becomes unavailable, so does the instance.
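The difference between network-level and instance-level advertisement in a cluster can be sketched as follows. The `Route` type, the `kind` labels and the member names are hypothetical and only model the rule described above: OVN network subnets are advertised by every cluster member, while instance nic routes are advertised only by the member hosting the instance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    kind: str  # "ovn-network" or "instance-nic" (labels are illustrative)
    instance_host: Optional[str] = None  # only set for instance nic routes


def advertising_members(route: Route, members: List[str]) -> List[str]:
    """Return the cluster members that should advertise the route."""
    if route.kind == "ovn-network":
        # OVN is distributed, so every member advertises the same route.
        return sorted(members)
    if route.kind == "instance-nic":
        # Only the member running the instance advertises its routes.
        return [route.instance_host] if route.instance_host in members else []
    raise ValueError(f"unknown route kind: {route.kind}")


members = ["node2", "node1", "node3"]
print(advertising_members(Route("2001:db8:0:1::/64", "ovn-network"), members))
print(advertising_members(Route("2001:db8:ffff::10/128", "instance-nic", "node2"), members))
```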
LXD will begin advertising instance-specific routes shortly after completing the instance startup sequence and will withdraw the advertisement shortly prior to shutting down the instance.
For networks, advertisements will be kept active so long as the network exists.
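The advertisement lifecycle can be modelled with a small bookkeeping class. This is only a sketch: the class, its method names and the event tuples are hypothetical, and a real implementation would send BGP updates rather than append to a list.

```python
class RouteTable:
    """Tracks which prefixes are currently advertised, and on whose behalf."""

    def __init__(self):
        self.advertised = {}  # prefix -> owner, e.g. ("instance", "web1")
        self.events = []      # ordered ("announce" | "withdraw", prefix) tuples

    def _announce(self, owner, prefixes):
        for prefix in prefixes:
            self.advertised[prefix] = owner
            self.events.append(("announce", prefix))

    def _withdraw(self, owner):
        owned = [p for p, o in self.advertised.items() if o == owner]
        for prefix in owned:
            del self.advertised[prefix]
            self.events.append(("withdraw", prefix))

    # Network subnets stay advertised for as long as the network exists.
    def network_created(self, name, subnets):
        self._announce(("network", name), subnets)

    def network_deleted(self, name):
        self._withdraw(("network", name))

    # Instance routes follow the instance's start/stop sequence.
    def instance_started(self, name, routes):
        self._announce(("instance", name), routes)

    def instance_stopping(self, name):
        self._withdraw(("instance", name))


table = RouteTable()
table.network_created("ovn0", ["2001:db8:0:1::/64"])
table.instance_started("web1", ["2001:db8:ffff::10/128"])
table.instance_stopping("web1")
print(sorted(table.advertised))
```

After the instance stops, only the network subnet remains advertised.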
As LXD will be the BGP router, under normal circumstances LXD exiting would cause all routes to be dropped by the upstream routers.
To prevent this, LXD will make use of BGP’s graceful restart feature, allowing a few minutes of downtime before the routes expire when LXD is shut down for a refresh or update.
When a full shutdown is requested (lxd shutdown or SIGPWR), LXD will instead withdraw all advertisements prior to shutdown.
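The two exit paths can be summarised in a short sketch. The reason labels and function names are hypothetical, and the 300 second restart time is an assumed value standing in for "a few minutes"; the spec does not state the actual figure.

```python
ASSUMED_RESTART_TIME_S = 300  # assumption: stands in for "a few minutes"


def routes_to_withdraw(reason, advertised):
    """Return the prefixes to withdraw explicitly before LXD exits."""
    if reason == "shutdown":
        # Full shutdown (lxd shutdown or SIGPWR): withdraw everything.
        return sorted(advertised)
    if reason == "restart":
        # Refresh/update: withdraw nothing and rely on graceful restart,
        # so peers keep the routes while LXD is briefly down.
        return []
    raise ValueError(f"unknown exit reason: {reason}")


def routes_survive(downtime_s, restart_time_s=ASSUMED_RESTART_TIME_S):
    """True if LXD is back before the peers' restart timer expires."""
    return downtime_s <= restart_time_s


advertised = {"2001:db8:0:1::/64", "2001:db8:ffff::10/128"}
print(routes_to_withdraw("restart", advertised))
print(routes_to_withdraw("shutdown", advertised))
print(routes_survive(45), routes_survive(900))
```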
No REST API changes are expected for this. However, LXD will grow an additional listening port (typically tcp/179) when BGP is enabled, and there will be a few additional configuration keys and tweaks to existing configuration.
New global configuration keys:
core.bgp_address (local, disabled by default, takes
core.bgp_asn (global, empty by default, takes the local ASN)
Network-specific configuration keys:
bgp.peers.<name>.asn (global, peer ASN)
bgp.peers.<name>.password (global, peer password, optional)
bgp.ipv4.nexthop (local, for bridged networks, override next hop)
bgp.ipv6.nexthop (local, for bridged networks, override next hop)
NIC-specific configuration keys:
ipv4.routes.external (now supported on bridged interfaces)
ipv6.routes.external (now supported on bridged interfaces)
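To show how the per-peer key pattern bgp.peers.<name>.<field> groups together, here is a minimal parsing sketch. It only handles the two peer fields listed above (asn and password); the function name, the validation rules and the sample values are assumptions made for illustration, not LXD's validation code.

```python
import re

# Matches bgp.peers.<name>.asn and bgp.peers.<name>.password only.
PEER_KEY = re.compile(r"^bgp\.peers\.([^.]+)\.(asn|password)$")


def parse_peers(network_config):
    """Group flat bgp.peers.<name>.<field> keys into one dict per peer."""
    peers = {}
    for key, value in network_config.items():
        match = PEER_KEY.match(key)
        if match is None:
            continue  # not a peer key (e.g. bgp.ipv6.nexthop)
        name, field = match.groups()
        peers.setdefault(name, {})[field] = value

    for name, peer in peers.items():
        if "asn" not in peer:
            raise ValueError(f"peer {name!r} has no asn")
        asn = int(peer["asn"])
        # 4-byte ASNs range up to 4294967295; 0 is reserved.
        if not 1 <= asn <= 4294967295:
            raise ValueError(f"peer {name!r} has an invalid asn: {asn}")
        peer["asn"] = asn
    return peers


config = {
    "bgp.peers.router1.asn": "65001",
    "bgp.peers.router1.password": "example-secret",
    "bgp.ipv6.nexthop": "2001:db8::1",
}
print(parse_peers(config))
```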
No CLI changes, only affects config keys.
No schema change, only affects config keys.
No special handling needed as the feature did not previously exist.
Some constraints will get relaxed to allow the use of ipv6.routes.external on regular bridges, but this won’t change behavior on upgrade.
This will also enable proper anycast setups through the use of ECMP routes.
When ipv6.routes.anycast is configured, multiple instances on OVN networks will be able to advertise the exact same address or subnet.
This then results in multiple routes on the upstream router with equal weight. The L3 information then gets hashed and traffic gets balanced between the instances.
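The hashing step can be illustrated with a toy model. Real routers use vendor-specific hash functions and inputs; the SHA-256 hash over source and destination address below is purely an assumption chosen to make the example deterministic, and the addresses are documentation prefixes.

```python
import hashlib


def pick_next_hop(src_ip, dst_ip, next_hops):
    """Pick one of several equal-cost next hops by hashing L3 information.

    Toy model only: actual ECMP hashing is router-specific.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    digest = hashlib.sha256(f"{src_ip}|{dst_ip}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Two instances advertising the same anycast address via different next hops.
next_hops = ["2001:db8:0:1::a", "2001:db8:0:1::b"]
anycast = "2001:db8:ffff::53"

for client in ("2001:db8:1::1", "2001:db8:1::2", "2001:db8:1::3"):
    print(client, "->", pick_next_hop(client, anycast, next_hops))
```

Because the hash depends only on the addresses, packets of a given source/destination pair always reach the same instance, while different clients spread across the available instances.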