Talking to a remote cluster - how to avoid having a SPOF

laney · January 4, 2021, 3:08pm

I’m working on moving Ubuntu’s autopkgtest installation to using clustering for its armhf nodes, rather than manually managing remotes. The current setup looks like this:

A service running in a “production” cloud runs a number of instances of a program to receive requests and start lxd instances to run the requests. Instances are configured to send to a particular remote. Remotes are running in a different “workload” cloud. When a remote breaks, the admins manually delete it and provision a new one, then update the controller to know about this one rather than the old.

In the new world, instead of having the controller know about each backend instance, it would know about an LXD cluster - as a remote - and recovering from failure of a node is a matter of deleting the old one and running a script to deploy a new one to the cluster.

This feels nice, except for the “know about an LXD cluster” part. That’s a single point of failure: I will have to tell the system the address of one of the nodes, and if this one goes down then I need to reconfigure the remote to point to a different IP.

I suppose there are a couple of things I could do to mitigate this:

Use a DNS name, and round robin amongst all the nodes (with health checking?)
Set up haproxy and talk to this instead (would this work?)

but they are both kind of annoying, since I have to write glue and synchronise this external thing with the active list of nodes. So the question is whether this is something LXD could support directly. That is - when a remote is part of a cluster, regularly communicate the other addresses in the cluster and then fail over to another if we find that the one we’re talking to is dead now.
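To make the glue for the HAProxy option concrete, here is a minimal Python sketch that renders an HAProxy backend from a list of cluster nodes. The node names, addresses, the backend name `lxd_cluster` and the function `render_haproxy_backend` are all hypothetical. It assumes TCP passthrough (`mode tcp`) so that TLS, including any client certificate, is negotiated with the LXD node rather than the proxy, and it assumes every node listens on port 8443.

```python
def render_haproxy_backend(nodes, port=8443, backend="lxd_cluster"):
    """Render an HAProxy backend stanza for the given nodes.

    `nodes` is a list of (name, address) pairs. Both are illustrative and
    would come from whatever tracks the cluster's active members.
    """
    lines = [
        f"backend {backend}",
        "    mode tcp",  # assumption: pass TLS through untouched
        "    balance roundrobin",
    ]
    for name, address in sorted(nodes):
        # `check` asks HAProxy to health-check the node and skip it while down.
        lines.append(f"    server {name} {address}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [("node1", "10.0.0.1"), ("node2", "10.0.0.2")]
    print(render_haproxy_backend(example), end="")
```

This does not remove the synchronisation problem described above: something still has to call it again and reload HAProxy whenever a node is added or removed.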

stgraber · January 4, 2021, 4:47pm

DNS should do what you want just fine, just put all servers in there, ideally configure your DNS server to randomly order records and the client will just try the next one if it can’t connect.
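As a rough illustration of "try the next one", here is a minimal Python sketch of a client that resolves one DNS name to all of its addresses and connects to the first one that answers. The function names and the injectable `connect` parameter are assumptions made for this example; it is not how the LXD client is implemented.

```python
import socket


def resolve_all(host, port):
    """Return every distinct address that `host` resolves to, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_first_reachable(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in turn and return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # This node is unreachable; remember why and move on to the next.
            last_error = exc
    raise ConnectionError(f"no reachable address among {addresses}") from last_error
```

A caller would do something like `connect_first_reachable(resolve_all("lxd.example.com", 8443), 8443)`, where `lxd.example.com` is a placeholder name. `socket.create_connection` already walks through all resolved addresses when given a hostname directly; the loop is written out here only to make the failover explicit.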

laney · January 4, 2021, 4:55pm

I get it, but it’s another thing to maintain, keep in sync, and somehow supply to the system holding the remote. I’m suggesting LXD could help out there, since it already has this state.

I know I’ll have to do this now, but I guess I’m making a feature request to obsolete that eventually.

stgraber · January 4, 2021, 5:03pm

It’s unfortunately much harder than it sounds.

LXD does know an address for each cluster member, but those are internal addresses (cluster.https_address); they don’t have to be publicly reachable by the clients. The clients connect over a separate address (core.https_address), which is often set to the default wildcard (:8443). At that point you do not have a single address per member; instead you have multiple addresses per interface.
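The wildcard problem can be illustrated with a small Python sketch. It takes a member's configuration as a plain dictionary (a hypothetical representation chosen for this example) and reports whether a single client-facing address can be read off core.https_address. The function name and the set of wildcard spellings it recognises are assumptions.

```python
def derivable_client_address(config):
    """Return core.https_address if it names one concrete host, else None.

    `config` is an illustrative dict of server config keys to values.
    """
    core = config.get("core.https_address", "")
    # Split "host:port" on the last colon so bracketed IPv6 hosts stay intact.
    host, _sep, _port = core.rpartition(":")
    host = host.strip("[]")
    if host in ("", "::", "0.0.0.0"):
        # Unset or bound to all interfaces: no single address to advertise.
        return None
    return core


if __name__ == "__main__":
    print(derivable_client_address({"core.https_address": ":8443"}))
    print(derivable_client_address({"core.https_address": "10.0.0.5:8443"}))
```

Even when this returns an address, nothing guarantees that the address is reachable from where the client sits, which is the other half of the difficulty.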

We did consider having the client do that kind of thing in the past but decided against it as similar logic just for single nodes in instance migration has proved very confusing in the past (we iterate through all addresses on the source but we’ve had cases where source and target have the same address on alternate interfaces causing very confusing issues).

For the simplest case, where each node has an identical, publicly reachable core.https_address and cluster.https_address, what you suggest is possible, though it wouldn’t work for restricted users (RBAC), who do not get to see system configuration like that. But we don’t really like solutions that only work for a time, until you change a server-side address setting and all your clients break.

We’re starting to add cluster member specific configs so it could be that the answer will change in a few months where we’d have infrastructure to possibly indicate a preferred client address per member and could then have that be exposed both to CLI and to tools managing DNS/HAproxy.

laney · January 4, 2021, 5:33pm

Cheers. That’s useful information!

I guess it is a bit tough given the fact that the nodes might be in a changing network environment with the bind-to-all-addresses behaviour; this state would have to be communicated to all of the peers and the client would have to get to know about it too to be able to failover/round robin correctly. I want it to be magic: the client knows core.https_address (or multiple of these) that it is allowed to use for each cluster member, this is always up to date as members come and go or the network environment changes. But ok, received, it’s harder than I wish it was.

I guess my request then is that this feature request is received and considered. Some of the properties of clustering are really great and just what we want here, but having to self-mitigate a SPOF is a bit of a drag.