I did reboot lxc-01, and it reverted to 3.16. I ran a refresh again, and it’s back where it was.
What I really want to do is un-cluster these nodes, because this is REALLY REALLY unreliable.
Is there any way to manually hack on some database somewhere to tell each node that it’s a member of a single-node cluster? Then I can export their containers, and rebuild the machines.
It looks like this is simply unsolvable. It’s now possible for a cluster to be unable to cold boot, which left me in a situation where I just had to throw it all away and redeploy everything without clustering.
I have turned the VMs off, so if @stgraber (or someone on his team?) wants to investigate how we managed to get into this situation, I can give you access.
But for the moment, I would strongly recommend people do not use LXD Clustering, in any form.
Responded privately to see how we could get access to take a look.
The error messages above suggest that the nodes can all talk to each other and that the database is functional.
The fact that it still shows “Wait for other cluster nodes to upgrade their version” suggests that either you have a 4th cluster node in the database which shouldn’t exist anymore (a dead system which wasn’t removed from the cluster), or one of the nodes is failing to update its database record, holding up the upgrade due to inconsistent database versions.
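
To make the two possibilities concrete, here is a rough Python sketch of the check being described. It is only a simplified model, not LXD’s actual code: the `NodeRecord` fields, the `find_blockers` helper, the node names other than lxc-01, and the version numbers are all made up for illustration. It assumes each node’s database record carries a schema version and an API extension count, and that the upgrade waits until every record matches the newest one.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class NodeRecord(NamedTuple):
    """Hypothetical per-node row; not LXD's real table layout."""
    name: str
    schema: int          # database schema version the node last reported
    api_extensions: int  # number of API extensions the node last reported


def find_blockers(
    records: Iterable[NodeRecord], expected_names: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (unexpected, stale) node names.

    unexpected: records for machines you no longer consider cluster members.
    stale: records whose versions are behind the newest record.
    """
    rows = list(records)
    if not rows:
        return [], []
    # The newest (schema, api_extensions) pair seen across all records.
    newest = max((r.schema, r.api_extensions) for r in rows)
    stale = sorted(
        r.name for r in rows if (r.schema, r.api_extensions) < newest
    )
    unexpected = sorted(r.name for r in rows if r.name not in expected_names)
    return unexpected, stale


# Made-up example: three live nodes plus one leftover record.
records = [
    NodeRecord("lxc-01", schema=2, api_extensions=10),
    NodeRecord("lxc-02", schema=2, api_extensions=10),
    NodeRecord("lxc-03", schema=2, api_extensions=10),
    NodeRecord("lxc-old", schema=1, api_extensions=8),
]
unexpected, stale = find_blockers(records, {"lxc-01", "lxc-02", "lxc-03"})
print("unexpected:", unexpected)
print("stale:", stale)
```

In this made-up case the leftover record is both unexpected and stale, which would be the “dead 4th node” scenario. If the unexpected list is empty but one of the three real nodes shows up as stale, that points to the second scenario instead.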