Last Snap Refresh has left my LXD cluster barely functioning again - "unix.socket: connect: connection refused"

Ok, so that confirms you lost that server on the 16th of November and didn’t remove it from the cluster, which is what caused the hang on refresh.

Run lxd sql global "DELETE FROM nodes WHERE id=5;" and things should unstick within a minute or so.
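Before deleting anything, it may be worth looking at the nodes table to confirm which id belongs to the dead server. The following is only a rough sketch: the helper names list_nodes_stmt and delete_node_stmt are made up for illustration, and it assumes nothing about the table beyond the nodes name and id column used in the command above. The helpers only build the SQL text; the actual lxd calls are left as comments.

```shell
# Sketch only: helper names are hypothetical, not part of LXD.

# Print a statement that lists every row of the nodes table,
# so the id of the dead server can be confirmed first.
list_nodes_stmt() {
  printf 'SELECT * FROM nodes;'
}

# Print the DELETE statement for one node id.
# Refuses empty or non-numeric input so a typo cannot widen the delete.
delete_node_stmt() {
  case "$1" in
    ''|*[!0-9]*) echo "node id must be a number" >&2; return 1 ;;
  esac
  printf 'DELETE FROM nodes WHERE id=%s;' "$1"
}

# Usage on a surviving cluster member (not executed here):
#   lxd sql global "$(list_nodes_stmt)"
#   lxd sql global "$(delete_node_stmt 5)"
```

Checking the id first matters because the DELETE is not something you can easily undo once the cluster picks it up.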

Yep, that fixed it.

Perhaps servers that haven’t been seen in a while should go to PAUSED status and be given a chance to reconnect later… LOL

Thank you very much for your help, much appreciated.

Yeah, there is some tooling we’re working on around clustering to let you list the cluster state and kick out a dead node when you get stuck like this on upgrade.

Funny part: that node 59 was used as a backup node when we did our last backup, upgrade and reboot, to ensure we had a master node. We shut it down fine, but we forgot it was still part of the cluster.